diff options
This commit was manufactured by cvs2svn to create tag 'v0_3_1'.v0_3_1
svn path=/tags/v0_3_1/; revision=11683
Diffstat (limited to 'devel-docs/query/virtual-folder-in-depth.txt')
-rw-r--r-- | devel-docs/query/virtual-folder-in-depth.txt | 309 |
1 files changed, 0 insertions, 309 deletions
diff --git a/devel-docs/query/virtual-folder-in-depth.txt b/devel-docs/query/virtual-folder-in-depth.txt deleted file mode 100644 index 01718a5f05..0000000000 --- a/devel-docs/query/virtual-folder-in-depth.txt +++ /dev/null @@ -1,309 +0,0 @@ -TITLE: An in-depth look at the virtual folder mechanism -AUTHOR: Giao Nguyen <grail@cafebabe.org> - -* introduction - -This document describes a different way of approaching mail -organization and how all things are possible in this brave new -world. This document does not describe physical storage issues nor -interface issues. - -Historically mail has been organized into folders. These folders -usually mapped to a single storage medium. The relationship between -mail organization and storage medium was one to one. There was one -mail organization for every storage medium. This scheme had its -limitations. - -Efforts at categorizations are only meaningful at the instance that -one categorized. To find any piece of data, regardless of how well -it was categorized, required some amount of searching. Therefore, any -attempts to nullify searching is doomed to fail. It's time to embrace -searching as a way of life. - -These are the terms and their definitions. The example rules used are -based on the syntax for VM (http://www.wonderworks.com/vm/) by Kyle -Jones whose ideas form the basis for this. I'm only adding the -existence of summary files to aid in scaling. I currently use VM and -it's virtual-folder rules for my daily mail purposes. To date, my only -complaints are speed (it has no caches) and for the unitiated, it's -not very user-friendly. - -Comments, questions, rants, etc. should be directed at Giao Nguyen -<grail@cafebabe.org> who will try to address issues in a timely -manner. - -* Definitions - -** store - -A location where mail can be found. This may be a file (Berkeley -mbox), directory (MH), IMAP server, POP3 server, Exchange server, -Lotus Notes server, a stack of Post-Its by your monitor fed through -some OCR system. - -** message - -An individual mail message. - -** vfolder - -A group of messages sharing some commonality. This is the result of a -query. The vfolder maybe contained in a store, but it is not necessary -that a store holds only one vfolder. There is always an implicit -vfolder rule which matches all messages. A store contains the vfolder -which is the result of the query (any). It's short for virtual folder -or maybe view folder. I dunno. - -** default-vfolder - -The vfolder defined by (any) applied to the store. This is not the -inbox. The inbox could easily be defined by a query. A default rule -for the inbox could be (new) but it doesn't have to be. Mine happens -to be (or (unread) (new)). - -** folder - -The classical mail folder approach: one message organization per -store. - -** query - -A search for messages. The result of this is a vfolder. There are two -kinds of queries: named queries and lambda queries. More on this -later. - -** summary file - -An external file that contains pointers to messages which are matches -for a named query. In addition to pointers, the summary file should -also contain signatures of the store for sanity checks. When the term -"index" is used as a verb, it means to build a summary file for a -given name-value pair. - -* Queries - -Named queries are analogous to classical mail folders. Because named -queries maybe reused, summary files are kept as caches to reduce -the overall cost of viewing a vfolder. Summary files are superior to -folders in that they allow for the same messages to appear in multiple -vfolders without message duplications. Duplications of messages -defeats attempts at tagging a message with additional user information -like annotations. Named queries will define folders. - -Lambda queries are similar to named queries except that they have no -name. These are created on the fly by the user to filter out or -include certain messages. - -All queries can be layered on top of each other. A lambda query can be -layered on a named query and a named query can be layered on a lambda -query. The possibilities are endless. - -The layerings can be done as boolean operations (and, or, not). Short -circuiting should be used. - -Examples: - -(and (author "Giao") - (unread)) - -The (unread) query should only be evaluated on the results of (author -"Giao"). - -(or (author "Giao") - (unread)) - -Both of these queries should be evaluated. Any matches are added to the -resulting vfolder. - -* Summary files - -Summary files are only meaningful when applied to the context of the -default-vfolder of a store. - -Summary files should be generated for queries of the form: - -(function "constant value") - -Summary files should never be generated for queries of the form: - -(function (function1)) - -(and (function "value") - (another-function "another value")) - -Given a query of the form: - -(and (function "value") - (another-function "another value")) - -The system should use one summary file for (function "value") and -another summary file for (another-function "another value"). I will -call the prior form the "plain form". - -It should be noted that the signature of the store should be based on -the assumption that new data may have been added to the store since -the application generated the summary file. Signatures generated on -the entirety of the store will most likely be meaningless for things -like POP/IMAP servers. - -* Incremental indexing - -When new messages are detected, all known queries should be evaluated -on the new messages. vfolders should be notified of new messages that -are positive matches for their queries. The indexes generated by this -process should be merged into the current indexes for the vfolder. - -* Can I have multiple stores? - -I don't see why not. Again, the inbox is a vfolder so you can get a -unified inbox consisting of all new mail sent to all your stores or -your can get inboxes for each store or any combination your heart -desire. You get your cake, eat it, and someone else cleans the dishes! - -* Why all this? - -Consider the dynamic nature of the following query: - -(and (author "Giao") - (sent-after (today-midnight))) - -today-midnight would be a function that is evaluated at run-time to -calculate the appropriate object. - -* Scenarios of usage and their solutions - -** Mesage alterations - -This is a fuzzy area that should be left to the UI to handle. Messages -are altered. Read status are altered when a new message is read for -example. How do we handle this if our query is for unread messages? -Upon viewing the state would change. - -One idea is to not evaluate the queries unless we're changing between -vfolder views. This assumes that one can only view a particular -vfolder at a time. For multi-vfolder viewing, a message change should -propagate through the vfolder system. Certain effects (as in our -example) would not be intuitive. - -It would not be a clean solution to make special cases but they may be -necessary where certain defined fields are ignored when they are -changed. Some combination of the above rules can be used. I don't -think it's an easy solution. - -** Message inclusion and exclusion - -Messages are included and excluded also with queries. The final query -will have the form of: - -(and (author "Giao") - (criteria value) - (not (criteria other-value))) - -Userland criterias may be a label of some sort. These may be userland -labels or Message-IDs. What are the performance issues involved in -this? With short circuiting, it's not a major problem. - -The criterias and values are determined by the UI. The vfolder -mechanism isn't concerned with such issues. - -Messages can be included and excluded at will. The idea is often -called "arbitrary inclusion/exclusion". This can be done by -Message-IDs or other fields. It's been noted that Message-IDs are not -unique. - -I propose that any given vfolder is allocated an inclusion label and an -exclusion label. These should be randomly generated. This should be -part of the vfolder description. It should be noted that the vfolder -description has not been drafted yet. - -The result is such that the rules for a given named query is: - -(and (user-query) - (label inclusion-label) - (not exclusion-label)) - -** Query scheduling - -Consider the following extremely dynamic queries: - -A: -(and (author "Giao") - (sent-after (today-midnight))) - -B: -(and (sent-after (today-midnight)) - (author "Giao")) - -C: -(or (author "Giao") - (sent-after (today-midnight))) - -Query A would be significantly faster because (author "Giao") is not -dynamic. A summary file could be generated for this query. Query B is -slow and can be optimized if there was a query compiler of some -sort. Query C demonstrates a query in which there is no good -optimization which can be applied. These come with a certain amount of -baggage. - -It seems then that for boolean 'and' operations, plain forms should be -moved forward and other queries should be moved such that they are -evaluated later. I would expect that the majority of queries would be -of the plain form. - -First is that the summary file is tied to the query and the store -where the query originates from. Second, a hashing function for -strings needs to be calculated for the query so that the query and the -summary file can be associated. This hashing function could be similar -to the hashing function described in Rob Pike's "The Practice of -Programming". (FIXME: Stick page number here) - -** Archives - -Many people are concerned that archives won't be preserved, archives -aren't supported, and many other archive related issues. This is the -short version. - -Archives are just that, archives. Archives are stores. Take your -vfolder, export it to a store. You are done. If you load up the store -again, then the default-vfolder of that store is the view of the -vfolder, except the query is different. - -The point to vfolder is not to do away with classical folder -representation but to move the queries to the front where it would -make data management easier for people who don't think in terms of -files but in terms of queries because ordinary people don't think in -terms of files. - -* Miscellany - -** Annotations - -There should be a scheme to add annotations to messages. Common mail -user agents have used a tag in the message header to mark messages as -read/unread for example. Extending on this we have the ability to add -our own data to a message to add meaning to it. If we have a good -scheme for doing this, new possibilities are opened. - -*** Keywords - -When sending a message, a message could have certain keywords attached -to it. While this can be done with the subject line, the subject line -has a tendency to be munged by other mail applications. One popular -example is the "[rR]e:" prefix. Using the subject line also breaks the -"contract" with other mail user agents. Using keywords in another -field in the message header allows the sender to assist the recipient -in organizing data automatically. Note that the sender can only -provide hints as the sender is unlikely to know the organization -schemes of the recipient. - -** Scope - -Let us assume that we have multiple stores. Does a query work on a -given store? Or does it work on all stores? Or is it configurable such -that a query can work on a user-selected list of stores? - -* Alternatives to the above - -Jim Meyer <purp@selequa.com> is putting some notes on where -annotations needs to be located. They'll be located here as well as -any contributions I may have to them. |