1 files changed, 395 insertions, 0 deletions
diff --git a/devel-docs/query/virtual-folder-in-depth.sgml b/devel-docs/query/virtual-folder-in-depth.sgml
new file mode 100644
index 0000000000..fc85132673
--- /dev/null
+++ b/devel-docs/query/virtual-folder-in-depth.sgml
@@ -0,0 +1,395 @@
+<!doctype article PUBLIC "-//Davenport//DTD DocBook V3.0//EN" []>
+
+<!-- SGMLized by Bertrand <Bertrand.Guiheneuf@inria.fr> -->
+
+<article id="index">
+ <artheader>
+ <authorgroup>
+   <author>
+    <firstname>Giao</firstname>
+    <surname>Nguyen</surname>
+    </author>
+   </authorgroup>
+ <title>An in-depth look at the virtual folder mechanism</title>
+  <abstract>
+  <para>
+  This document describes a different way of approaching mail
+  organization and how all things are possible in this brave new
+  world. This document does not describe physical storage issues nor
+  interface issues.
+  </para>
+  <para>
+  Historically mail has been organized into folders. These folders
+  usually mapped to a single storage medium. The relationship between
+  mail organization and storage medium was one to one. There was one
+  mail organization for every storage medium. This scheme had its
+  limitations.
+  </para>
+  <para>  
+  Efforts at categorization are only meaningful at the instant that
+  one categorizes. To find any piece of data, regardless of how well
+  it was categorized, requires some amount of searching. Therefore, any
+  attempt to eliminate searching is doomed to fail. It's time to embrace
+  searching as a way of life.
+  </para>
+  <para>  
+  These are the terms and their definitions. The example rules used are
+  based on the syntax for VM (http://www.wonderworks.com/vm/) by Kyle
+  Jones, whose ideas form the basis for this design. I'm only adding the
+  existence of summary files to aid in scaling. I currently use VM and
+  its virtual-folder rules for my daily mail purposes. To date, my only
+  complaints are its speed (it has no caches) and that, for the
+  uninitiated, it's not very user-friendly.
+  </para>
+  <para>  
+  Comments, questions, rants, etc. should be directed at Giao Nguyen
+  (grail@cafebabe.org) who will try to address issues in a timely
+  manner.
+  </para>
+ </abstract>
+</artheader>
+ <sect1 id="definitions">
+  <title>Definitions</title>
+  <sect2>
+   <title>Store</title> 
+   <para>
+   A location where mail can be found. This may be a file (Berkeley
+   mbox), directory (MH), IMAP server, POP3 server, Exchange server,
+   Lotus Notes server, or even a stack of Post-Its by your monitor fed
+   through some OCR system.
+   </para>
+  </sect2>
+  <sect2>
+   <title>Message</title> 
+   <para>  
+   An individual mail message.
+   </para>
+  </sect2>
+  <sect2>
+   <title>Vfolder</title> 
+   <para>  
+   A group of messages sharing some commonality. This is the result of a
+   query. A vfolder may be contained in a store, but a store need not
+   hold only one vfolder. There is always an implicit vfolder rule which
+   matches all messages: a store contains the vfolder which is the
+   result of the query (any). It's short for virtual folder or maybe
+   view folder. I dunno.
+   </para>
+  </sect2>
+  <sect2>
+   <title>Default-vfolder</title> 
+   <para>  
+   The vfolder defined by (any) applied to the store. This is not the
+   inbox. The inbox could easily be defined by a query. A default rule
+   for the inbox could be (new) but it doesn't have to be. Mine happens
+   to be (or (unread) (new)).
+   </para>
+  </sect2>
+  <sect2>
+   <title>Folder</title> 
+   <para>  
+   The classical mail folder approach: one message organization per
+   store.
+   </para>
+  </sect2>
+  <sect2>
+   <title>Query</title> 
+   <para>  
+   A search for messages. The result of this is a vfolder. There are two
+   kinds of queries: named queries and lambda queries. More on this
+   later.
+   </para>
+  </sect2>
+  <sect2>
+   <title>Summary file</title>
+   <para>  
+   An external file that contains pointers to messages which are matches
+   for a named query. In addition to pointers, the summary file should
+   also contain signatures of the store for sanity checks. When the term
+   "index" is used as a verb, it means to build a summary file for a
+   given name-value pair.
+   </para>
+  </sect2>
+ </sect1>
+
+ <sect1>
+  <title>Queries</title> 
+  <para>  
+  Named queries are analogous to classical mail folders. Because named
+  queries may be reused, summary files are kept as caches to reduce
+  the overall cost of viewing a vfolder. Summary files are superior to
+  folders in that they allow the same message to appear in multiple
+  vfolders without being duplicated. Duplicating messages defeats
+  attempts at tagging a message with additional user information
+  like annotations. Named queries will define folders.
+  </para>
+  <para>
+  Lambda queries are similar to named queries except that they have no
+  name. These are created on the fly by the user to filter out or
+  include certain messages.
+  </para>
+  <para>
+  All queries can be layered on top of each other. A lambda query can be 
+  layered on a named query and a named query can be layered on a lambda
+  query. The possibilities are endless.
+  </para>
+  <para>
+  The layerings can be done as boolean operations (and, or, not). Short
+  circuiting should be used. 
+  </para>
+  <para>
+  Examples:
+  <programlisting>
+  (and (author "Giao")
+       (unread))
+  </programlisting>
+  The (unread) query should only be evaluated on the results of (author
+  "Giao").
+  <programlisting>
+  (or (author "Giao")
+      (unread))
+  </programlisting>
+  Both sub-queries should be evaluated on the full input. Any matches are
+  added to the resulting vfolder.
+  </para>
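+  <para>
+  As a rough illustration of the layering rules above, here is a
+  minimal Python sketch. The tuple encoding of queries, the message
+  dictionaries, and the names evaluate and PREDICATES are assumptions
+  made for this example only; they are not part of VM or of any
+  existing implementation.
+  </para>
+  <programlisting>
```python
# Hypothetical message model: each message is a dict with "id", "author", "unread".
MESSAGES = [
    {"id": 1, "author": "Giao", "unread": True},
    {"id": 2, "author": "Giao", "unread": False},
    {"id": 3, "author": "Jim", "unread": True},
]

# Leaf predicates, keyed by the query name used in the text.
PREDICATES = {
    "any": lambda msg: True,
    "author": lambda msg, name: msg["author"] == name,
    "unread": lambda msg: msg["unread"],
}


def evaluate(query, messages):
    """Return the messages (a vfolder) selected by a tuple-encoded query."""
    op, args = query[0], query[1:]
    if op == "and":
        # Short circuit: each sub-query only sees what survived the previous one.
        result = list(messages)
        for sub in args:
            if not result:
                break
            result = evaluate(sub, result)
        return result
    if op == "or":
        # Every sub-query runs on the full input; matches are merged in order.
        matched = set()
        for sub in args:
            matched.update(m["id"] for m in evaluate(sub, messages))
        return [m for m in messages if m["id"] in matched]
    if op == "not":
        excluded = {m["id"] for m in evaluate(args[0], messages)}
        return [m for m in messages if m["id"] not in excluded]
    return [m for m in messages if PREDICATES[op](m, *args)]


both = evaluate(("and", ("author", "Giao"), ("unread",)), MESSAGES)
either = evaluate(("or", ("author", "Giao"), ("unread",)), MESSAGES)
print([m["id"] for m in both])    # [1]
print([m["id"] for m in either])  # [1, 2, 3]
```
+  </programlisting>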
+ </sect1>
+ 
+ <sect1>
+  <title>Summary files</title> 
+  <para>    
+  Summary files are only meaningful when applied to the context of the
+  default-vfolder of a store.
+  </para>
+  <para>
+  Summary files should be generated for queries of the form:
+  <programlisting>
+  (function "constant value")
+  </programlisting>
+  Summary files should never be generated for queries of the form:
+  <programlisting>
+  (function (function1))
+  
+  (and (function "value")
+       (another-function "another value"))
+  </programlisting>
+  Given a query of the form:
+  <programlisting>
+  (and (function "value")
+       (another-function "another value"))
+  </programlisting>
+  The system should use one summary file for (function "value") and
+  another summary file for (another-function "another value"). I will
+  call the form (function "constant value") the "plain form".
+  </para>
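+  <para>
+  One way to read the rule above is sketched below in Python. The
+  tuple encoding and the helper names is_plain_form and
+  summary_candidates are hypothetical. The sketch also assumes that a
+  plain form needs at least one constant argument, which the text does
+  not state outright.
+  </para>
+  <programlisting>
```python
BOOLEAN_OPS = {"and", "or", "not"}


def is_plain_form(query):
    """True for queries shaped like (function "constant value")."""
    head, args = query[0], query[1:]
    if head in BOOLEAN_OPS or not args:
        return False
    return all(isinstance(arg, str) for arg in args)


def summary_candidates(query):
    """Collect the plain-form sub-queries that could each own a summary file."""
    if is_plain_form(query):
        return [query]
    found = []
    for arg in query[1:]:
        if isinstance(arg, tuple):
            found.extend(summary_candidates(arg))
    return found


compound = ("and", ("function", "value"), ("another-function", "another value"))
nested = ("function", ("function1",))
print(summary_candidates(compound))  # one entry per plain-form operand
print(summary_candidates(nested))    # []
```
+  </programlisting>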
+  <para>
+  It should be noted that the signature of the store should be based on
+  the assumption that new data may have been added to the store since
+  the application generated the summary file. Signatures generated on
+  the entirety of the store will most likely be meaningless for things
+  like POP/IMAP servers. 
+  </para>
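+  <para>
+  A signature that tolerates appended data could, for instance, cover
+  only the part of the store that existed at indexing time. The Python
+  sketch below assumes the store can be read as a byte string (as with
+  an mbox file); the function names and the use of SHA-256 are choices
+  made for this example, not part of the design.
+  </para>
+  <programlisting>
```python
import hashlib


def make_signature(store_bytes):
    """Sign the store as it looked when the summary file was built."""
    return (len(store_bytes), hashlib.sha256(store_bytes).hexdigest())


def signature_still_valid(signature, store_bytes):
    """Accept the store if the indexed prefix is unchanged, even after appends."""
    indexed_length, digest = signature
    if indexed_length > len(store_bytes):
        return False  # the store shrank, so the pointers cannot be trusted
    prefix = store_bytes[:indexed_length]
    return hashlib.sha256(prefix).hexdigest() == digest


original = b"From giao\n\nfirst message\n"
signature = make_signature(original)
grown = original + b"From jim\n\nsecond message\n"
print(signature_still_valid(signature, grown))               # True
print(signature_still_valid(signature, b"X" + original[1:]))  # False
```
+  </programlisting>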
+ </sect1>
+
+ <sect1>
+  <title>Incremental indexing</title> 
+  <para>
+  When new messages are detected, all known queries should be evaluated
+  on the new messages. Vfolders should be notified of new messages that
+  are positive matches for their queries. The indexes generated by this
+  process should be merged into the current indexes for the vfolder.
+  </para>
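+  <para>
+  The incremental step could look roughly like the following Python
+  sketch. It assumes an in-memory mapping from each known query to its
+  list of message pointers and a deliberately tiny matcher; all names
+  here are hypothetical.
+  </para>
+  <programlisting>
```python
def matches(query, message):
    """Tiny matcher for the illustration: (field "value") equality only."""
    field, value = query
    return message.get(field) == value


def index_new_messages(indexes, new_messages):
    """Evaluate every known query on the new messages only and merge the hits."""
    notifications = {}
    for query, pointers in indexes.items():
        hits = [m["id"] for m in new_messages if matches(query, m)]
        if hits:
            pointers.extend(hits)        # merge into the existing index
            notifications[query] = hits  # what the vfolder would be told
    return notifications


indexes = {("author", "Giao"): [1, 2], ("author", "Jim"): [3]}
new_mail = [{"id": 4, "author": "Giao"}, {"id": 5, "author": "Kyle"}]
notified = index_new_messages(indexes, new_mail)
print(notified)
print(indexes[("author", "Giao")])  # [1, 2, 4]
```
+  </programlisting>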
+ </sect1>
+
+ <sect1>
+  <title>Can I have multiple stores?</title> 
+  <para> 
+  I don't see why not. Again, the inbox is a vfolder, so you can get a
+  unified inbox consisting of all new mail sent to all your stores, or
+  you can get inboxes for each store, or any combination your heart
+  desires. You get your cake, eat it, and someone else cleans the dishes!
+  </para>
+ </sect1>
+
+ <sect1>
+  <title>Why all this?</title> 
+  <para> 
+  Consider the dynamic nature of the following query:
+  <programlisting>
+  (and (author "Giao")
+       (sent-after (today-midnight)))
+  </programlisting>
+  today-midnight would be a function that is evaluated at run-time to
+  calculate the appropriate object.
+  </para>
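+  <para>
+  To make the run-time evaluation concrete, here is a small Python
+  sketch. The "sent" field on the message and the optional now
+  parameter (added so the example is reproducible) are assumptions of
+  this illustration.
+  </para>
+  <programlisting>
```python
from datetime import datetime, time


def today_midnight(now=None):
    """Return the most recent midnight, computed each time the query runs."""
    current = now if now is not None else datetime.now()
    return datetime.combine(current.date(), time.min)


def sent_after(message, moment):
    """True when the message's hypothetical "sent" timestamp is after moment."""
    return message["sent"] > moment


noon = datetime(2000, 1, 2, 12, 0)
message = {"author": "Giao", "sent": datetime(2000, 1, 2, 9, 30)}
# The bound is recomputed per evaluation, so the same query text selects
# different messages on different days.
print(sent_after(message, today_midnight(noon)))  # True
```
+  </programlisting>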
+ </sect1>
+
+ <sect1>
+  <title>Scenarios of usage and their solutions</title> 
+  <sect2>
+   <title>Message alterations</title>
+   <para>
+   This is a fuzzy area that should be left to the UI to handle. Messages 
+   are altered. The read status is altered when a new message is read,
+   for example. How do we handle this if our query is for unread
+   messages? Upon viewing, the state would change.
+   </para>
+   <para>
+   One idea is to not evaluate the queries unless we're changing between
+   vfolder views. This assumes that one can only view a particular
+   vfolder at a time. For multi-vfolder viewing, a message change should
+   propagate through the vfolder system. Certain effects (as in our
+   example) would not be intuitive.
+   </para>
+   <para> 
+   It would not be a clean solution to make special cases but they may be 
+   necessary where certain defined fields are ignored when they are
+   changed. Some combination of the above rules can be used. I don't
+   think it's an easy solution.
+   </para>
+  </sect2>
+  <sect2>
+   <title>Message inclusion and exclusion</title>
+   <para>
+   Messages are included and excluded also with queries. The final query
+   will have the form of:
+   <programlisting>
+   (and (author "Giao")
+        (criteria value)
+        (not (criteria other-value)))
+   </programlisting>
+   User-defined criteria may be labels of some sort or Message-IDs.
+   What are the performance issues involved in this? With short
+   circuiting, it's not a major problem.
+   </para>
+   <para>    
+   The criteria and values are determined by the UI. The vfolder
+   mechanism isn't concerned with such issues.
+   </para>
+   <para>   
+   Messages can be included and excluded at will. The idea is often
+   called "arbitrary inclusion/exclusion". This can be done by
+   Message-IDs or other fields. It's been noted that Message-IDs are not
+   unique. 
+   </para>
+   <para>  
+   I propose that any given vfolder is allocated an inclusion label and an 
+   exclusion label. These should be randomly generated. This should be
+   part of the vfolder description. It should be noted that the vfolder
+   description has not been drafted yet.
+   </para>
+   <para>   
+   The result is that the rule for a given named query is:
+   <programlisting>
+   (and (user-query)
+        (label inclusion-label)
+        (not (label exclusion-label)))
+   </programlisting>
+   </para>
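+   <para>
+   A sketch of how the two labels might be allocated and folded into
+   the rule follows, in Python. The label prefixes, the dictionary
+   layout, and the function names are hypothetical, since the vfolder
+   description has not been drafted; the rule is built exactly in the
+   shape given above.
+   </para>
+   <programlisting>
```python
import secrets


def allocate_labels():
    """Randomly generate the inclusion and exclusion labels for one vfolder."""
    return {
        "inclusion": "incl-" + secrets.token_hex(8),
        "exclusion": "excl-" + secrets.token_hex(8),
    }


def named_query_rule(user_query, labels):
    """Wrap the user's query with the vfolder's inclusion/exclusion labels."""
    return (
        "and",
        user_query,
        ("label", labels["inclusion"]),
        ("not", ("label", labels["exclusion"])),
    )


labels = allocate_labels()
rule = named_query_rule(("author", "Giao"), labels)
print(rule)
```
+   </programlisting>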
+  </sect2>
+  <sect2>
+   <title>Query scheduling</title>
+   <para>
+   Consider the following extremely dynamic queries:
+   <programlisting>
+   A:
+   (and (author "Giao")
+        (sent-after (today-midnight)))
+   
+   B:
+   (and (sent-after (today-midnight))
+        (author "Giao"))
+   
+   C:
+   (or (author "Giao")
+       (sent-after (today-midnight)))
+   </programlisting>
+   Query A would be significantly faster because (author "Giao") is not
+   dynamic. A summary file could be generated for this query. Query B is
+   slow but could be optimized if there were a query compiler of some
+   sort. Query C demonstrates a query for which there is no good
+   optimization that can be applied. Summary files come with a certain
+   amount of baggage.
+   </para>
+   <para>
+   It seems then that for boolean 'and' operations, plain forms should be 
+   moved forward and other queries should be moved such that they are
+   evaluated later. I would expect that the majority of queries would be
+   of the plain form.
+   </para>
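+   <para>
+   The reordering suggested above can be sketched with a stable sort,
+   here in Python. The tuple encoding and the names is_plain_form and
+   schedule are assumptions of this example, and only the top-level
+   'and' is reordered.
+   </para>
+   <programlisting>
```python
BOOLEAN_OPS = {"and", "or", "not"}


def is_plain_form(query):
    """True for queries shaped like (function "constant value")."""
    head, args = query[0], query[1:]
    if head in BOOLEAN_OPS or not args:
        return False
    return all(isinstance(arg, str) for arg in args)


def schedule(query):
    """Move plain-form operands of an 'and' to the front, keeping relative order."""
    if query[0] != "and":
        return query
    # sorted() is stable, so operands of the same kind keep their order.
    operands = sorted(query[1:], key=lambda sub: not is_plain_form(sub))
    return ("and",) + tuple(operands)


query_a = ("and", ("author", "Giao"), ("sent-after", ("today-midnight",)))
query_b = ("and", ("sent-after", ("today-midnight",)), ("author", "Giao"))
print(schedule(query_b) == query_a)  # True
```
+   </programlisting>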
+   <para>  
+   First, the summary file is tied to the query and to the store in
+   which the query originates. Second, a hashing function for strings
+   needs to be calculated for the query so that the query and the
+   summary file can be associated. This hashing function could be similar
+   to the hashing function described in Kernighan and Pike's "The
+   Practice of Programming". (FIXME: Stick page number here)
+   </para>
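+   <para>
+   For illustration, a simple multiplicative string hash could tie a
+   query and its store to a summary file name, as in the Python sketch
+   below. The multiplier, the 32-bit range, the ".summary" suffix, and
+   the way the store identifier is combined with the query text are
+   all assumptions of this example rather than a quotation of the
+   book's code.
+   </para>
+   <programlisting>
```python
MULTIPLIER = 31  # a small odd multiplier, a common choice for string hashes


def string_hash(text):
    """Multiplicative string hash kept within 32 bits."""
    h = 0
    for byte in text.encode("utf-8"):
        h = (MULTIPLIER * h + byte) % 2**32
    return h


def summary_file_name(store_id, query_text):
    """Derive a summary file name from the store and the query text."""
    key = store_id + "\n" + query_text
    return "{:08x}.summary".format(string_hash(key))


print(string_hash("ab"))  # 3105
print(summary_file_name("mbox:/home/giao/mail", '(author "Giao")'))
```
+   </programlisting>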
+  </sect2>
+  <sect2>
+   <title>Archives</title>
+   <para>
+   Many people are concerned that archives won't be preserved, archives
+   aren't supported, and many other archive related issues. This is the
+   short version.
+   </para>
+   <para>    
+   Archives are just that, archives. Archives are stores. Take your
+   vfolder, export it to a store. You are done. If you load up the store
+   again, then the default-vfolder of that store is the view of the
+   vfolder, except the query is different.
+   </para>
+   <para>    
+   The point of vfolders is not to do away with the classical folder
+   representation but to move queries to the front, where they would
+   make data management easier for people who think in terms of
+   queries rather than files. Ordinary people don't think in terms of
+   files.
+   </para>    
+  </sect2>
+ </sect1>
+
+ <sect1>
+  <title>Miscellany</title>
+  <sect2>
+   <title>Annotations</title>
+   <para>
+   There should be a scheme to add annotations to messages. Common mail
+   user agents have used a tag in the message header to mark messages as
+   read/unread, for example. Extending this, we have the ability to add
+   our own data to a message to add meaning to it. If we have a good
+   scheme for doing this, new possibilities are opened.
+   </para>
+   <sect3>
+    <title>Keywords</title>  
+    <para>
+    When sending a message, a message could have certain keywords attached 
+    to it. While this can be done with the subject line, the subject line
+    has a tendency to be munged by other mail applications. One popular
+    example is the "[rR]e:" prefix. Using the subject line also breaks the 
+    "contract" with other mail user agents. Using keywords in another
+    field in the message header allows the sender to assist the recipient
+    in organizing data automatically. Note that the sender can only
+    provide hints as the sender is unlikely to know the organization
+    schemes of the recipient.
+    </para>
+   </sect3>
+  </sect2>
+  <sect2>
+   <title>Scope</title>  
+   <para>
+   Let us assume that we have multiple stores. Does a query work on a
+   given store? Or does it work on all stores? Or is it configurable such 
+   that a query can work on a user-selected list of stores?
+   </para>
+  </sect2>
+ </sect1>
+
+ <sect1>
+  <title>Alternatives to the above</title>
+  <para>
+  Jim Meyer (purp@selequa.com) is putting together some notes on where
+  annotations need to be located. They'll be placed here, along with
+  any contributions I may have to them.
+  </para>
+ </sect1>
+</article>