From a662f8dc5458e376f173a7caeae927b0e617423d Mon Sep 17 00:00:00 2001
From: nobody <nobody@localhost>
Date: Tue, 22 Aug 2000 12:05:30 +0000
Subject: This commit was manufactured by cvs2svn to create tag 'BONOBO_0_26'.

svn path=/tags/BONOBO_0_26/; revision=4926
---
 devel-docs/query/virtual-folder-in-depth.sgml | 407 --------------------------
 1 file changed, 407 deletions(-)
 delete mode 100644 devel-docs/query/virtual-folder-in-depth.sgml

diff --git a/devel-docs/query/virtual-folder-in-depth.sgml b/devel-docs/query/virtual-folder-in-depth.sgml
deleted file mode 100644
index d3e3e0504b..0000000000
--- a/devel-docs/query/virtual-folder-in-depth.sgml
+++ /dev/null
@@ -1,407 +0,0 @@
-<!doctype article PUBLIC "-//Davenport//DTD DocBook V3.0//EN" []>
-
-<!-- SGMLized by Bertrand <Bertrand.Guiheneuf@aful.org> -->
-
-<article id="index">
-  <artheader>
-    <authorgroup>
-      <author>
-	<firstname>Giao</firstname>
-	<surname>Nguyen</surname>
-      </author>
-    </authorgroup>
-
-    <title>An in-depth look at the virtual folder mechanism</title>
-    <abstract>
-      <para>
-	This document describes a different way of approaching mail
-	organization and how all things are possible in this brave new
-	world. This document describes neither physical storage issues
-	nor interface issues.
-      </para>
-      <para>
-	Historically, mail has been organized into folders. These
-	folders usually mapped to a single storage medium. The
-	relationship between mail organization and storage medium was
-	one-to-one: there was one mail organization for every storage
-	medium. This scheme had its limitations.
-      </para>
-      <para>  
-	Efforts at categorization are only meaningful at the instant
-	one categorizes. Finding any piece of data, regardless of how well
-	it was categorized, requires some amount of searching. Therefore, any
-	attempt to eliminate searching is doomed to fail. It's time to embrace
-	searching as a way of life.
-      </para>
-      <para>  
-	The terms and their definitions follow. The example rules used are
-	based on the syntax of VM (http://www.wonderworks.com/vm/) by Kyle
-	Jones, whose ideas form the basis for this document. My only addition
-	is summary files, which help the scheme scale. I currently use VM and
-	its virtual-folder rules for my daily mail. To date, my only
-	complaints are its speed (it has no caches) and that, for the
-	uninitiated, it's not very user-friendly.
-      </para>
-      <para>  
-	Comments, questions, rants, etc. should be directed at Giao Nguyen
-	(grail@cafebabe.org) who will try to address issues in a timely
-	manner.
-      </para>
-    </abstract>
-  </artheader>
-
-  <!-- Definitions -->
-  <sect1 id="definitions">
-    <title>Definitions</title>
-    <sect2>
-      <title>Store</title> 
-      <para>
-	A location where mail can be found. This may be a file (Berkeley
-	mbox), a directory (MH), an IMAP server, a POP3 server, an Exchange
-	server, a Lotus Notes server, or even a stack of Post-Its by your
-	monitor fed through some OCR system.
-      </para>
-    </sect2>
-
-    <sect2>
-      <title>Message</title> 
-      <para>  
-	An individual mail message.
-      </para>
-    </sect2>
-    <sect2>
-      <title>Vfolder</title> 
-      <para>  
-	A group of messages sharing some commonality; the result of a
-	query. A vfolder may be contained in a store, but a store need not
-	hold only one vfolder. There is always an implicit vfolder rule
-	which matches all messages: every store contains the vfolder that
-	results from the query (any). The name is short for virtual folder
-	or maybe view folder. I dunno.
-      </para>
-    </sect2>
-    <sect2>
-      <title>Default-vfolder</title> 
-      <para>  
-	The vfolder defined by (any) applied to the store. This is not the
-	inbox; the inbox could easily be defined by a query. A default rule
-	for the inbox could be (new), but it doesn't have to be. Mine happens
-	to be (or (unread) (new)).
-      </para>
-    </sect2>
-    <sect2>
-      <title>Folder</title> 
-      <para>  
-	The classical mail folder approach: one message organization per
-	store.
-      </para>
-    </sect2>
-    <sect2>
-      <title>Query</title> 
-      <para>  
-	A search for messages. The result of this is a vfolder. There are two
-	kinds of queries: named queries and lambda queries. More on this
-	later.
-      </para>
-    </sect2>
-    <sect2>
-      <title>Summary file </title> 
-      <para>  
-	An external file that contains pointers to the messages that match
-	a named query. In addition to pointers, the summary file should
-	also contain a signature of the store for sanity checks. When the term
-	"index" is used as a verb, it means to build a summary file for a
-	given name-value pair, such as (author "Giao").
-      </para>
-    </sect2>
-  </sect1>
-
-  <!-- Queries -->
-  <sect1>
-    <title>Queries</title> 
-    <para>  
-      Named queries are analogous to classical mail folders. Because named
-      queries may be reused, summary files are kept as caches to reduce
-      the overall cost of viewing a vfolder. Summary files are superior to
-      folders in that they allow the same message to appear in multiple
-      vfolders without being duplicated. Duplicating messages defeats
-      attempts at tagging a message with additional user information
-      such as annotations. Named queries will define folders.
-    </para>
-    <para>
-      Lambda queries are similar to named queries except that they have no
-      name. These are created on the fly by the user to filter out or
-      include certain messages.
-    </para>
-    <para>
-      All queries can be layered on top of each other. A lambda query can be 
-      layered on a named query and a named query can be layered on a lambda
-      query. The possibilities are endless.
-    </para>
-    <para>
-      Layering is expressed with boolean operations (and, or, not), and
-      evaluation should short-circuit.
-    </para>
-    <para>
-      Examples:
-      <programlisting>
-(and (author "Giao")
-  (unread))
-      </programlisting>
-      The (unread) query should only be evaluated on the results of (author
-      "Giao").
-      <programlisting>
-(or (author "Giao")
-  (unread))
-      </programlisting>
-      Both of these queries should be evaluated. Any matches are added to the
-      resulting vfolder.
-    </para>
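-    <para>
-      A minimal sketch of these evaluation rules follows. It assumes a
-      hypothetical in-memory model in which a message is a Python dict and
-      a query is a nested tuple mirroring the s-expression syntax. The
-      predicate table and its field names are illustrative only. Each 'and'
-      clause sees only the survivors of the previous clause. Each 'or'
-      clause runs only on messages that no earlier clause matched, and all
-      matches are kept.
-    </para>

```python
from typing import Callable

# Hypothetical leaf predicates, keyed by query function name.
PREDICATES: dict[str, Callable[..., bool]] = {
    "author": lambda msg, name: name in msg.get("from", ""),
    "unread": lambda msg: not msg.get("read", False),
    "any": lambda msg: True,
}


def evaluate(query: tuple, candidates: list[dict]) -> list[dict]:
    """Return the messages in `candidates` that match `query`, in order."""
    op, *args = query
    if op == "and":
        # Each clause only sees the survivors of the previous clause;
        # evaluation stops as soon as nothing is left.
        result = candidates
        for clause in args:
            if not result:
                break
            result = evaluate(clause, result)
        return result
    if op == "or":
        # Each clause runs only on messages no earlier clause matched;
        # matches from every clause end up in the result.
        matched = set()
        for clause in args:
            remaining = [m for m in candidates if id(m) not in matched]
            matched.update(id(m) for m in evaluate(clause, remaining))
        return [m for m in candidates if id(m) in matched]
    if op == "not":
        excluded = {id(m) for m in evaluate(args[0], candidates)}
        return [m for m in candidates if id(m) not in excluded]
    # Leaf query such as (author "Giao") or (unread).
    return [m for m in candidates if PREDICATES[op](m, *args)]


inbox = [
    {"from": "Giao Nguyen", "read": True},
    {"from": "Giao Nguyen", "read": False},
    {"from": "Kyle Jones", "read": False},
]
# (and (author "Giao") (unread)) matches only the second message.
evaluate(("and", ("author", "Giao"), ("unread",)), inbox)
```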
-  </sect1>
-
-  <!-- Summary files -->
-  <sect1>
-    <title>Summary files</title> 
-    <para>    
-      Summary files are only meaningful in the context of the
-      default-vfolder of a store.
-    </para>
-    <para>
-      Summary files should be generated for queries of the form:
-      <programlisting>
-(function "constant value")
-      </programlisting>
-      Summary files should never be generated for queries of the form:
-      <programlisting>
-(function (function1))
-
-(and (function "value")
-  (another-function "another value"))
-      </programlisting>
-      Given a query of the form:
-      <programlisting>
-(and (function "value")
-  (another-function "another value"))
-      </programlisting>
-      the system should use one summary file for (function "value") and
-      another summary file for (another-function "another value"). I will
-      call queries of the form (function "constant value") the "plain form".
-    </para>
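-    <para>
-      The following sketch keeps one summary per plain form and answers an
-      'and' query by intersecting those summaries. It assumes that a store
-      is a list of (message ID, headers) pairs and that the function name
-      of a plain form names a header field. An in-memory set of message IDs
-      stands in for a summary file. SummaryCache and matches() are
-      hypothetical names.
-    </para>

```python
# Hypothetical model: a store is a list of (message_id, headers) pairs and a
# plain form is a (header_field, constant_value) pair such as ("from", "Giao").
Store = list[tuple[str, dict]]
PlainForm = tuple[str, str]


def matches(plain: PlainForm, headers: dict) -> bool:
    """Hypothetical leaf matcher: substring match on one header field."""
    field, value = plain
    return value in headers.get(field, "")


class SummaryCache:
    """One summary (a set of message IDs) per plain form, built on demand."""

    def __init__(self, store: Store):
        self.store = store
        self.summaries: dict[PlainForm, frozenset[str]] = {}
        self.builds = 0  # number of full scans of the store so far

    def summary(self, plain: PlainForm) -> frozenset[str]:
        if plain not in self.summaries:
            # "Index" the plain form: one scan of the default-vfolder (any).
            self.builds += 1
            self.summaries[plain] = frozenset(
                mid for mid, headers in self.store if matches(plain, headers)
            )
        return self.summaries[plain]

    def and_query(self, *plains: PlainForm) -> frozenset[str]:
        """Answer (and p1 p2 ...) by intersecting per-plain-form summaries.
        Nothing is cached for the compound query itself."""
        result = self.summary(plains[0])
        for plain in plains[1:]:
            result = result.intersection(self.summary(plain))
        return result


store = [
    ("m1", {"from": "Giao", "subject": "vfolder notes"}),
    ("m2", {"from": "Giao", "subject": "lunch"}),
    ("m3", {"from": "Kyle", "subject": "vfolder rules"}),
]
cache = SummaryCache(store)
cache.and_query(("from", "Giao"), ("subject", "vfolder"))  # frozenset({'m1'})
cache.and_query(("subject", "vfolder"), ("from", "Giao"))  # reuses both summaries
```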
-    <para>
-      The signature of the store should be computed on the assumption
-      that new data may have been added to the store since the
-      application generated the summary file. Signatures computed over
-      the entirety of the store will most likely be meaningless for stores
-      such as POP and IMAP servers, where new mail keeps arriving.
-    </para>
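-    <para>
-      One possible signature records how many messages were indexed and a
-      hash of their IDs. This sketch assumes that the store can enumerate
-      its message IDs in a stable order, with new mail appended at the end
-      (as when mail is delivered to an mbox file). The summary file stays
-      valid as long as that prefix is unchanged, and anything after the
-      prefix is new mail. All function names here are hypothetical.
-    </para>

```python
import hashlib


def prefix_digest(message_ids: list[str], count: int) -> str:
    """Hash of the first `count` message IDs, in store order."""
    digest = hashlib.sha256()
    for mid in message_ids[:count]:
        digest.update(mid.encode("utf-8"))
        digest.update(b"\0")  # separator: ["ab", "c"] must differ from ["a", "bc"]
    return digest.hexdigest()


def make_signature(message_ids: list[str]) -> tuple[int, str]:
    """Signature written into a summary file when it is generated."""
    return (len(message_ids), prefix_digest(message_ids, len(message_ids)))


def check_signature(signature: tuple[int, str], message_ids: list[str]) -> tuple[bool, list[str]]:
    """Return (still_valid, new_ids) for the store's current message IDs."""
    count, digest = signature
    if count > len(message_ids):
        return (False, [])  # messages were removed: rebuild the summary file
    if prefix_digest(message_ids, count) != digest:
        return (False, [])  # indexed messages changed: rebuild
    # Everything after the indexed prefix is new mail, to be indexed incrementally.
    return (True, message_ids[count:])


sig = make_signature(["m1", "m2"])
check_signature(sig, ["m1", "m2", "m3"])  # (True, ['m3'])
```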
-  </sect1>
-
-  <!-- Incremental Indexing -->
-  <sect1>
-    <title>Incremental indexing</title> 
-    <para>
-      When new messages are detected, all known queries should be evaluated
-      on the new messages only. Vfolders should be notified of new messages
-      that are positive matches for their queries, and the indexes generated
-      by this process should be merged into each vfolder's current indexes.
-    </para>
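-    <para>
-      A sketch of this step follows. It assumes hypothetical VFolder
-      objects, each holding a compiled query (a plain predicate) and an
-      index of message IDs in store order. Each object also has a notify()
-      hook that stands in for whatever the UI would do. Only the new
-      messages are evaluated, and existing indexes are extended rather
-      than rebuilt.
-    </para>

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VFolder:
    name: str
    query: Callable[[dict], bool]  # hypothetical compiled query
    index: list[str] = field(default_factory=list)  # message IDs, store order
    notified: list[str] = field(default_factory=list)

    def notify(self, new_ids: list[str]) -> None:
        """Stand-in for a UI callback; here it only records the IDs."""
        self.notified.extend(new_ids)


def index_new_messages(vfolders: list[VFolder], new_messages: list[tuple[str, dict]]) -> None:
    """Evaluate every known query on the new messages only, notify each vfolder
    of its positive matches, and merge those into its current index."""
    for vf in vfolders:
        known = set(vf.index)
        hits = [mid for mid, msg in new_messages if mid not in known and vf.query(msg)]
        if hits:
            vf.notify(hits)
            # New mail arrives after everything already indexed, so the merge
            # is an append.
            vf.index.extend(hits)
```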
-  </sect1>
-
-  <!-- Can I have multiple stores -->
-  <sect1>
-    <title>Can I have multiple stores?</title> 
-    <para> 
-      I don't see why not. Again, the inbox is a vfolder, so you can get a
-      unified inbox consisting of all new mail sent to all your stores, or
-      you can get inboxes for each store, or any combination your heart
-      desires. You get your cake, eat it, and someone else cleans the dishes!
-    </para>
-  </sect1>
-
-  <!-- Why all this? -->
-  <sect1>
-    <title>Why all this?</title> 
-    <para> 
-      Consider the dynamic nature of the following query:
-      <programlisting>
-(and (author "Giao")
-  (sent-after (today-midnight)))
-      </programlisting>
-      Here today-midnight is a function evaluated at run time to compute
-      the appropriate value, so the same query yields a different vfolder
-      each day. A classical folder cannot express this.
-    </para>
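-    <para>
-      A sketch of how such a query could be evaluated, with today-midnight
-      recomputed on every run. The message dicts and the function names are
-      assumptions. The current time is passed in explicitly so that the
-      same query can be re-run at different moments. "sent-after" is read
-      as strictly after.
-    </para>

```python
from datetime import datetime, time


def today_midnight(now: datetime) -> datetime:
    """Midnight at the start of the day containing `now`."""
    return datetime.combine(now.date(), time.min)


def author_since_midnight(messages: list[dict], author: str, now: datetime) -> list[dict]:
    """(and (author AUTHOR) (sent-after (today-midnight))), recomputing the
    cutoff on every call."""
    cutoff = today_midnight(now)
    return [m for m in messages if author in m["from"] and m["date"] > cutoff]


mail = [
    {"from": "Giao", "date": datetime(2000, 8, 21, 23, 0)},
    {"from": "Giao", "date": datetime(2000, 8, 22, 9, 0)},
]
# Run at noon on Aug 22 this matches the second message; run the next morning,
# the very same query matches nothing.
author_since_midnight(mail, "Giao", datetime(2000, 8, 22, 12, 0))
author_since_midnight(mail, "Giao", datetime(2000, 8, 23, 8, 0))
```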
-  </sect1>
-
-  <!-- Scenarios of usage and their solutions -->
-  <sect1>
-    <title>Scenarios of usage and their solutions</title> 
-    <sect2>
-      <title>Message alterations</title>
-      <para>
-	This is a fuzzy area that should be left to the UI to handle. Messages
-	are altered; for example, a message's read status changes when it is
-	read. How do we handle this if our query is for unread messages?
-	Merely viewing a message would change its state.
-      </para>
-      <para>
-	One idea is to not evaluate the queries unless we're switching between
-	vfolder views. This assumes that one can only view one vfolder at a
-	time. For multi-vfolder viewing, a message change should propagate
-	through the vfolder system, and certain effects (as in our example)
-	would not be intuitive.
-      </para>
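-      <para>
-	A sketch of the first idea, assuming a hypothetical viewer that shows
-	one vfolder at a time. The query runs only in switch_to(). Marking a
-	message read while viewing an (unread) vfolder therefore leaves the
-	message on screen until the user switches views.
-      </para>

```python
from typing import Callable


class VFolderView:
    """Hypothetical single-vfolder viewer with deferred query evaluation."""

    def __init__(self, messages: list[dict]):
        self.messages = messages
        self.current: list[dict] = []

    def switch_to(self, query: Callable[[dict], bool]) -> None:
        # The only place the query is evaluated.
        self.current = [m for m in self.messages if query(m)]

    def mark_read(self, msg: dict) -> None:
        # The message state changes, but the visible list is left untouched.
        msg["read"] = True
```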
-      <para> 
-	Special cases, in which changes to certain defined fields are
-	ignored, would not be a clean solution, but they may be necessary.
-	Some combination of the above approaches can be used. I don't
-	think there is an easy solution.
-      </para>
-    </sect2>
-    <sect2>
-      <title>Message inclusion and exclusion</title>
-      <para>
-	Messages are also included and excluded with queries. The final query
-	will have the form:
-	<programlisting>
-	  (and (or (author "Giao")
-	           (criteria value))
-	       (not (criteria other-value)))
-	</programlisting>
-	Userland criteria may be labels of some sort: user-defined labels or
-	Message-IDs. What are the performance issues involved in this? With
-	short circuiting, it's not a major problem.
-      </para>
-      <para>    
-	The criteria and values are determined by the UI. The vfolder
-	mechanism isn't concerned with such issues.
-      </para>
-      <para>   
-	Messages can be included and excluded at will, an idea often
-	called "arbitrary inclusion/exclusion". This can be done by
-	Message-ID or by other fields, although it's been noted that
-	Message-IDs are not unique.
-      </para>
-      <para>  
-	I propose that each vfolder be allocated an inclusion label and an
-	exclusion label. Both labels should be randomly generated and stored
-	as part of the vfolder description, which has not been drafted
-	yet.
-      </para>
-      <para>   
-	The rule for a given named query then becomes:
-	<programlisting>
-	  (and (or (user-query)
-	           (label inclusion-label))
-	       (not (label exclusion-label)))
-	</programlisting>
-      </para>
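-      <para>
-	A sketch of a vfolder description that carries the two randomly
-	generated labels. It reads them so that the inclusion label adds a
-	message to the vfolder regardless of the user query, and the
-	exclusion label removes it. The class, the label format, and the
-	"labels" field on messages are all hypothetical.
-      </para>

```python
import secrets
from dataclasses import dataclass, field
from typing import Callable


def new_label() -> str:
    """Randomly generated label; the "vf-" + hex format is an assumption."""
    return "vf-" + secrets.token_hex(8)


@dataclass
class VFolderDescription:
    user_query: Callable[[dict], bool]
    inclusion_label: str = field(default_factory=new_label)
    exclusion_label: str = field(default_factory=new_label)

    def matches(self, msg: dict) -> bool:
        labels = msg.get("labels", set())
        if self.exclusion_label in labels:
            return False  # arbitrarily excluded, whatever the query says
        # Arbitrarily included, or matched by the user's query.
        return self.inclusion_label in labels or self.user_query(msg)


giao = VFolderDescription(lambda m: m["from"] == "Giao")
giao.matches({"from": "Kyle", "labels": {giao.inclusion_label}})  # True
```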
-    </sect2>
-    <sect2>
-      <title>Query scheduling</title>
-      <para>
-	Consider the following extremely dynamic queries:
-	<programlisting>
-	  A:
-	  (and (author "Giao")
-	    (sent-after (today-midnight)))
-
-	  B:
-	  (and (sent-after (today-midnight))
-	    (author "Giao"))
-
-	  C:
-	  (or (author "Giao")
-	    (sent-after (today-midnight)))
-	</programlisting>
-	Query A would be significantly faster than query B because (author
-	"Giao") is not dynamic: a summary file can be generated for it, and
-	(sent-after (today-midnight)) then only has to run on the messages
-	that summary lists. Query B is slow, but it could be optimized by a
-	query compiler of some sort that reorders its clauses. Query C
-	demonstrates a query to which no good optimization can be applied.
-	Summary files also come with a certain amount of baggage.
-      </para>
-      <para>
-	It seems, then, that in boolean 'and' operations, plain forms should
-	be moved to the front and other queries moved back so that they are
-	evaluated later. I would expect the majority of queries to be of the
-	plain form.
-      </para>
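-      <para>
-	A sketch of that reordering as a rewrite on the query tree. It assumes
-	that queries are nested tuples mirroring the s-expressions and that a
-	plain form is a function applied only to constant strings. Because
-	sorted() is stable, the other clauses keep their relative order.
-	'or' and 'not' are left alone. Reordering is only safe if the clauses
-	have no side effects.
-      </para>

```python
BOOLEAN_OPS = ("and", "or", "not")


def is_plain(query: tuple) -> bool:
    """Plain form: (function "constant value"), i.e. only string arguments."""
    op, *args = query
    return op not in BOOLEAN_OPS and bool(args) and all(isinstance(a, str) for a in args)


def schedule(query: tuple) -> tuple:
    """Recursively move plain forms to the front of every 'and'."""
    op, *args = query
    if op in BOOLEAN_OPS:
        args = [schedule(a) for a in args]
        if op == "and":
            # Stable sort: plain forms (key False) first, everything else after.
            args = sorted(args, key=lambda a: not is_plain(a))
    return (op, *args)


B = ("and", ("sent-after", ("today-midnight",)), ("author", "Giao"))
schedule(B)  # ('and', ('author', 'Giao'), ('sent-after', ('today-midnight',)))
```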
-      <para>  
-	First, a summary file is tied to the query and to the store where
-	the query originates. Second, a string hash of the query needs to
-	be computed so that the query and the summary file can be
-	associated. This hash function could be similar to the one
-	described in Brian Kernighan and Rob Pike's "The Practice of
-	Programming". (FIXME: Stick page number here)
-      </para>
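-      <para>
-	A sketch of such a hash. It uses the multiply-and-add scheme
-	h = MULTIPLIER * h + c with a multiplier of 31, kept to 32 bits; treat
-	the constants as assumptions. The file-naming scheme is hypothetical.
-	It hashes the store and the query together because a summary file is
-	tied to both. Different queries can collide, so the summary file
-	should also record the full query text and check it when the file is
-	loaded.
-      </para>

```python
MULTIPLIER = 31  # assumption: a small odd multiplier for the string hash


def string_hash(s: str) -> int:
    """Multiplicative string hash, h = MULTIPLIER * h + byte, kept to 32 bits."""
    h = 0
    for byte in s.encode("utf-8"):
        h = (MULTIPLIER * h + byte) % 2**32
    return h


def summary_file_name(store_uri: str, query_text: str) -> str:
    """Hypothetical naming scheme keyed on both the store and the query."""
    return "%08x.summary" % string_hash(store_uri + "\n" + query_text)


string_hash("ab")  # 3105, i.e. 97 * 31 + 98
```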
-    </sect2>
-    <sect2>
-      <title>Archives</title>
-      <para>
-	Many people are concerned that archives won't be preserved or
-	supported, along with many other archive-related issues. This is the
-	short version.
-      </para>
-      <para>    
-	Archives are just that: archives. Archives are stores. Take your
-	vfolder and export it to a store, and you are done. If you load that
-	store again, its default-vfolder shows the same messages as the
-	original vfolder; only the query is different.
-      </para>
-      <para>    
-	The point of vfolders is not to do away with the classical folder
-	representation but to move queries to the front. That makes data
-	management easier for people who think in terms of queries rather
-	than files, and ordinary people don't think in terms of files.
-      </para>    
-    </sect2>
-  </sect1>
-
-  <!-- Miscellany -->
-  <sect1>
-    <title>Miscellany</title>
-    <sect2>
-      <title>Annotations</title>
-      <para>
-	There should be a scheme to add annotations to messages. Common mail
-	user agents have used a tag in the message header to mark messages as
-	read or unread, for example. Extending this idea, we could add our
-	own data to a message to give it meaning. If we have a good scheme
-	for doing this, new possibilities open up.
-      </para>
-      <sect3>
-	<title>Keywords</title>  
-	<para>
-	  When a message is sent, it could have certain keywords attached
-	  to it. While this can be done with the subject line, the subject line
-	  tends to be munged by other mail applications; one popular example
-	  is the "[rR]e:" prefix. Using the subject line also breaks the
-	  "contract" with other mail user agents. Putting keywords in another
-	  header field allows the sender to assist the recipient in
-	  organizing data automatically. Note that the sender can only
-	  provide hints, as the sender is unlikely to know the recipient's
-	  organization schemes.
-	</para>
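-	<para>
-	  RFC 822 already defines a Keywords header field holding a
-	  comma-separated list. The following sketch reads that field with
-	  Python's standard email package. Treating the results as
-	  organizing hints is the hypothetical part.
-	</para>

```python
from email import message_from_string


def keyword_hints(raw_message: str) -> list[str]:
    """Collect sender-supplied keywords from every Keywords header field."""
    msg = message_from_string(raw_message)
    hints = []
    for value in msg.get_all("Keywords", []):
        # Each field is a comma-separated list; drop empty entries.
        hints.extend(k.strip() for k in value.split(",") if k.strip())
    return hints


raw = (
    "From: giao@example.com\n"
    "Subject: Re: vfolder rules\n"
    "Keywords: vfolder, evolution\n"
    "\n"
    "Message body.\n"
)
keyword_hints(raw)  # ['vfolder', 'evolution']
```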
-      </sect3>
-    </sect2>
-    <sect2>
-      <title>Scope</title>  
-      <para>
-	Let us assume that we have multiple stores. Does a query work on a
-	given store? Or does it work on all stores? Or is it configurable such 
-	that a query can work on a user-selected list of stores?
-      </para>
-    </sect2>
-  </sect1>
-
-  <!-- Alternatives to the above -->
-  <sect1>
-    <title>Alternatives to the above</title>
-    <para>
-      Jim Meyer (purp@selequa.com) is putting together some notes on where
-      annotations need to be located. They will be placed here, along with
-      any contributions I may have to them.
-    </para>
-  </sect1>
-</article>
-- 
cgit v1.2.3