diff options
Diffstat (limited to 'doc/white-papers/mail/ibex.sgml')
-rw-r--r-- | doc/white-papers/mail/ibex.sgml | 158 |
1 files changed, 0 insertions, 158 deletions
diff --git a/doc/white-papers/mail/ibex.sgml b/doc/white-papers/mail/ibex.sgml deleted file mode 100644 index dcb8f5ca4b..0000000000 --- a/doc/white-papers/mail/ibex.sgml +++ /dev/null @@ -1,158 +0,0 @@ -<!doctype article PUBLIC "-//Davenport//DTD DocBook V3.0//EN" [ -<!entity Evolution "<application>Evolution</application>"> -<!entity Camel "Camel"> -<!entity Ibex "Ibex"> -]> - -<article class="whitepaper" id="ibex"> - - <artheader> - <title>Ibex: an Indexing System</title> - - <authorgroup> - <author> - <firstname>Dan</firstname> - <surname>Winship</surname> - <affiliation> - <address> - <email>danw@helixcode.com</email> - </address> - </affiliation> - </author> - </authorgroup> - - <copyright> - <year>2000</year> - <holder>Helix Code, Inc.</holder> - </copyright> - - </artheader> - - <sect1 id="introduction"> - <title>Introduction</title> - - <para> - &Ibex; is a library for text indexing. It is being used by - &Camel; to allow it to quickly search locally-stored messages, - either because the user is looking for a specific piece of text, - or because the application is contructing a vFolder or filtering - incoming mail. - </para> - </sect1> - - <sect1 id="goals"> - <title>Design Goals and Requirements for Ibex</title> - - <para> - The design of &Ibex; is based on a number of requirements. - - <itemizedlist> - <listitem> - <para> - First, obviously, it must be fast. In particular, searching - the index must be appreciably faster than searching through - the messages themselves, and constructing and maintaining - the index must not take a noticeable amount of time. - </para> - </listitem> - - <listitem> - <para> - The indexes must not take up too much space. Many users have - limited filesystem quotas on the systems where they read - their mail, and even users who read mail on private machines - have to worry about running out of space on their disks. The - indexes should be able to do their job without taking up so - much space that the user decides he would be better off - without them. - </para> - - <para> - Another aspect of this problem is that the system as a whole - must be clever about what it does and does not index: - accidentally indexing a "text" mail message containing - uuencoded, BinHexed, or PGP-encrypted data will drastically - affect the size of the index file. Either the caller or the - indexer itself has to avoid trying to index these sorts of - things. - </para> - </listitem> - - <listitem> - <para> - The indexing system must allow data to be added to the index - incrementally, so that new messages can be added to the - index (and deleted messages can be removed from it) without - having to re-scan all existing messages. - </para> - </listitem> - - <listitem> - <para> - It must allow the calling application to explain the - structure of the data however it wants to, rather than - requiring that the unit of indexing be individual files. - This way, &Camel; can index a single mbox-format file and - treat it as multiple messages. - </para> - </listitem> - - <listitem> - <para> - It must support non-ASCII text, given that many people send - and receive non-English email, and even people who only - speak English may receive email from people whose names - cannot be written in the US-ASCII character set. - </para> - </listitem> - </itemizedlist> - - <para> - While there are a number of existing indexing systems, none of - them met all (or even most) of our requirements. - </para> - </sect1> - - <sect1 id="implementation"> - <title>The Implementation</title> - - <para> - &Ibex; is still young, and many of the details of the current - implementation are not yet finalized. - </para> - - <para> - With the current index file format, 13 megabytes of Info files - can be indexed into a 371 kilobyte index file—a bit under - 3% of the original size. This is reasonable, but making it - smaller would be nice. (The file format includes some simple - compression, but <application>gzip</application> can compress an - index file to about half its size, so we can clearly do better.) - </para> - - <para> - The implementation has been profiled and optimized for speed to - some degree. But, it has so far only been run on a 500MHz - Pentium III system with very fast disks, so we have no solid - benchmarks. - </para> - - <para> - Further optimization (of both the file format and the in-memory - data structures) awaits seeing how the library is most easily - used by &Evolution;: if the indexes are likely to be kept in - memory for long periods of time, the in-memory data structures - need to be kept small, but the reading and writing operations - can be slow. On the other hand, if the indexes will only be - opened when they are needed, reading and writing must be fast, - and memory usage is less critical. - </para> - - <para> - Of course, to be useful for other applications that have - indexing needs, the library should provide several options, so - that each application can use the library in the way that is - most suited for its needs. - </para> - </sect1> -</article> |