From 6bf0dab0770162e5eab74dd07f6f15089ea6f92e Mon Sep 17 00:00:00 2001
From: nobody
Date: Tue, 22 Aug 2000 12:05:30 +0000
Subject: This commit was manufactured by cvs2svn to create tag 'CONTROL_CENTER_1_3_1'.

svn path=/tags/CONTROL_CENTER_1_3_1/; revision=4928
---
 doc/white-papers/mail/ibex.sgml | 158 ----------------------------------------
 1 file changed, 158 deletions(-)
 delete mode 100644 doc/white-papers/mail/ibex.sgml

(limited to 'doc/white-papers/mail/ibex.sgml')

diff --git a/doc/white-papers/mail/ibex.sgml b/doc/white-papers/mail/ibex.sgml
deleted file mode 100644
index dcb8f5ca4b..0000000000
--- a/doc/white-papers/mail/ibex.sgml
+++ /dev/null
@@ -1,158 +0,0 @@
-Evolution">
-
-]>
-
- Ibex: an Indexing System
-
- Dan
- Winship
-
- danw@helixcode.com
-
- 2000
- Helix Code, Inc.
-
- Introduction
-
- &Ibex; is a library for text indexing. It is being used by
- &Camel; to allow it to quickly search locally-stored messages,
- either because the user is looking for a specific piece of text,
- or because the application is constructing a vFolder or filtering
- incoming mail.
-
- Design Goals and Requirements for Ibex
-
- The design of &Ibex; is based on a number of requirements.
-
- First, obviously, it must be fast. In particular, searching
- the index must be appreciably faster than searching through
- the messages themselves, and constructing and maintaining
- the index must not take a noticeable amount of time.
-
- The indexes must not take up too much space. Many users have
- limited filesystem quotas on the systems where they read
- their mail, and even users who read mail on private machines
- have to worry about running out of space on their disks. The
- indexes should be able to do their job without taking up so
- much space that the user decides he would be better off
- without them.
-
- Another aspect of this problem is that the system as a whole
- must be clever about what it does and does not index:
- accidentally indexing a "text" mail message containing
- uuencoded, BinHexed, or PGP-encrypted data will drastically
- affect the size of the index file. Either the caller or the
- indexer itself has to avoid trying to index these sorts of
- things.
-
- The indexing system must allow data to be added to the index
- incrementally, so that new messages can be added to the
- index (and deleted messages can be removed from it) without
- having to re-scan all existing messages.
-
- It must allow the calling application to explain the
- structure of the data however it wants to, rather than
- requiring that the unit of indexing be individual files.
- This way, &Camel; can index a single mbox-format file and
- treat it as multiple messages.
-
- It must support non-ASCII text, given that many people send
- and receive non-English email, and even people who only
- speak English may receive email from people whose names
- cannot be written in the US-ASCII character set.
-
- While there are a number of existing indexing systems, none of
- them met all (or even most) of our requirements.
-
- The Implementation
-
- &Ibex; is still young, and many of the details of the current
- implementation are not yet finalized.
-
- With the current index file format, 13 megabytes of Info files
- can be indexed into a 371 kilobyte index file, a bit under
- 3% of the original size. This is reasonable, but making it
- smaller would be nice. (The file format includes some simple
- compression, but gzip can compress an index file to about half
- its size, so we can clearly do better.)
-
- The implementation has been profiled and optimized for speed to
- some degree. However, it has so far only been run on a 500MHz
- Pentium III system with very fast disks, so we have no solid
- benchmarks.
-
- Further optimization (of both the file format and the in-memory
- data structures) awaits seeing how the library is most easily
- used by &Evolution;: if the indexes are likely to be kept in
- memory for long periods of time, the in-memory data structures
- need to be kept small, but the reading and writing operations
- can be slow. On the other hand, if the indexes will only be
- opened when they are needed, reading and writing must be fast,
- and memory usage is less critical.
-
- Of course, to be useful for other applications that have
- indexing needs, the library should provide several options, so
- that each application can use the library in the way that is
- best suited to its needs.
-
-- cgit v1.2.3
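
The deleted white paper lists three requirements that are easier to picture with a small example: incremental adds and removes, caller-defined units of indexing (for example, one message inside an mbox file rather than one file), and non-ASCII text. The following is a minimal sketch of a word-level inverted index that satisfies those three points. It is not the Ibex API or file format: the class name `TinyIndex`, its methods, and the `inbox.mbox#N` naming scheme are all hypothetical, chosen only for illustration, and the sketch keeps everything in memory with no compression.

```python
import re
from collections import defaultdict

# Hypothetical illustration only; none of these names come from Ibex.
# For str patterns in Python 3, \w matches Unicode word characters,
# so names such as "Jürgen" are kept as a single word.
_WORD = re.compile(r"\w+")


class TinyIndex:
    """Toy in-memory inverted index keyed by caller-chosen unit names."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> names of units containing it
        self._words_of = {}                # name -> set of words indexed for it

    def add(self, name, text):
        """Index one unit without rescanning any other unit."""
        self.remove(name)  # re-adding an existing name replaces its old entry
        words = {w.casefold() for w in _WORD.findall(text)}
        self._words_of[name] = words
        for w in words:
            self._postings[w].add(name)

    def remove(self, name):
        """Drop one unit from the index; does nothing if the name is unknown."""
        for w in self._words_of.pop(name, ()):
            names = self._postings[w]
            names.discard(name)
            if not names:
                del self._postings[w]  # keep the word table from growing forever

    def search(self, word):
        """Return the sorted names of all units that contain the word."""
        return sorted(self._postings.get(word.casefold(), ()))


if __name__ == "__main__":
    index = TinyIndex()
    # The caller decides what a "unit" is: here, two messages in one mbox file.
    index.add("inbox.mbox#1", "Lunch on Friday?")
    index.add("inbox.mbox#2", "Grüße von Jürgen")
    index.remove("inbox.mbox#1")     # message deleted; nothing else is rescanned
    print(index.search("jürgen"))
```

Keeping a per-unit word set (`_words_of`) is what makes removal cheap in this sketch: deleting a message touches only the words it contained. A real implementation would also have to address the white paper's size and speed goals, which this toy ignores.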