author | nobody <nobody@localhost> | 2000-07-06 12:58:12 +0800 |
---|---|---|
committer | nobody <nobody@localhost> | 2000-07-06 12:58:12 +0800 |
commit | 5e7d23492183857392e728cec05521e320ba1b29 (patch) | |
tree | 4ba00fa4856e39912c9a54ecba72de80d51d2418 /libibex/TODO | |
parent | 04f148f617112009091abd18b16463033dd322d0 (diff) | |
This commit was manufactured by cvs2svn to create tag 'GTKHTML_0_5'.
svn path=/tags/GTKHTML_0_5/; revision=3912
Diffstat (limited to 'libibex/TODO')
-rw-r--r-- | libibex/TODO | 61 |
1 files changed, 0 insertions, 61 deletions
diff --git a/libibex/TODO b/libibex/TODO
deleted file mode 100644
index a087c8d1f3..0000000000
--- a/libibex/TODO
+++ /dev/null
@@ -1,61 +0,0 @@
-Stability
----------
-* ibex_open should never crash, and should never return NULL without
-errno being set. Should check for errors when reading.
-
-
-Performance
------------
-* Profiling, keep thinking about data structures, etc.
-
-* Check memory usage
-
-* See if writing the "inverse image" of long ref streams helps
-compression without hurting performance now. (ie, if a word appears in
-more than half of the files, write out the list of files it _doesn't_
-appear in). (I tried this before, and it wasn't working well, but the
-file format and data structures have changed a lot.)
-
-* We could save a noticeable chunk of time if normalize_word computed
-the hash of the word and then we could pass that into
-g_hash_table_insert somehow.
-
-* Make a copy of the buffer to be indexed (or provide interface for
-caller to say ibex can munge the provided data) and then use that
-rather than constantly copying things. ?
-
-
-Functionality
--------------
-* ibex file locking
-
-* specify file mode in ibex_open
-
-* ibex_find* need to normalize the search words... should this be done
-by the caller or by ibex_find?
-
-* Needs to be some way to do a secondary search after getting results
-back from ibex_find* (ie, for "foo near bar"). This either has to be
-done by ibex, or requires us to export the normalize interface.
-
-* Does there need to be an ibex_find_any, or is that easy enough for the
-caller to do?
-
-* utf8_trans needs to cover at least two more code pages. This is
-tricky because it's not clear whether some of the letters there should
-be translated to ASCII or left as UTF8. This requires some
-investigation.
-
-* ibex_index_* need to ignore HTML tags.
-    NAME = [A-Za-z][A-Za-z0-9.-]*
-    </?{NAME}(\s*{NAME}(\s*=\s*({NAME}|"[^"]*"|'[^']*')))*>
-    <!(--([^-]*|-[^-])--\s*)*>
-
-    ugh. ok, simplifying, we get:
-    <[^!](([^"'>]*("[^"]*"|'[^']*'))*> or
-    <!(--([^-]*|-[^-])--\s*)*>
-
-    which is still not simple. sigh.
-
-* ibex_index_* need to recognize and ignore "non-text". Particularly
-BinHex and uuencoding.
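
Note on the HTML-tag item in the removed TODO: the simplified tag pattern as written has an unbalanced parenthesis, and the comment pattern does not repeat its inner alternation. The following is a minimal sketch of the same idea in Python, not the author's code and not part of libibex. The names `TAG`, `COMMENT` and `strip_markup` are hypothetical, and both patterns are adapted (not copied) from the ones in the TODO.

```python
import re

# Assumption: adapted from the TODO's simplified tag pattern. A tag starts
# with "<" or "</" plus a letter, then any mix of plain characters and
# quoted strings (which may themselves contain ">"), then ">".
TAG = re.compile(r"""</?[A-Za-z](?:[^"'>]|"[^"]*"|'[^']*')*>""")

# Assumption: adapted from the TODO's comment pattern, with the inner
# alternation repeated so a single "-" inside a comment is accepted.
COMMENT = re.compile(r"<!(?:--(?:[^-]|-[^-])*--\s*)*>")


def strip_markup(text: str) -> str:
    """Replace comments and tags with a space so adjacent words stay apart."""
    text = COMMENT.sub(" ", text)
    return TAG.sub(" ", text)


sample = '<p class="a>b">Hello <b>world</b><!-- hidden - text --></p>'
words = strip_markup(sample).split()
```

Comments are removed before tags so that markup inside a comment is dropped along with it. Declarations such as `<!DOCTYPE ...>` are left alone by both patterns in this sketch.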