From mboxrd@z Thu Jan 1 00:00:00 1970
From: wmorgan-sup@masanjin.net (William Morgan)
Date: Tue, 28 Jul 2009 12:05:39 -0700
Subject: [sup-talk] xapian question
In-Reply-To: <1248795865-sup-6634@pion.club.cc.cmu.edu>
References: <1248716325-sup-7534@masanjin.net> <1248795865-sup-6634@pion.club.cc.cmu.edu>
Message-ID: <1248807365-sup-4965@masanjin.net>

Reformatted excerpts from Rich Lane's message of 2009-07-28:
> Xapian actually provides add_term and remove_term for documents.

Excellent.

> I'd definitely like to use these for label updates, but we need a way
> to tell if only the labels have changed in sync_message.

I've been running into this same issue with my sup-server experiments,
so I think we should split the API into, say, three separate calls:
something like add_new_message, update_labels, and update_body. (AFAIK
the only client of update_body is in some of the draft editing stuff.)
WDYT?

> Or, we update the index in Message#add_label/etc and get rid of the
> need to save buffers. That might not be an option for the Ferret
> index, though.

I think that would actually be fine for Ferret, and it's a direction
that's often been discussed. (Especially now that we have undo.) If we
do the above, we can certainly do this as a later step.

> We don't store the body in entries.db, just enough info for
> thread-index-mode. It's only about 800 bytes/message for me, but I
> don't have snippets enabled so yours would be larger.

On second glance, it's a little smaller than I remembered. For my sample
212m mbox, it's about 20m with snippets enabled. The total index size
under Xapian (the xapian/ dir and all gdbm files) is larger than the
original mbox file, which seems a little insane. But hey, disk space is
cheap.

--
William
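
A minimal sketch of the three-call split proposed above, in Ruby. It is not Sup's actual code: FakeDocument is an in-memory stand-in that models only the add_term and remove_term calls mentioned in the thread, and SketchIndex, the "L" label prefix, and the simple word tokenizer are assumptions made for illustration. The point it tries to show is that update_labels can touch label terms alone, leaving the body terms as they were.

```ruby
require 'set'

# Stand-in for an index document. Only the two calls named in the thread
# (add_term, remove_term) are modelled; this is NOT the real Xapian binding.
class FakeDocument
  attr_reader :terms
  attr_accessor :data

  def initialize
    @terms = Set.new
    @data = nil
  end

  def add_term(term)
    @terms << term
  end

  def remove_term(term)
    @terms.delete(term)
  end
end

# Hypothetical index exposing the three calls suggested in the message.
class SketchIndex
  LABEL_PREFIX = 'L' # assumed prefix that marks a term as a label

  def initialize
    @docs = {}
  end

  # Index a message seen for the first time: body terms plus label terms.
  def add_new_message(id, body, labels)
    raise ArgumentError, "message #{id} already indexed" if @docs.key?(id)

    doc = FakeDocument.new
    index_body(doc, body)
    labels.each { |l| doc.add_term(label_term(l)) }
    @docs[id] = doc
  end

  # Change only the label terms; body terms are not re-indexed.
  def update_labels(id, labels)
    doc = fetch(id)
    old_terms = doc.terms.select { |t| t.start_with?(LABEL_PREFIX) }.to_set
    new_terms = labels.map { |l| label_term(l) }.to_set
    (old_terms - new_terms).each { |t| doc.remove_term(t) }
    (new_terms - old_terms).each { |t| doc.add_term(t) }
    doc
  end

  # Replace the body terms and keep the labels (e.g. after editing a draft).
  def update_body(id, body)
    doc = fetch(id)
    stale = doc.terms.reject { |t| t.start_with?(LABEL_PREFIX) }
    stale.each { |t| doc.remove_term(t) }
    index_body(doc, body)
    doc
  end

  # Sorted list of all terms on a message, for inspection.
  def terms(id)
    fetch(id).terms.to_a.sort
  end

  private

  def fetch(id)
    @docs.fetch(id) { raise KeyError, "unknown message #{id}" }
  end

  def label_term(label)
    "#{LABEL_PREFIX}#{label}"
  end

  # Body words are downcased, so they never collide with the uppercase prefix.
  def index_body(doc, body)
    doc.data = body
    body.downcase.scan(/\w+/).each { |w| doc.add_term(w) }
  end
end

index = SketchIndex.new
index.add_new_message('msg1', 'Xapian question', %w[inbox unread])
index.update_labels('msg1', %w[inbox starred]) # drops Lunread, adds Lstarred
p index.terms('msg1')
```

With the calls separated like this, the caller states which kind of change happened, so sync_message would no longer need to work out whether only the labels changed.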