* [sup-talk] xapian question
@ 2009-07-27 17:45 William Morgan
2009-07-28 15:57 ` Rich Lane
0 siblings, 1 reply; 5+ messages in thread
From: William Morgan @ 2009-07-27 17:45 UTC
Hey, I finally get to ask a question!
One of the mildly irritating things about Ferret was that it was
impossible to update the labels of a message without updating the entire
entry, i.e. including the body. So updating the labels of a message and
saving that to disk required either re-loading the body from the source,
or keeping the body explicitly in the index so that it could be loaded
without going back to the source.
The latter approach is used by the current Ferret index implementation,
since it's significantly faster (especially for slow sources like IMAP
servers), but at the cost of a lot of disk space.
My understanding of Xapian is that this is also the case, since fields
are essentially represented as prefixed terms, and so you're basically
updating a big blob, but I wanted to confirm this. I ask because the
entries.db file is very big. :)
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] xapian question
2009-07-27 17:45 [sup-talk] xapian question William Morgan
@ 2009-07-28 15:57 ` Rich Lane
2009-07-28 19:05 ` William Morgan
0 siblings, 1 reply; 5+ messages in thread
From: Rich Lane @ 2009-07-28 15:57 UTC
Excerpts from William Morgan's message of Mon Jul 27 13:45:32 -0400 2009:
> Hey, I finally get to ask a question!
>
> One of the mildly irritating things about Ferret was that it was
> impossible to update the labels of a message without updating the entire
> entry, i.e. including the body. So updating the labels of a message and
> saving that to disk required either re-loading the body from the source,
> or keeping the body explicitly in the index so that it could be loaded
> without going back to the source.
>
> The latter approach is used by the current Ferret index implementation,
> since it's significantly faster (especially for slow sources like IMAP
> servers), but at the cost of a lot of disk space.
>
> My understanding of Xapian is that this is also the case, since fields
> are essentially represented as prefixed terms, and so you're basically
> updating a big blob, but I wanted to confirm this. I ask because the
> entries.db file is very big. :)
Xapian actually provides add_term and remove_term for documents. I'd
definitely like to use these for label updates, but we need a way to
tell if only the labels have changed in sync_message. Or, we update the
index in Message#add_label/etc and get rid of the need to save buffers.
That might not be an option for the Ferret index, though.
We don't store the body in entries.db, just enough info for
thread-index-mode. It's only about 800 bytes/message for me, but I don't
have snippets enabled so yours would be larger.
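As a rough illustration of the add_term/remove_term idea, here is a minimal
sketch that applies only the difference between the old and new label sets.
The helper apply_label_diff, the 'L' prefix, and FakeDocument are all
hypothetical names made up for this example; the helper only assumes an
object that responds to add_term and remove_term, as Xapian::Document does.
With a real index the modified document would presumably still have to be
written back (e.g. via replace_document).

```ruby
require 'set'

# Hypothetical prefix marking a term as a label; the prefix Sup's Xapian
# index really uses is not stated in this thread.
LABEL_PREFIX = 'L'

# Apply only the label difference to +doc+, which may be any object that
# responds to add_term and remove_term.
# Returns [added, removed] so the caller can tell whether a write is needed.
def apply_label_diff(doc, old_labels, new_labels)
  old_set = Set.new(old_labels.map(&:to_s))
  new_set = Set.new(new_labels.map(&:to_s))
  added   = (new_set - old_set).to_a.sort
  removed = (old_set - new_set).to_a.sort
  added.each   { |label| doc.add_term(LABEL_PREFIX + label) }
  removed.each { |label| doc.remove_term(LABEL_PREFIX + label) }
  [added, removed]
end

# In-memory stand-in for a document, so the sketch runs without Xapian.
class FakeDocument
  attr_reader :terms

  def initialize(terms)
    @terms = Set.new(terms)
  end

  def add_term(term)
    @terms << term
  end

  def remove_term(term)
    @terms.delete(term)
  end
end

doc = FakeDocument.new(%w[Linbox Lunread Bhello])
added, removed = apply_label_diff(doc, [:inbox, :unread], [:inbox, :starred])
```

The body term ("Bhello" here, also a made-up prefix) is never touched, which
is the point: a label change does not need the message body at all.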
* [sup-talk] xapian question
2009-07-28 15:57 ` Rich Lane
@ 2009-07-28 19:05 ` William Morgan
2009-08-01 6:28 ` Rich Lane
0 siblings, 1 reply; 5+ messages in thread
From: William Morgan @ 2009-07-28 19:05 UTC
Reformatted excerpts from Rich Lane's message of 2009-07-28:
> Xapian actually provides add_term and remove_term for documents.
Excellent.
> I'd definitely like to use these for label updates, but we need a way
> to tell if only the labels have changed in sync_message.
I've been running into this same issue with my sup-server experiments,
so I think we should split the API into, say, three separate calls:
something like add_new_message, update_labels, and update_body. (AFAIK
the only client of update_body is in some of the draft editing stuff.)
WDYT?
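To make the proposed split concrete, here is a toy in-memory sketch of an
index with those three calls. Only the method names add_new_message,
update_labels, and update_body come from the proposal above; the class
ToyIndex, its argument lists, and the body_writes counter are assumptions
for illustration, not Sup's actual API.

```ruby
# Toy in-memory index exposing the three proposed calls.
class ToyIndex
  Entry = Struct.new(:labels, :body)

  attr_reader :body_writes

  def initialize
    @entries = {}
    @body_writes = 0 # counts how often a body had to be (re)indexed
  end

  # First-time indexing: needs both labels and body.
  def add_new_message(id, labels, body)
    raise ArgumentError, "already indexed: #{id}" if @entries.key?(id)
    @entries[id] = Entry.new(labels.dup, body)
    @body_writes += 1
  end

  # Cheap path: touches labels only, never the body.
  def update_labels(id, labels)
    fetch(id).labels = labels.dup
  end

  # Expensive path: replaces the body (e.g. after editing a draft).
  def update_body(id, body)
    fetch(id).body = body
    @body_writes += 1
  end

  def labels(id)
    fetch(id).labels
  end

  private

  def fetch(id)
    @entries.fetch(id) { raise KeyError, "unknown message: #{id}" }
  end
end

index = ToyIndex.new
index.add_new_message("msg-1", [:inbox, :unread], "hello")
index.update_labels("msg-1", [:inbox])
```

With separate calls the index no longer has to guess whether only the
labels changed, because the caller says so.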
> Or, we update the index in Message#add_label/etc and get rid of the
> need to save buffers. That might not be an option for the Ferret
> index, though.
I think that would actually be fine for Ferret, and it's a direction
that's often been discussed. (Especially now that we have undo.) If we
do the above, we can certainly do this as a later step.
> We don't store the body in entries.db, just enough info for
> thread-index-mode. It's only about 800 bytes/message for me, but I
> don't have snippets enabled so yours would be larger.
On second glance, it's a little smaller than I remembered. For my sample
212 MB mbox, it's about 20 MB with snippets enabled.
The total index size under Xapian (the xapian/ dir and all gdbm files)
is larger than the original mbox file, which seems a little insane. But
hey, disk space is cheap.
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] xapian question
2009-07-28 19:05 ` William Morgan
@ 2009-08-01 6:28 ` Rich Lane
2009-08-03 18:03 ` William Morgan
0 siblings, 1 reply; 5+ messages in thread
From: Rich Lane @ 2009-08-01 6:28 UTC
I tried out using add_term/remove_term for immediate label changes. It's
significantly faster than sync_message, but it still makes the interface
feel laggy. There's known room for improvement in Xapian's
replace_document. However, we'll still have a lot of latency when we
start using remote sup-servers, so I don't think it's a good idea to do
these index operations synchronously with the UI.
We could queue up index writes and execute them in a background thread.
We'd want label additions to show up immediately in a search, though.
This is easy to do for inbox-mode and label-view-mode, which covers most
of my daily usage. If/when we support multiple clients connecting to a
sup-server, we'll need a way to notify them that someone else modified a
message. We can implement a simple version of this now that notifies
search-results-mode after the write completes.
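A minimal sketch of that idea, assuming a single writer thread: queued
label changes are kept in an in-memory overlay so the UI can show them
right away, and a callback fires once each write has completed. The class
BackgroundLabelWriter and all of its method names are hypothetical, and a
plain Hash stands in for the index.

```ruby
class BackgroundLabelWriter
  def initialize(on_complete: nil, &write)
    @write = write             # block that performs the slow index write
    @on_complete = on_complete # called with the message id after each write
    @queue = Queue.new
    @pending = {}              # message id => labels not yet written
    @lock = Mutex.new
    @worker = Thread.new { run }
  end

  # Called from the UI thread; returns immediately.
  def enqueue(id, labels)
    @lock.synchronize { @pending[id] = labels }
    @queue << [id, labels]
  end

  # Labels the UI should display: a pending value wins over the index.
  def effective_labels(id, indexed_labels)
    @lock.synchronize { @pending.fetch(id, indexed_labels) }
  end

  # Finish the queued writes, then stop the worker.
  def shutdown
    @queue << :stop
    @worker.join
  end

  private

  def run
    while (job = @queue.pop) != :stop
      id, labels = job
      @write.call(id, labels)
      # Only clear the overlay if no newer change arrived in the meantime.
      @lock.synchronize { @pending.delete(id) if @pending[id].equal?(labels) }
      @on_complete&.call(id)
    end
  end
end

index = {}    # stand-in for the on-disk index
notified = []
writer = BackgroundLabelWriter.new(on_complete: ->(id) { notified << id }) do |id, labels|
  sleep 0.05  # pretend the write is slow
  index[id] = labels
end

writer.enqueue("msg-1", [:inbox, :starred])
shown_right_away = writer.effective_labels("msg-1", index.fetch("msg-1", []))
writer.shutdown
```

The overlay only covers lookups by message id, which is roughly what
inbox-mode and label-view-mode need; arbitrary full-text searches would
still see the old index state until the write lands.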
If we're getting rid of buffer saving, it'd probably be easiest to use a
weak-ref table so we keep at most 1 copy of each message in memory -
this would make updating messages across buffers simpler.
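One way the weak-ref table could look, sketched with Ruby's standard
WeakRef class. MessageTable, its fetch method, and the Message struct are
hypothetical names for this example; the table hands every buffer the same
object for a given message id while letting the garbage collector reclaim
messages nobody holds any more.

```ruby
require 'weakref'

# Table that hands out at most one live object per message id.
class MessageTable
  def initialize
    @refs = {} # message id => WeakRef to the message object
  end

  # Return the live object for +id+, or build one with the block and keep
  # only a weak reference to it.
  def fetch(id)
    ref = @refs[id]
    if ref
      begin
        return ref.__getobj__
      rescue WeakRef::RefError
        # The object was garbage collected; fall through and rebuild it.
      end
    end
    obj = yield
    @refs[id] = WeakRef.new(obj)
    obj
  end
end

Message = Struct.new(:id, :labels)

table = MessageTable.new
a = table.fetch("msg-1") { Message.new("msg-1", [:inbox]) }
b = table.fetch("msg-1") { Message.new("msg-1", []) } # block is not run
b.labels << :starred
```

Because a and b are the same object, a label change made through one
buffer is visible in every other buffer without copying. This sketch never
prunes dead entries from the table, which a real version would want to do.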
How is sup-server development going?
* [sup-talk] xapian question
2009-08-01 6:28 ` Rich Lane
@ 2009-08-03 18:03 ` William Morgan
0 siblings, 0 replies; 5+ messages in thread
From: William Morgan @ 2009-08-03 18:03 UTC
Reformatted excerpts from Rich Lane's message of 2009-07-31:
> I tried out using add_term/remove_term for immediate label changes.
> It's significantly faster than sync_message,
Excellent.
> but it still makes the interface feel laggy. There's known room for
> improvement in Xapian's replace_document. However, we'll still have a
> lot of latency when we start using remote sup-servers, so I don't
> think it's a good idea to do these index operations synchronously with
> the UI.
I agree, synchronous is not an option.
> We could queue up index writes and execute them in a background
> thread. We'd want label additions to show up immediately in a search,
> though. This is easy to do for inbox-mode and label-view-mode, which
> covers most of my daily usage.
I'm fine with queuing up index writes and letting the user continue
while they take effect in the background. I'm also fine with the easier
option of just blocking during a search until the writes are complete.
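The blocking option could be sketched as a small barrier that a search
waits on until no writes are outstanding. WriteBarrier and its method
names are hypothetical, and an Array stands in for the index; this is one
possible shape under those assumptions, not a description of Sup's code.

```ruby
class WriteBarrier
  def initialize
    @lock = Mutex.new
    @idle = ConditionVariable.new
    @outstanding = 0
  end

  # Call when a write is queued.
  def write_started
    @lock.synchronize { @outstanding += 1 }
  end

  # Call from the writer thread once a write has reached the index.
  def write_finished
    @lock.synchronize do
      @outstanding -= 1
      @idle.broadcast if @outstanding.zero?
    end
  end

  # Block the caller (e.g. a search) until no writes are outstanding.
  def wait_until_idle
    @lock.synchronize do
      @idle.wait(@lock) until @outstanding.zero?
    end
  end
end

barrier = WriteBarrier.new
index = [] # stand-in for the index

barrier.write_started
writer = Thread.new do
  sleep 0.05 # pretend the write is slow
  index << "msg-1"
  barrier.write_finished
end

barrier.wait_until_idle # the search waits here for the pending write
results = index.dup
writer.join
```

The user only pays the wait when a search actually races with a pending
write; ordinary label changes still return to the UI immediately.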
> If/when we support multiple clients connecting to a sup-server, we'll
> need a way to notify them that someone else modified a message.
I think this is more of a nice-to-have than a necessity, but it would be
nice to have, even if it was a "we've detected a change somewhere on the
internet; reload? (y/n)"-kinda thing.
> How is sup-server development going?
Well. I have a simple version that stores "items" to files on disk, and
uses Ferret to provide the search semantics. It's modular enough that
upgrading to Xapian shouldn't be as painful as it was with Sup. There
are even unit tests that enforce the semantics of the modules. Go me.
I'm going to make a couple internal API changes in Sup and then try
throwing the code together.
--
William <wmorgan-sup at masanjin.net>