From mboxrd@z Thu Jan 1 00:00:00 1970
From: wmorgan-sup@masanjin.net (William Morgan)
Date: Wed, 24 Jun 2009 09:30:39 -0700
Subject: [sup-talk] [PATCH 0/18] Xapian-based index
In-Reply-To: <1245531017-9907-1-git-send-email-rlane@club.cc.cmu.edu>
References: <1245531017-9907-1-git-send-email-rlane@club.cc.cmu.edu>
Message-ID: <1245854803-sup-4481@entry>

Hi Rich,

Reformatted excerpts from Rich Lane's message of 2009-06-20:
> This patch series refactors the Index class to remove Ferret-isms and
> support multiple index implementations. The included XapianIndex is a
> bit faster at indexing messages and significantly faster when
> searching because it precomputes thread membership. It also works on
> Ruby 1.9.1.

This is great. Really, really great. You've refactored a crufty
interface that's been growing untamed over the past three years, you've
gotten us away from the unmaintained scariness that is Ferret, you've
fixed the largest source of interface slowness (thread recomputation),
and you've enabled us to move to the beautiful, speedy, encoding-aware
world of Ruby 1.9.

Thank you for satisfying all of my Sup-related desires in one fell
swoop. From my lofty throne, I commend thee.

Once the bugs are ironed out, I would like to make this the default
index format and eventually deprecate Ferret.

In the meantime, I've placed your patches on a branch called xapian. If
anyone wants to play with this, here's what you do:

1. Install the Ruby Xapian library and the Ruby gdbm library, if you
   don't have them. These are packaged by your distro, and are not gems.
2. git fetch
3. git checkout -b xapian origin/xapian
4. cp ~/.sup/sources.yaml /tmp # just in case
5. sup-dump > dumpfile
6. SUP_INDEX=xapian sup-sync --all --all-sources --restore dumpfile
7. SUP_INDEX=xapian bin/sup -o
8. Oooh, fast.

This should not disturb your Ferret index, so you can switch back and
forth between the two. (Message state, of course, is not shared.)
However, adding new messages to one index will prevent them from being
automatically added to the other, so I recommend running in Xapian mode
with -o and not pressing 'P'.

> It's missing a couple of features, notably threading by subject.

FWIW, I've been thinking about deprecating that particular feature for
quite some time.

> I'm sure there are many more bugs left, so I'd appreciate any testing
> or review you all can provide.

sup-sync crashes for me fairly systematically with this error:

./lib/sup/xapian_index.rb:404:in `sortable_serialise': Expected argument 0 of type double, but got Fixnum 51767811298 (TypeError)
  in SWIG method 'Xapian::sortable_serialise'
  from ./lib/sup/xapian_index.rb:404:in `index_message'
  from ./lib/sup/xapian_index.rb:111:in `sync_message'
  from /usr/lib/ruby/1.8/monitor.rb:242:in `synchronize'
  from ./lib/sup/xapian_index.rb:324:in `synchronize'
  from ./lib/sup/xapian_index.rb:110:in `sync_message'
  from ./lib/sup/util.rb:519:in `send'
  from ./lib/sup/util.rb:519:in `method_missing'
  from ./lib/sup/poll.rb:157:in `add_messages_from'
  from ./lib/sup/source.rb:100:in `each'
  from ./lib/sup/util.rb:558:in `send'
  from ./lib/sup/util.rb:558:in `__pass'
  from ./lib/sup/util.rb:545:in `method_missing'
  from ./lib/sup/poll.rb:141:in `add_messages_from'
  from ./lib/sup/util.rb:519:in `send'
  from ./lib/sup/util.rb:519:in `method_missing'
  from bin/sup-sync:140
  from bin/sup-sync:135:in `each'
  from bin/sup-sync:135

I haven't spent any time tracking it down.

Other than that, so far so good.
-- 
William
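
P.S. For what it's worth, the traceback reads as though the binding wants a Float and is being handed an integer. Below is a minimal sketch of that guess and of one possible workaround (an explicit to_f). The helper strict_serialise is hypothetical: it only mimics the type check reported in the error and is not the real Xapian API, and its byte encoding is a placeholder. I have not checked this against xapian_index.rb itself.

```ruby
# Hypothetical stand-in for a binding that accepts only Float arguments,
# mimicking the strictness reported in the traceback. NOT the Xapian API.
def strict_serialise(value)
  unless value.is_a?(Float)
    raise TypeError,
          "Expected argument 0 of type double, but got #{value.class} #{value}"
  end
  [value].pack("G") # big-endian double; placeholder encoding only
end

timestamp = 51767811298 # the integer value shown in the traceback

# Passing the integer directly is rejected, as in the reported crash.
begin
  strict_serialise(timestamp)
  rejected = false
rescue TypeError
  rejected = true
end

# Converting explicitly to Float first satisfies the type check.
serialised = strict_serialise(timestamp.to_f)
```

If the guess is right, the equivalent change at the call site on line 404 would be to convert the value with to_f before handing it to sortable_serialise, but that is an assumption until someone looks at the code.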