* [sup-devel] [PATCH] switch default index to Xapian
From: Rich Lane @ 2010-01-02 3:30 UTC (permalink / raw)
To: sup-devel
Previous versions didn't add an :index entry in config.yaml, so preserve
compatibility by using Ferret if no index is specified and the ferret directory
exists.
---
This patch is meant for 0.10.
AFAIK the xapian index has feature-parity with ferret. There are a couple of
issues remaining with queries:
Names are stemmed and otherwise munged for convenient searching by
Xapian::TermGenerator, while email addresses are stored verbatim.
Xapian::QueryParser needs to do the same alterations to search terms, so the
parser uses separate from_{name,email} fields. This is not user-friendly but
could be worked around by having parse_query insert an OR over both fields
where it sees a from: prefix (same for to).
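A minimal sketch of that workaround, assuming a hypothetical helper that runs over the query string before it reaches Xapian::QueryParser. The helper name, the regex, and the to_name/to_email field names are illustrative assumptions, not Sup's actual parse_query code:

```ruby
# Hypothetical helper (not Sup's real code): expand a user-facing from:/to:
# prefix into an OR over the separate name and email fields.
def expand_address_prefixes(query)
  # Match from:term or to:term, where term is a run of non-space characters.
  query.gsub(/\b(from|to):(\S+)/) do
    field, term = $1, $2
    "(#{field}_name:#{term} OR #{field}_email:#{term})"
  end
end

puts expand_address_prefixes("from:bob label:inbox")
# prints "(from_name:bob OR from_email:bob) label:inbox"
```

Queries that already use the explicit from_name: or from_email: fields pass through unchanged, since the pattern only matches the bare prefixes.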
A more pernicious issue is that QueryParser defaults to AND if there isn't an
explicit operator (which is what we want), but if there are multiple boolean
(label/email) terms over the same field it will OR them. So, "label:sup
label:patch" will result in the union instead of the intersection. Assuming we
don't want to write our own query parser, this needs to be made configurable in
Xapian. I took a stab at it a few months ago but didn't get anywhere.
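To make the difference concrete, here is a toy illustration in plain Ruby. It does not call Xapian; the label-to-message table is made up purely to show union versus intersection:

```ruby
require 'set'

# Toy data (invented for illustration): label => ids of messages with that label.
LABELS = {
  "sup"   => Set[1, 2, 3],
  "patch" => Set[2, 3, 4],
}

# Behaviour described above for repeated boolean terms on one field: OR (union).
def match_any(labels)
  labels.map { |l| LABELS.fetch(l) }.reduce(:|)
end

# Behaviour a user would expect from "label:sup label:patch": AND (intersection).
def match_all(labels)
  labels.map { |l| LABELS.fetch(l) }.reduce(:&)
end

p match_any(%w[sup patch]).to_a.sort  # => [1, 2, 3, 4]
p match_all(%w[sup patch]).to_a.sort  # => [2, 3]
```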
There's also the issue of long delays when flushing the index to disk on exit.
One option is to keep the delay and log an info message saying what's going on.
A second option is to set the XAPIAN_FLUSH_THRESHOLD environment variable to
something low in bin/sup, which will limit the final delay but potentially
cause short delays during normal use. A third option is to detect when the user
has been idle for a while and flush the index then.
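The second and third options could look roughly like the sketch below. The threshold value 500 is arbitrary, and IdleFlusher is a hypothetical class, not something that exists in Sup; the flush itself is left as a block supplied by the caller:

```ruby
# Option 2 (sketch): near the top of bin/sup, before the index is opened.
# 500 is an arbitrary illustrative value; keep any value the user already set.
ENV['XAPIAN_FLUSH_THRESHOLD'] ||= '500'

# Option 3 (sketch): hypothetical helper that flushes once the user has been idle.
class IdleFlusher
  def initialize(idle_seconds, &flush)
    @idle_seconds = idle_seconds
    @flush = flush
    @last_activity = Time.now
    @dirty = false
  end

  # Call on every keystroke or other user action.
  def touch(now = Time.now)
    @last_activity = now
  end

  # Call after any change that leaves unflushed data in the index.
  def mark_dirty
    @dirty = true
  end

  # Call periodically, e.g. from a background thread; returns true if it flushed.
  def tick(now = Time.now)
    return false unless @dirty && now - @last_activity >= @idle_seconds
    @flush.call
    @dirty = false
    true
  end
end
```

Passing the current time as a parameter keeps the idle logic testable without sleeping; in real use the defaults would be enough.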
We can easily fix the first and third issues before 0.10. Are there any others
I've forgotten?
lib/sup.rb | 5 +++--
lib/sup/index.rb | 2 +-
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/lib/sup.rb b/lib/sup.rb
index 144f5e3..fa19de2 100644
--- a/lib/sup.rb
+++ b/lib/sup.rb
@@ -54,7 +54,7 @@ module Redwood
YAML_DOMAIN = "masanjin.net"
YAML_DATE = "2006-10-01"
- DEFAULT_INDEX = 'ferret'
+ DEFAULT_INDEX = 'xapian'
## record exceptions thrown in threads nicely
@exceptions = []
@@ -229,7 +229,8 @@ else
:confirm_top_posting => true,
:discard_snippets_from_encrypted_messages => false,
:default_attachment_save_dir => "",
- :sent_source => "sup://sent"
+ :sent_source => "sup://sent",
+ :index => Redwood::DEFAULT_INDEX,
}
begin
FileUtils.mkdir_p Redwood::BASE_DIR
diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index 87d8d52..cc78292 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -174,7 +174,7 @@ class BaseIndex
end
end
-index_name = ENV['SUP_INDEX'] || $config[:index] || DEFAULT_INDEX
+index_name = ENV['SUP_INDEX'] || $config[:index] || (File.exists?(File.join(BASE_DIR, 'ferret')) ? 'ferret' : DEFAULT_INDEX)
case index_name
when "xapian"; require "sup/xapian_index"
when "ferret"; require "sup/ferret_index"
--
1.6.3.3
_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel
* Re: [sup-devel] [PATCH] switch default index to Xapian
From: William Morgan @ 2010-01-03 15:14 UTC (permalink / raw)
To: sup-devel
Reformatted excerpts from Rich Lane's message of 2010-01-01:
> Previous versions didn't add an :index entry in config.yaml, so
> preserve compatibility by using Ferret if no index is specified and
> the ferret directory exists.
I have done something a little more extensive in the branch
ferret-deprecation, merged into next, so I'm going to drop this patch,
unless you think I missed something.
The current behavior is:
1. If a Xapian index exists, use Xapian
2. Otherwise, if a Ferret index exists, use Ferret
3. Otherwise (new index), use Xapian.
The choice is overridable by the environment variable (which I'd like
to remove at some point), the config option, and a command-line flag
--index added to most things in bin/.
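The selection order above can be sketched as follows. This is not the code from the ferret-deprecation branch: the function name, the 'xapian' directory name, and the precedence among the flag, the environment variable, and the config option are all assumptions made for illustration (only the 'ferret' directory name and SUP_INDEX appear in the patch):

```ruby
# Hypothetical sketch of the index selection order; names and override
# precedence (flag, then environment, then config) are assumptions.
def choose_index(base_dir, flag: nil, env: ENV['SUP_INDEX'], config: nil)
  # Any explicit choice wins over auto-detection.
  explicit = flag || env || config
  return explicit if explicit

  # 1. An existing Xapian index is used as-is.
  return 'xapian' if File.exist?(File.join(base_dir, 'xapian'))
  # 2. Otherwise fall back to an existing Ferret index.
  return 'ferret' if File.exist?(File.join(base_dir, 'ferret'))
  # 3. No index yet: new installs get Xapian.
  'xapian'
end
```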
> Names are stemmed and otherwise munged for convenient searching by
> Xapian::TermGenerator, while email addresses are stored verbatim.
> Xapian::QueryParser needs to do the same alterations to search terms, so the
> parser uses separate from_{name,email} fields. This is not user-friendly but
> could be worked around by having parse_query insert an OR over both fields
> where it sees a from: prefix (same for to).
I'm fine with this solution. At some point (not necessarily for 0.10)
I'd also like to add more email address munging so that the address
bob@foo.com is matched by bob, foo, foo.com, and bob@foo.com, so maybe
this is an analogous case.
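One way to read that address munging, as a rough sketch: address_terms is a hypothetical helper, and deriving "foo" by dropping the last dot-separated label of the domain is a naive assumption that would mishandle domains such as example.co.uk:

```ruby
# Hypothetical helper: extra terms one might index for an email address.
def address_terms(address)
  local, domain = address.downcase.split('@', 2)
  return [address.downcase] if domain.nil?

  # "foo.com" -> "foo": drop the last dot-separated label (naive on purpose).
  domain_stem = domain.split('.')[0...-1].join('.')
  [local, domain_stem, domain, "#{local}@#{domain}"].reject(&:empty?).uniq
end

p address_terms("bob@foo.com")  # => ["bob", "foo", "foo.com", "bob@foo.com"]
```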
> A more pernicious issue is that QueryParser defaults to AND if there
> isn't an explicit operator (which is what we want), but if there are
> multiple boolean (label/email) terms over the same field it will OR
> them. So, "label:sup label:patch" will result in the union instead of
> the intersection. Assuming we don't want to write our own query
> parser, this needs to be made configurable in Xapian. I took a stab at
> it a few months ago but didn't get anywhere.
Ok. Unfortunate, but not a dealbreaker by any means, especially if it's
restricted to emails and labels.
> There's also the issue of long delays when flushing the index to disk
> on exit. One option is to keep the delay and log an info message
> saying what's going on. A second option is to set the
> XAPIAN_FLUSH_THRESHOLD environment variable to something low in
> bin/sup, which will limit the final delay but potentially cause short
> delays during normal use. A third option is to detect when the user
> has been idle for a while and flush the index then.
This is something I definitely would like to see fixed before 0.10, but
I would be happy with the silly but trivial option #1. (I suspect #2/#3
will require some back-and-forth to get just right.)
> We can easily fix the first and third issues before 0.10. Are there
> any others I've forgotten?
There was something with the counts in label-list-mode at some point,
but the whole issue has been swapped out of my head.
Ultimately getting us out of the world of Ferret is worth almost any
amount of pain, so, who cares, and, as always, thank you.
--
William <wmorgan-sup@masanjin.net>