Archive of RubyForge sup-devel mailing list
From: William Morgan <wmorgan-sup@masanjin.net>
To: sup-devel <sup-devel@rubyforge.org>
Subject: Re: [sup-devel] [PATCH] switch default index to Xapian
Date: Sun, 03 Jan 2010 07:14:57 -0800
Message-ID: <1262530826-sup-7476@masanjin.net>
In-Reply-To: <1262403040-24105-1-git-send-email-rlane@club.cc.cmu.edu>

Reformatted excerpts from Rich Lane's message of 2010-01-01:
> Previous versions didn't add an :index entry in config.yaml, so
> preserve compatibility by using Ferret if no index is specified and
> the ferret directory exists.

I have done something a little more extensive in the branch
ferret-deprecation, merged into next, so I'm going to drop this patch,
unless you think I missed something.

The current behavior is:
1. If a Xapian index exists, use Xapian
2. Otherwise, if a Ferret index exists, use Ferret
3. Otherwise (new index), use Xapian.

The choice is overridable by an environment variable (which I'd like
to remove at some point), a config option, and a command-line flag
--index added to most things in bin/.
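
A minimal sketch of the selection order above, for illustration only. The
helper name choose_index_type, the "xapian" and "ferret" subdirectory names,
and the single override argument are assumptions, not Sup's actual code. The
precedence among the environment variable, config option, and --index flag is
not stated here, so the sketch takes one already-resolved override value.

```ruby
# Hypothetical helper: the name, the directory layout, and the single
# `override` argument are assumptions made for this sketch.
def choose_index_type(base_dir, override: nil)
  # An explicit choice (environment variable, config option, or --index) wins.
  return override.to_sym if override && !override.to_s.empty?

  # 1. If a Xapian index exists, use Xapian.
  return :xapian if File.exist?(File.join(base_dir, "xapian"))
  # 2. Otherwise, if a Ferret index exists, use Ferret.
  return :ferret if File.exist?(File.join(base_dir, "ferret"))
  # 3. Otherwise (new index), use Xapian.
  :xapian
end
```

With no override and an empty directory, the helper falls through to the
third rule and returns :xapian.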

> Names are stemmed and otherwise munged for convenient searching by
> Xapian::TermGenerator, while email addresses are stored verbatim.
> Xapian::QueryParser needs to do the same alterations to search terms, so the
> parser uses separate from_{name,email} fields. This is not user-friendly but
> could be worked around by having parse_query insert an OR over both fields
> where it sees a from: prefix (same for to).
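
The workaround described in the quoted paragraph could look roughly like the
following. This is a sketch under assumptions: expand_address_prefixes is a
hypothetical name, the to_name/to_email field names are inferred from "same
for to", and the rewrite is done on the raw query string rather than inside
the real parse_query.

```ruby
# Hypothetical pre-processing step: rewrite a user-facing from:/to: prefix
# into an OR over the separate name and email fields.
def expand_address_prefixes(query)
  # Only match the prefix at the start of a whitespace-delimited token,
  # so that e.g. "reply_to:x" is left alone.
  query.gsub(/(?<!\S)(from|to):(\S+)/) do
    field = Regexp.last_match(1)
    term  = Regexp.last_match(2)
    "(#{field}_name:#{term} OR #{field}_email:#{term})"
  end
end
```

For example, expand_address_prefixes("from:bob label:sup") yields
"(from_name:bob OR from_email:bob) label:sup".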

I'm fine with this solution. At some point (not necessarily for 0.10)
I'd also like to add more email address munging so that the address
bob@foo.com is matched by bob, foo, foo.com, and bob@foo.com, so maybe
this is an analogous case.
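
As a rough illustration of that address munging (not existing Sup code): the
hypothetical address_terms helper below produces the four terms from the
example. Treating "foo" as the first label of the domain is an assumption;
for a multi-level domain a different rule might be wanted.

```ruby
# Hypothetical helper: list the terms an address could be indexed under so
# that partial searches match it.
def address_terms(address)
  addr = address.downcase
  local, domain = addr.split("@", 2)
  # Not a full address: index it verbatim.
  return [addr] if domain.nil? || domain.empty?

  first_label = domain.split(".").first
  # Local part, first domain label, whole domain, full address.
  [local, first_label, domain, "#{local}@#{domain}"].uniq
end
```

address_terms("bob@foo.com") returns ["bob", "foo", "foo.com", "bob@foo.com"].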

> A more pernicious issue is that QueryParser defaults to AND if there
> isn't an explicit operator (which is what we want), but if there are
> multiple boolean (label/email) terms over the same field it will OR
> them. So, "label:sup label:patch" will result in the union instead of
> the intersection. Assuming we don't want to write our own query
> parser, this needs to be made configurable in Xapian. I took a stab at
> it a few months ago but didn't get anywhere.

Ok. Unfortunate, but not a dealbreaker by any means, especially if it's
restricted to emails and labels.
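
To make the difference concrete, here is a toy model in plain Ruby. It does
not use Xapian at all; the message IDs and labels are invented for the
example, and match_any/match_all are hypothetical names standing in for the
OR and AND behaviours.

```ruby
# Toy model only: maps message IDs to their labels.
def match_any(labels_by_message, wanted)
  # OR semantics: what "label:sup label:patch" currently gives (the union).
  labels_by_message.select { |_, labels| wanted.any? { |l| labels.include?(l) } }.keys
end

def match_all(labels_by_message, wanted)
  # AND semantics: what we would want (the intersection).
  labels_by_message.select { |_, labels| wanted.all? { |l| labels.include?(l) } }.keys
end

messages = {
  "m1" => %w[sup patch],
  "m2" => %w[sup],
  "m3" => %w[patch],
}

union        = match_any(messages, %w[sup patch])  # all three messages
intersection = match_all(messages, %w[sup patch])  # only "m1"
```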

> There's also the issue of long delays when flushing the index to disk
> on exit.  One option is to keep the delay and log an info message
> saying what's going on.  A second option is to set the
> XAPIAN_FLUSH_THRESHOLD environment variable to something low in
> bin/sup, which will limit the final delay but potentially cause short
> delays during normal use. A third option is to detect when the user
> has been idle for a while and flush the index then.

This is something I definitely would like to see fixed before 0.10, but
I would be happy with the silly but trivial option #1. (I suspect #2/#3
will require some back-and-forth to get just right.)
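
Option #1 might be as small as the sketch below. It assumes only that the
index is some object responding to flush and that a standard Ruby Logger is
available; flush_with_notice is a hypothetical name, not an existing Sup
method.

```ruby
require "logger"

# Hypothetical wrapper: tell the user why exit is slow, then flush.
# `index` is assumed to be any object that responds to #flush.
def flush_with_notice(index, logger)
  logger.info "Flushing index to disk; this may take a while..."
  started = Time.now
  index.flush
  logger.info format("Index flushed in %.1f seconds", Time.now - started)
end
```

Option #2 would instead amount to setting ENV["XAPIAN_FLUSH_THRESHOLD"] in
bin/sup before the index is opened; what value counts as "something low" is
left open here.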

> We can easily fix the first and third issues before 0.10. Are there
> any others I've forgotten?

There was something with the counts in label-list-mode at some point,
but the whole issue has been swapped out of my head.

Ultimately getting us out of the world of Ferret is worth almost any
amount of pain, so, who cares, and, as always, thank you.
-- 
William <wmorgan-sup@masanjin.net>



Thread overview: 2+ messages
2010-01-02  3:30 Rich Lane
2010-01-03 15:14 ` William Morgan [this message]
