* [sup-talk] xapian merged into next
@ 2009-07-27 16:13 William Morgan
2009-07-27 16:16 ` Guillaume Quintard
0 siblings, 1 reply; 18+ messages in thread
From: William Morgan @ 2009-07-27 16:13 UTC (permalink / raw)
I've merged the xapian branch into next. Users of next need not worry;
Xapian is only enabled when you set the environment variable
SUP_INDEX=xapian. (And you'll have to regenerate your index; see my
message of a few weeks ago for how.)
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] xapian merged into next
2009-07-27 16:13 [sup-talk] xapian merged into next William Morgan
@ 2009-07-27 16:16 ` Guillaume Quintard
2009-07-27 16:27 ` William Morgan
0 siblings, 1 reply; 18+ messages in thread
From: Guillaume Quintard @ 2009-07-27 16:16 UTC (permalink / raw)
On Mon, Jul 27, 2009 at 6:13 PM, William Morgan<wmorgan-sup at masanjin.net> wrote:
> I've merged the xapian branch into next. Users of next need not worry;
> xapian only enabled when you set the environment variable
> SUP_INDEX=xapian. (And you'll have to regenerate your index; see my
> message of a few weeks ago for how.)
The big question is: is it interesting, as a user, to switch? :-)
--
Guillaume
* [sup-talk] xapian merged into next
2009-07-27 16:16 ` Guillaume Quintard
@ 2009-07-27 16:27 ` William Morgan
2009-07-27 16:31 ` Guillaume Quintard
2009-07-28 0:33 ` Richard Heycock
0 siblings, 2 replies; 18+ messages in thread
From: William Morgan @ 2009-07-27 16:27 UTC (permalink / raw)
Reformatted excerpts from Guillaume Quintard's message of 2009-07-27:
> The big question is: is it interesting, as a user, to switch? :-)
Yes. It's noticeably faster than Ferret, especially for loading large
threads in thread-index-mode. (Which isn't Xapian per se, but other
improvements Rich has made).
It's also much larger on disk, though there might be a way to trim that
down.
At some point I want to deprecate Ferret, since it's unmaintained, so
you'll be forced to switch. No timeline on that though.
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] xapian merged into next
2009-07-27 16:27 ` William Morgan
@ 2009-07-27 16:31 ` Guillaume Quintard
2009-07-27 16:44 ` William Morgan
2009-07-28 0:33 ` Richard Heycock
1 sibling, 1 reply; 18+ messages in thread
From: Guillaume Quintard @ 2009-07-27 16:31 UTC (permalink / raw)
On Mon, Jul 27, 2009 at 6:27 PM, William Morgan<wmorgan-sup at masanjin.net> wrote:
> Yes. It's noticeably faster than Ferret, especially for loading large
> threads in thread-index-mode. (Which isn't Xapian per se, but other
> improvements Rich has made).
Yummy!
> It's also much larger on disk, though there might be a way to trim that
> down.
Less yummy!
So, we just export SUP_INDEX=xapian and that's it? We start with an
empty sup and we just have to reimport the mbox/maildir/whatever?
That means losing the read states and tags, I guess.
--
Guillaume
* [sup-talk] xapian merged into next
2009-07-27 16:31 ` Guillaume Quintard
@ 2009-07-27 16:44 ` William Morgan
2009-07-27 16:47 ` Guillaume Quintard
2009-07-27 20:04 ` [sup-talk] " Guillaume Quintard
0 siblings, 2 replies; 18+ messages in thread
From: William Morgan @ 2009-07-27 16:44 UTC (permalink / raw)
Reformatted excerpts from Guillaume Quintard's message of 2009-07-27:
> So, we just export SUP_INDEX=xapian and that's it? We start with an
> empty sup and we just have to reimport the mbox/maildir/whatever?
> That means losing the read states and tags, I guess.
Current instructions:
1. install the ruby xapian library and the ruby gdbm library, if you
don't have them. These are packaged by your distro, and are not gems.
2. git checkout next
3. git pull
4. cp ~/.sup/sources.yaml /tmp # just in case
5. ruby -Ilib bin/sup-dump > dumpfile
6. SUP_INDEX=xapian ruby -Ilib bin/sup-sync --all --all-sources --restore dumpfile
7. SUP_INDEX=xapian ruby -Ilib bin/sup -o
8. Oooh, fast.
This should not disturb your Ferret index, so you can switch back and forth
between the two. (Message state, of course, is not shared.) However, adding new
messages to one index will prevent them from being automatically added to the
other, so I recommend running in Xapian mode with -o and not pressing 'P'.
Unless, of course, you're ready to switch permanently, in which case rm -rf
~/.sup/ferret. :)
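
A minimal sketch of how a backend could be chosen from the SUP_INDEX
environment variable, assuming Ferret is the default when the variable is
unset. The index_backend helper and the symbols it returns are hypothetical
and are not taken from sup's source:

```ruby
# Hypothetical helper: map SUP_INDEX to a backend name.
# Assumption: Ferret is used unless SUP_INDEX=xapian is set.
def index_backend(env = ENV)
  name = env.fetch("SUP_INDEX", "ferret").downcase
  case name
  when "xapian" then :xapian
  when "ferret" then :ferret
  else raise ArgumentError, "unknown index type: #{name}"
  end
end
```

Because the choice is made per invocation, the same checkout can be run against
either index just by setting or unsetting the variable.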
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] xapian merged into next
2009-07-27 16:44 ` William Morgan
@ 2009-07-27 16:47 ` Guillaume Quintard
2009-07-27 16:50 ` William Morgan
2009-07-27 20:04 ` [sup-talk] " Guillaume Quintard
1 sibling, 1 reply; 18+ messages in thread
From: Guillaume Quintard @ 2009-07-27 16:47 UTC (permalink / raw)
On Mon, Jul 27, 2009 at 6:44 PM, William Morgan<wmorgan-sup at masanjin.net> wrote:
> Unless, of course, you're ready to switch permanently, in which case rm -rf
> ~/.sup/ferret. :)
There's no reason not to, right?
/me is blindly following the given instructions :-)
--
Guillaume
* [sup-talk] xapian merged into next
2009-07-27 16:47 ` Guillaume Quintard
@ 2009-07-27 16:50 ` William Morgan
2009-07-27 17:09 ` Guillaume Quintard
0 siblings, 1 reply; 18+ messages in thread
From: William Morgan @ 2009-07-27 16:50 UTC (permalink / raw)
Reformatted excerpts from Guillaume Quintard's message of 2009-07-27:
> There's no reason not to, right?
Well, the code isn't quite as well tested...
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] xapian merged into next
2009-07-27 16:50 ` William Morgan
@ 2009-07-27 17:09 ` Guillaume Quintard
2009-07-27 17:34 ` William Morgan
0 siblings, 1 reply; 18+ messages in thread
From: Guillaume Quintard @ 2009-07-27 17:09 UTC (permalink / raw)
On Mon, Jul 27, 2009 at 6:50 PM, William Morgan<wmorgan-sup at masanjin.net> wrote:
> Well, the code isn't quite as well tested...
Someone has to do it, plus I still have my mbox archive...
...and I was getting tired of this mostly bugless mail experience.
--
Guillaume
* [sup-talk] xapian merged into next
2009-07-27 17:09 ` Guillaume Quintard
@ 2009-07-27 17:34 ` William Morgan
[not found] ` <1e5fdab70908010934l30373447r4a405c5ca0e406f9@mail.gmail.com>
0 siblings, 1 reply; 18+ messages in thread
From: William Morgan @ 2009-07-27 17:34 UTC (permalink / raw)
Reformatted excerpts from Guillaume Quintard's message of 2009-07-27:
> Someone has to do it, plus I still have my mbox archive...
> ...and I was getting tired of this mostly bugless mail experience.
Ok, you're the official guinea pig then. :)
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] xapian merged into next
2009-07-27 16:44 ` William Morgan
2009-07-27 16:47 ` Guillaume Quintard
@ 2009-07-27 20:04 ` Guillaume Quintard
2009-07-28 3:33 ` Rich Lane
1 sibling, 1 reply; 18+ messages in thread
From: Guillaume Quintard @ 2009-07-27 20:04 UTC (permalink / raw)
On Mon, Jul 27, 2009 at 6:44 PM, William Morgan<wmorgan-sup at masanjin.net> wrote:
All went well until:
> 5. ruby -Ilib bin/sup-dump > dumpfile
-> produces an empty file (not sure that's normal)
> 6. SUP_INDEX=xapian ruby -Ilib bin/sup-sync --all --all-sources --restore dumpfile
-> error
[Mon Jul 27 22:01:35 +0200 2009] using character set encoding "UTF-8"
./lib/sup/xapian_index.rb:17:in `at': bignum too big to convert into `long' (RangeError)
from ./lib/sup/xapian_index.rb:17
from /usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
from /usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `require'
from ./lib/sup/index.rb:217
from /usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
from /usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `require'
from ./lib/sup.rb:269
from /usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
from /usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `require'
from bin/sup-sync:6
Line 17 of xapian_index.rb is MAX_DATE = Time.at(2**31)
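
The RangeError is consistent with a Ruby build whose C long is 32 bits: 2**31
is one past the largest signed 32-bit value, so it becomes a Bignum that
Time.at cannot convert. A minimal sketch of one way to stay within range,
assuming a sentinel one second earlier is acceptable (this is an illustration,
not necessarily how the project fixed it; MAX_INT32 is a name introduced here):

```ruby
# Largest value that fits in a signed 32-bit C long.
MAX_INT32 = 2**31 - 1

# Sentinel "latest possible date" that is representable on 32-bit builds.
MAX_DATE = Time.at(MAX_INT32)

puts MAX_DATE.getutc.year  # 2038
```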
--
Guillaume
* [sup-talk] xapian merged into next
2009-07-27 16:27 ` William Morgan
2009-07-27 16:31 ` Guillaume Quintard
@ 2009-07-28 0:33 ` Richard Heycock
1 sibling, 0 replies; 18+ messages in thread
From: Richard Heycock @ 2009-07-28 0:33 UTC (permalink / raw)
Excerpts from William Morgan's message of Tue Jul 28 02:27:42 +1000 2009:
> Reformatted excerpts from Guillaume Quintard's message of 2009-07-27:
> > The big question is: is it interesting, as a user, to switch? :-)
>
> Yes. It's noticeably faster than Ferret, especially for loading large
> threads in thread-index-mode. (Which isn't Xapian per se, but other
> improvements Rich has made).
>
> It's also much larger on disk, though there might be a way to trim that
> down.
>
> At some point I want to deprecate Ferret, since it's unmaintained, so
> you'll be forced to switch. No timeline on that though.
Just to add to William's answer: not only is it faster, it's also
*significantly* more robust. I'm running Debian unstable which, at the
moment, is really living up to its name; consequently my machine is
dying a lot more often than it should, and I haven't had to rebuild the
index once. Compare that to Ferret, where I pretty much had to rebuild
the index every time; I even wrote myself a one-line script to do it.
William, there is work being done on the next Xapian "engine" which aims
to reduce the disk size.
rgh
* [sup-talk] xapian merged into next
2009-07-27 20:04 ` [sup-talk] " Guillaume Quintard
@ 2009-07-28 3:33 ` Rich Lane
2009-07-28 15:13 ` William Morgan
0 siblings, 1 reply; 18+ messages in thread
From: Rich Lane @ 2009-07-28 3:33 UTC (permalink / raw)
Excerpts from Guillaume Quintard's message of Mon Jul 27 16:04:58 -0400 2009:
> > 5. ruby -Ilib bin/sup-dump > dumpfile
> -> produces a empty file (not sure it's normal)
>
> ./lib/sup/xapian_index.rb:17:in `at': bignum too big to convert into
Oops, breaking sup-dump would make switching to Xapian a little
difficult. I've posted patches for both these issues. We really should
have more tests to catch this sort of thing...
* [sup-talk] xapian merged into next
2009-07-28 3:33 ` Rich Lane
@ 2009-07-28 15:13 ` William Morgan
0 siblings, 0 replies; 18+ messages in thread
From: William Morgan @ 2009-07-28 15:13 UTC (permalink / raw)
Reformatted excerpts from Rich Lane's message of 2009-07-27:
> We really should have more tests to catch this sort of thing...
Agreed. Mostly my fault, I'm afraid.
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] Fwd: xapian merged into next
[not found] ` <1e5fdab70908010934l30373447r4a405c5ca0e406f9@mail.gmail.com>
@ 2009-08-01 17:44 ` Guillaume Quintard
2009-08-01 18:14 ` Rich Lane
0 siblings, 1 reply; 18+ messages in thread
From: Guillaume Quintard @ 2009-08-01 17:44 UTC (permalink / raw)
I get this when I run "SUP_INDEX=xapian ruby -Ilib bin/sup-sync --all
--all-sources --restore dumpfile" (only had the time to do it today):
./lib/sup/xapian_index.rb:435:in `replace_document':
InvalidArgumentError: Term too long (> 245):
E"name.surname at enst-bretagne.fr,
my_name.my_surname at enst-bretagne.fr, name.surname at enst-bretagne.fr,
name.surname at enst-bretagne.fr,
name.surname at telecom-bretagne.eu,
name.surname at telecom-bretagne.eu,
name.surname at telecom-bretagne.eu, name.surname"@telecom-bretagne.eu
(ArgumentError)
        from ./lib/sup/xapian_index.rb:435:in `index_message'
        from ./lib/sup/xapian_index.rb:117:in `sync_message'
        from /usr/lib/ruby/1.8/monitor.rb:242:in `synchronize'
        from ./lib/sup/xapian_index.rb:330:in `synchronize'
        from ./lib/sup/xapian_index.rb:116:in `sync_message'
        from ./lib/sup/util.rb:519:in `send'
        from ./lib/sup/util.rb:519:in `method_missing'
        from ./lib/sup/poll.rb:157:in `add_messages_from'
        from ./lib/sup/source.rb:100:in `each'
        from ./lib/sup/util.rb:558:in `send'
        from ./lib/sup/util.rb:558:in `__pass'
        from ./lib/sup/util.rb:545:in `method_missing'
        from ./lib/sup/poll.rb:141:in `add_messages_from'
        from ./lib/sup/util.rb:519:in `send'
        from ./lib/sup/util.rb:519:in `method_missing'
        from bin/sup-sync:140
        from bin/sup-sync:135:in `each'
        from bin/sup-sync:135
Cheers!
--
Guillaume
* [sup-talk] Fwd: xapian merged into next
2009-08-01 17:44 ` [sup-talk] Fwd: " Guillaume Quintard
@ 2009-08-01 18:14 ` Rich Lane
2009-08-01 18:31 ` Guillaume Quintard
0 siblings, 1 reply; 18+ messages in thread
From: Rich Lane @ 2009-08-01 18:14 UTC (permalink / raw)
Excerpts from Guillaume Quintard's message of Sat Aug 01 13:44:50 -0400 2009:
> E"name.surname at enst-bretagne.fr,
> my_name.my_surname at enst-bretagne.fr, name.surname at enst-bretagne.fr,
> name.surname at enst-bretagne.fr,
> name.surname at telecom-bretagne.eu,
> name.surname at telecom-bretagne.eu,
> name.surname at telecom-bretagne.eu, name.surname"@telecom-bretagne.eu
That's a very strange email address :). Could you track down the message
causing this and post the headers? It looks like it's getting parsed
incorrectly.
I think we'd be safe not adding terms for email addresses longer than
244 characters on the assumption that the user isn't going to want to
search for them.
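
A minimal sketch of that idea, assuming each address is indexed as a single
term with a one-character "E" prefix (as the error output suggests) and that
terms of up to 245 bytes are accepted. The indexable_email_terms helper and
the MAX_TERM_LENGTH constant are hypothetical names, not sup's or Xapian's API:

```ruby
# Limit reported by the "Term too long (> 245)" error.
MAX_TERM_LENGTH = 245

# Hypothetical helper: build prefixed email terms and silently drop
# any that would exceed the term length limit.
def indexable_email_terms(addresses, prefix = "E")
  addresses
    .map { |addr| prefix + addr.downcase }
    .select { |term| term.bytesize <= MAX_TERM_LENGTH }
end
```

With a one-character prefix, this keeps addresses of up to 244 bytes and skips
anything longer, so a single malformed header no longer aborts the whole sync.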
* [sup-talk] Fwd: xapian merged into next
2009-08-01 18:14 ` Rich Lane
@ 2009-08-01 18:31 ` Guillaume Quintard
2009-08-01 23:03 ` Rich Lane
0 siblings, 1 reply; 18+ messages in thread
From: Guillaume Quintard @ 2009-08-01 18:31 UTC (permalink / raw)
Excerpts from Rich Lane's message of Sat Aug 01 20:14:34 +0200 2009:
> I think we'd be safe not adding terms for email addresses longer than
> 244 characters on the assumption that the user isn't going to want to
> search for them.
http://files.getdropbox.com/u/155904/grepped
I did a simple grep; tell me if it's not enough (I'd rather not dive
into the humongous mbox file).
The mails come from the mailing-list admin tool (sympa). It looks like an
encoding problem: the mangled part is "Propri?taires de liste" ("list
owners" in French).
--
Guillaume
* [sup-talk] Fwd: xapian merged into next
2009-08-01 18:31 ` Guillaume Quintard
@ 2009-08-01 23:03 ` Rich Lane
2009-08-01 23:35 ` Guillaume Quintard
0 siblings, 1 reply; 18+ messages in thread
From: Rich Lane @ 2009-08-01 23:03 UTC (permalink / raw)
Excerpts from Guillaume Quintard's message of Sat Aug 01 14:31:55 -0400 2009:
> Excerpts from Rich Lane's message of Sat Aug 01 20:14:34 +0200 2009:
> > I think we'd be safe not adding terms for email addresses longer than
> > 244 characters on the assumption that the user isn't going to want to
> > search for them.
>
> http://files.getdropbox.com/u/155904/grepped
>
> I did a simple grep, tell me if it's not enough (I'd rather not dive
> into the humongous mbox file).
> The mails come from the mailing-list admin tool (sympa), encoding
> problem it looks, the mangled part is "Propri?taires de liste" (list
> owners in french)
>
I posted a patch that should keep Xapian from throwing an exception
with long email addresses. I'm not any sort of encoding expert, but I'd
guess that sup isn't responsible for the mangling.
* [sup-talk] Fwd: xapian merged into next
2009-08-01 23:03 ` Rich Lane
@ 2009-08-01 23:35 ` Guillaume Quintard
0 siblings, 0 replies; 18+ messages in thread
From: Guillaume Quintard @ 2009-08-01 23:35 UTC (permalink / raw)
Excerpts from Rich Lane's message of Sun Aug 02 01:03:06 +0200 2009:
> with long email addresses. I'm not any sort of encoding expert, but I'd
> guess that sup isn't responsible for the mangling.
It isn't; the mangling comes from the mbox file, yet sup was able to
import it at some point. I guess the quotes don't really help.
--
Guillaume