Archive of RubyForge sup-talk mailing list
From: Tero Tilus <tero@tilus.net>
To: sup-talk@rubyforge.org
Subject: [sup-talk] Xapian: Term too long
Date: Tue, 13 Oct 2009 01:34:49 +0300
Message-ID: <20091012223449.GB31940@tilus.net>

sup-sync blows up like this:

/home/terotil/src/sup/lib/sup/xapian_index.rb:446:in `replace_document': InvalidArgumentError: Term too long (> 245): Lfwd: =?iso-8859-1?q?tekij=e4n_oikeudet=5d?= (ArgumentError)
x-enigmail-version: 0.92.0.0
content-type: multipart/mixed;
 boundary="------------010606010007070802040301"
x-virus-scanned: amavisd-new at cc.jyu.fi
x-spam-status: no, hits=-2.373 required=5 tests=[awl=0.226, bayes_00=-2.599
        from /home/terotil/src/sup/lib/sup/xapian_index.rb:446:in `sync_message'
        from /usr/lib/ruby/1.8/monitor.rb:242:in `synchronize'
        from /home/terotil/src/sup/lib/sup/xapian_index.rb:363:in `synchronize'
        from /home/terotil/src/sup/lib/sup/xapian_index.rb:440:in `sync_message'
        from /home/terotil/src/sup/lib/sup/xapian_index.rb:92:in `add_message'
        from /home/terotil/src/sup/bin/sup-sync:211
	...

The relevant part of the problematic mail looks like this:

User-Agent: Debian Thunderbird 1.0.6 (X11/20050802)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: mutikainen@iki.fi
Subject: [Fwd: =?ISO-8859-1?Q?tekij=E4n_oikeudet=5D?=
X-Enigmail-Version: 0.92.0.0
Content-Type: multipart/mixed;
 boundary="------------010606010007070802040301"
X-Virus-Scanned: amavisd-new at cc.jyu.fi
X-Spam-Status: No, hits=-2.373 required=5 tests=[AWL=0.226, BAYES_00=-2.599]
X-Spam-Level: 
X-Sorted: Whitelist
Content-Length: 11892

This is how I worked around it locally, for now:

diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
index ad45b0e..d3b3e25 100644
--- a/lib/sup/xapian_index.rb
+++ b/lib/sup/xapian_index.rb
@@ -443,7 +443,11 @@ EOS
         warn "docid underflow, dropping #{m.id.inspect}"
         return
       end
-      @xapian.replace_document docid, doc
+      begin
+        @xapian.replace_document docid, doc
+      rescue StandardError => err
+        warn "Failed to add message #{m.id.inspect} to Xapian index: #{err}"
+      end
     end
 
     m.labels.each { |l| LabelManager << l }

It looks like lib/sup/xapian_index.rb tries to override
Xapian::Document#add_term with a version that drops terms that are too
long.  The catch is that you can't override a method just by including
a module: methods defined in the including class take precedence over
methods of the same name in the included module.

terotil@sotka:~$ irb
> class Foo; def bar; :bar; end; end
=> nil
> module Baz; def bar; :baz; end; end
=> nil
> class Foo; include Baz; end
=> Foo
> Foo.new.bar
=> :bar
> Foo.ancestors
=> [Foo, Baz, Object, Kernel]  # Foo before Baz, methods in Foo take priority

It is still Foo#bar being called, not Baz#bar.  To override methods you
need to reopen Xapian::Document and do alias method chaining.
Alternatively, you could use tricks like
http://coderrr.wordpress.com/2008/10/29/secure-alias-method-chaining/
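A minimal sketch of the alias method chaining approach, in plain Ruby so
it runs without the Xapian bindings.  FakeXapian::Document is a
hypothetical stand-in for Xapian::Document whose add_term rejects terms
over 245 bytes, mirroring the "Term too long (> 245)" error above;
add_term_without_length_check is likewise a made-up name.  Reopening the
class and aliasing the original method lets the new add_term wrap it,
which including a module cannot do.

```ruby
# Hypothetical stand-in for Xapian::Document (the real class comes from the
# Xapian Ruby bindings). Like the error above, it rejects terms over 245 bytes.
module FakeXapian
  class Document
    MAX_TERM_BYTES = 245

    attr_reader :terms

    def initialize
      @terms = []
    end

    def add_term(term)
      if term.bytesize > MAX_TERM_BYTES
        raise ArgumentError, "Term too long (> #{MAX_TERM_BYTES}): #{term}"
      end
      @terms << term
    end
  end
end

# Reopen the class itself (not a module) and chain: keep the original method
# under a new (hypothetical) name, then define a wrapper that skips long terms.
module FakeXapian
  class Document
    alias_method :add_term_without_length_check, :add_term

    def add_term(term)
      if term.bytesize > MAX_TERM_BYTES
        warn "dropping term longer than #{MAX_TERM_BYTES} bytes"
        return nil
      end
      add_term_without_length_check(term)
    end
  end
end

doc = FakeXapian::Document.new
doc.add_term("Sfwd")
doc.add_term("L" + "x" * 300)   # silently skipped (with a warning on stderr)
p doc.terms                     # => ["Sfwd"]
```

On Ruby 2.0 and later, Module#prepend is another option: it places the
module before the class in ancestors, so the module's method does take
priority.  It is not available in the Ruby 1.8 used above.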

-- 
Tero Tilus ## 050 3635 235 ## http://tero.tilus.net/
_______________________________________________
sup-talk mailing list
sup-talk@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-talk



Thread overview: 6+ messages
2009-10-12 22:34 Tero Tilus [this message]
2009-10-15 12:59 ` William Morgan
2009-10-20  5:34 ` [sup-talk] [PATCH] xapian: replace DocumentMethods module with plain monkeypatching Rich Lane
2009-10-20  6:13   ` Rich Lane
2009-10-20  6:14 ` Rich Lane
2009-11-02 19:28   ` William Morgan
