Archive of RubyForge sup-talk mailing list
From: wmorgan-sup@masanjin.net (William Morgan)
Subject: [sup-talk] Amazon.com messages can't be added to index
Date: Sun, 24 Feb 2008 21:10:57 -0800
Message-ID: <1203915874-sup-5504@south>
In-Reply-To: <2cb10c440802220825l1f22fd07s926db4d1e17b4f81@mail.gmail.com>

Reformatted excerpts from Luis Villa's message of 2008-02-22:
> /usr/lib/ruby/gems/1.8/gems/sup-0.4/lib/sup/index.rb:200:in `sync_message': just added message "!~!UENERkVCMDkAAQACAPYAAAAAAAAAOKG7EAXlEBqhuwgAKypWwgAAbXNwc3QuZGxsAAAAAABOSVRB+b+4AQCqADfZbgAAAABDADoAXABEAG8AYwB1AG0AZQBuAHQAcwAgAGEAbgBkACAAUwBlAHQAdABpAG4AZwBzAFwAawBiAGUAbgB0AG8AbgBcAEwAbwBjAGEAbAAgAFMAZQB0AHQAaQBuAGcAcwBcAEEAcABwAGwAaQBjAGEAdABpAG8AbgAgAEQAYQB0AGEAXABNAGkAYwByAG8AcwBvAGYAdABcAE8AdQB0AGwAbwBvAGsAXABPAHUAdABsAG8AbwBrAC4AcABzAHQAAAAYAAAAAAAAALLH/vR9UMVCgMck3LV+0wHCgAAAGAAAAAAAAACyx/70fVDFQoDHJNy1ftMBhLcgAAAAAAAQAAAAqTDfdQ6dIEawbQUxhNxqVz4AAABSRTogQnVnemlsbGE6IEhhcyBhbnlvbmUgc3VjY2Vzc2Z1bGx5IGNyZWF0ZWQgU3ViLUNvbXBvbmVudHM/AA==@amd.com" but couldn't find it in a search (RuntimeError)

Sigh. Why would anyone generate a message id like that?

There were two problems causing your error. I've fixed them both in git
next. You can probably apply the attached patches to your 0.4 release if
you don't want to use git just yet.

The first problem was that the message_id field was being tokenized in
Ferret. Marking it as non-tokenized just solves all sorts of
tokenization problems. So that's in.

The second problem is a Ferret bug, where apparently TermQuery values of
more than 255 characters never match anything. The current workaround
just lops off anything after the 255th character. That may very well
screw things up if two different message ids share their first 255
characters, since they'd be falsely treated as the same message.
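
To illustrate the truncation workaround and its collision risk, here is a
minimal sketch. It is not the attached patch: the constant and the helper
name truncated_index_key are hypothetical, and the example ids are made up.

```ruby
# Hypothetical limit, taken from the 255-character behaviour described above.
MAX_TERM_LENGTH = 255

# Hypothetical helper: keep only the first 255 characters of a message id.
def truncated_index_key(message_id)
  message_id[0, MAX_TERM_LENGTH]
end

# Two made-up ids that differ only after the 255th character.
prefix = "x" * 255
id_a = prefix + "A@example.com"
id_b = prefix + "B@example.com"

# Distinct ids, identical keys: this is the false match the workaround risks.
collides = truncated_index_key(id_a) == truncated_index_key(id_b)
puts "collision: #{collides}"
```

Ids of 255 characters or fewer pass through unchanged, so only unusually
long ids are affected.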

The right long-term answer is probably to take the hex SHA1 of every
message id and just use that instead of the original value. Then all of
these issues will be solved. That will require an index rebuild for
everyone, so I'm going to hold off on that for now.
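
A minimal sketch of the hashing idea, assuming Ruby's standard digest
library. The helper name hashed_index_key is hypothetical, not sup's
actual code, and the example ids are made up.

```ruby
require 'digest/sha1'

# Hypothetical helper: map any message id to a fixed-length hex string.
# A hex SHA1 digest is always 40 characters drawn from 0-9 and a-f, so it
# stays under the 255-character limit and has nothing to tokenize.
def hashed_index_key(message_id)
  Digest::SHA1.hexdigest(message_id)
end

# Two made-up ids that share their first 255 characters.
prefix = "x" * 255
id_a = prefix + "A@example.com"
id_b = prefix + "B@example.com"

key_a = hashed_index_key(id_a)
key_b = hashed_index_key(id_b)
puts key_a.length
puts key_a == key_b
```

Unlike truncation, ids that differ only after the 255th character still
get different keys. The cost is that the original id can't be recovered
from the key, so it would have to be stored separately if needed.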

-- 
William <wmorgan-sup at masanjin.net>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-don-t-tokenize-message_id-field-in-index.patch
Type: application/octet-stream
Size: 945 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/sup-talk/attachments/20080224/94955be9/attachment.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-only-use-the-first-255-characters-of-a-message-id-f.patch
Type: application/octet-stream
Size: 1046 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/sup-talk/attachments/20080224/94955be9/attachment-0001.obj 

