* [sup-talk] Amazon.com messages can't be added to index
@ 2007-10-14 3:32 jenny w
2007-10-14 5:20 ` Kevin Mark
0 siblings, 1 reply; 11+ messages in thread
From: jenny w @ 2007-10-14 3:32 UTC (permalink / raw)
Hi,
I just installed sup and am quite excited to check it out in action. I
haven't gotten much further than my first sync (27k messages over
IMAPS), but I've noticed something that I didn't find in the archives
(relying on Google for searching).
Messages I've received from Amazon.com seem to consistently bring up
an exception on line 157 of 188 of index.rb in the 0.1 release. I saw
in the troubleshooting tips to try the version from svn but I get a
similar exception. So I just changed "raise" to "puts", under the
assumption this means that I just won't be able to search old
Amazon.com messages. Right now I don't know if this is all Amazon.com
messages or just some of them. After the sync is done I'll see if I
can reproduce with just one message and try to narrow down what about
the message is causing the problem.
Anyway, thought I'd mention it in case someone knows about this
problem, or in case it helps track down what the problem is (the
comment above line 188 is 'this hasn't been triggered in a long time.
TODO: decide whether it's still a problem.')
Sample error message:
just added message .AAA-notification-24668,
9630.1162603428 at na-rte-app-5104.iad5.amazon.com but couldn't find it
in a search
* [sup-talk] Amazon.com messages can't be added to index
2007-10-14 3:32 [sup-talk] Amazon.com messages can't be added to index jenny w
@ 2007-10-14 5:20 ` Kevin Mark
2007-10-14 9:04 ` jenny w
0 siblings, 1 reply; 11+ messages in thread
From: Kevin Mark @ 2007-10-14 5:20 UTC (permalink / raw)
On Sat, Oct 13, 2007 at 08:32:23PM -0700, jenny w wrote:
> Hi,
>
> I just installed sup and am quite excited to check it out in action. I
> haven't gotten much further than my first sync (27k messages over
> IMAPS), but I've noticed something that I didn't find in the archives
> (relying on Google for searching).
>
> Messages I've received from Amazon.com seem to consistently bring up
> an exception on line 157 of 188 of index.rb in the 0.1 release. I saw
> in the troubleshooting tips to try the version from svn but I get a
> similar exception. So I just changed "raise" to "puts", under the
> assumption this means that I just won't be able to search old
> Amazon.com messages. Right now I don't know if this is all Amazon.com
> messages or just some of them. After the sync is done I'll see if I
> can reproduce with just one message and try to narrow down what about
> the message is causing the problem.
>
> Anyway, thought I'd mention it in case someone knows about this
> problem, or in case it helps track down what the problem is (the
> comment above line 188 is 'this hasn't been triggered in a long time.
> TODO: decide whether it's still a problem.')
>
> Sample error message:
>
> just added message .AAA-notification-24668,
> 9630.1162603428 at na-rte-app-5104.iad5.amazon.com but couldn't find it
> in a search
It would be helpful to the list to show a portion of the actual error
message (or all of it if it's small) instead of a restatement of it. A
restatement of an error message is rarely adequate to solve an error.
--
| .''`. == Debian GNU/Linux == | my web site: |
| : :' : The Universal |mysite.verizon.net/kevin.mark/|
| `. `' Operating System | go to counter.li.org and |
| `- http://www.debian.org/ | be counted! #238656 |
| my keyserver: subkeys.pgp.net | my NPO: cfsg.org |
|join the new debian-community.org to help Debian! |
|_______ Unless I ask to be CCd, assume I am subscribed _______|
* [sup-talk] Amazon.com messages can't be added to index
2007-10-14 5:20 ` Kevin Mark
@ 2007-10-14 9:04 ` jenny w
2007-10-14 21:23 ` Christopher Warrington
0 siblings, 1 reply; 11+ messages in thread
From: jenny w @ 2007-10-14 9:04 UTC (permalink / raw)
On 10/13/07, Kevin Mark <kevin.mark at verizon.net> wrote:
> It would be helpful to the list to show a portion of the actual error
> message (or all of it if it's small) instead of a restatement of it. A
> restatement of an error message is rarely adequate to solve an error.
Okay, more info. I ended up running this under ruby-debug to see what
was going on. It seems the problem I'm having is that Amazon.com
sends me messages with message_ids like:
AAA-notification-70925, 4617.1234123412 at na-rte-app-5105.iad5.amazon.com
The space after the comma is what's throwing ferret off.
I don't know if it helps, but I noticed that if I try doing
@index.search("message_id:#{m.id}") ferret will find the doc. This
seems to be because Ferret is turning the string into a Boolean (as
opposed to Term) query.
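
To illustrate why the space matters, here is a rough sketch in plain Ruby. It does not use Ferret: `naive_tokenize` is a hypothetical stand-in for an analyzer that splits a field on whitespace and commas, which is only an assumption about how the real analyzer behaves. The archive renders "@" as " at "; the code uses "@".

```ruby
# Hypothetical stand-in for an analyzer; NOT Ferret's actual tokenizer.
# Assumption: the field is lowercased and split on whitespace and commas.
def naive_tokenize(text)
  text.downcase.split(/[\s,]+/).reject(&:empty?)
end

mid = "AAA-notification-70925, 4617.1234123412@na-rte-app-5105.iad5.amazon.com"
tokens = naive_tokenize(mid)

# An exact-term lookup compares the whole id against individual tokens,
# so it cannot match once the id has been split into several pieces.
exact_match = tokens.include?(mid.downcase)

puts tokens.inspect
puts "exact match possible: #{exact_match}"
```

Under that assumption the id is stored as two separate tokens, so a lookup for the full string finds nothing, while a parsed query that is itself split into pieces can still hit.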
* [sup-talk] Amazon.com messages can't be added to index
2007-10-14 9:04 ` jenny w
@ 2007-10-14 21:23 ` Christopher Warrington
2007-10-14 22:32 ` jenny w
0 siblings, 1 reply; 11+ messages in thread
From: Christopher Warrington @ 2007-10-14 21:23 UTC (permalink / raw)
Excerpts from veganjenny's message of Sun Oct 14 04:04:04 -0500 2007:
> Okay, more info. I ended up running this under ruby-debug to see what
> was going on. It seems the problem I'm having is that Amazon.com
> sends me messages with message_ids like:
>
> AAA-notification-70925, 4617.1234123412 at na-rte-app-5105.iad5.amazon.com
>
> The space after the comma is what's throwing ferret off.
I know that this doesn't solve the problem, but that's not a valid
RFC822 MID. The space and comma are not allowed in the part to the left
of the at-sign (unless the part to the left is a quoted string, which it
is not in this case). Section 3.4.1 talks explicitly about this.
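
As a sketch of that rule, the Ruby check below tests whether the part to the left of the at-sign is a valid unquoted local part. It is deliberately simplified: it ignores quoted strings and comments, and the names `ATOM`, `LOCAL_PART`, and `unquoted_local_part_ok?` are made up for this example.

```ruby
# In RFC 822 an unquoted atom may not contain space, control characters,
# or the "specials": ( ) < > @ , ; : \ " . [ ]
# Simplification: quoted strings and comments are not handled here.
ATOM = /[^\x00-\x20\x7f()<>@,;:\\".\[\]]+/
LOCAL_PART = /\A#{ATOM}(?:\.#{ATOM})*\z/

# Returns true if the text before the first "@" is a dot-separated
# sequence of atoms.
def unquoted_local_part_ok?(message_id)
  local, _domain = message_id.split("@", 2)
  LOCAL_PART.match?(local.to_s)
end

amazon = "AAA-notification-70925, 4617.1234123412@na-rte-app-5105.iad5.amazon.com"
puts unquoted_local_part_ok?(amazon)
puts unquoted_local_part_ok?("1203915874-sup-5504@south")
```

The Amazon.com id fails because of the comma and the space; an id made of plain atoms passes.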
--
Christopher Warrington <chrisw at rice.edu>
Jones College
* [sup-talk] Amazon.com messages can't be added to index
2007-10-14 21:23 ` Christopher Warrington
@ 2007-10-14 22:32 ` jenny w
2007-10-28 3:06 ` William Morgan
0 siblings, 1 reply; 11+ messages in thread
From: jenny w @ 2007-10-14 22:32 UTC (permalink / raw)
On 10/14/07, Christopher Warrington <chrisw at rice.edu> wrote:
> I know that this doesn't solve the problem, but that's not a valid
> RFC822 MID. The space and comma are not allowed in the part to the left
> of the at-sign (unless the part to the left is a quoted string, which it
> is not in this case). Section 3.4.1 talks explicitly about this.
Yes, that's a problem, and I think they've fixed it in their current
e-mails, but I suspect violations of RFC822 will continue to happen.
I think the problem is that the field gets tokenized. I thought using
a PhraseQuery would solve the problem, but that didn't help. Sorry I
can't provide more useful information!
* [sup-talk] Amazon.com messages can't be added to index
2007-10-14 22:32 ` jenny w
@ 2007-10-28 3:06 ` William Morgan
0 siblings, 0 replies; 11+ messages in thread
From: William Morgan @ 2007-10-28 3:06 UTC (permalink / raw)
Excerpts from veganjenny's message of Sun Oct 14 15:32:08 -0700 2007:
> I think the problem is that the field gets tokenized.
Precisely so. I think I have finally fixed this problem in SVN (by
stripping all spaces from message ids). Update to r637 and see if it
still happens.
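
For readers who want to see what stripping spaces looks like, here is a minimal sketch. The helper name `normalize_message_id` is hypothetical, and the actual change in r637 may well differ.

```ruby
# Hypothetical helper; not necessarily what r637 does.
# Removes every run of whitespace from a message id.
def normalize_message_id(raw)
  raw.gsub(/\s+/, "")
end

raw = "AAA-notification-70925, 4617.1234123412@na-rte-app-5105.iad5.amazon.com"
puts normalize_message_id(raw)
```

Note that the comma survives, so this only helps if the analyzer does not also split on punctuation.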
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] Amazon.com messages can't be added to index
@ 2008-02-22 15:46 Luis Villa
2008-02-22 16:25 ` Luis Villa
0 siblings, 1 reply; 11+ messages in thread
From: Luis Villa @ 2008-02-22 15:46 UTC (permalink / raw)
From a long ago thread:
http://rubyforge.org/pipermail/sup-talk/2007-October/000326.html
It looks like I've found a similar problem. In importing my mail (with
0.4, haven't tried next yet) I get:
/usr/lib/ruby/gems/1.8/gems/sup-0.4/lib/sup/index.rb:200:in
`sync_message': just added message
"!~!UENERkVCMDkAAQACAPYAAAAAAAAAOKG7EAXlEBqhuwgAKypWwgAAbXNwc3QuZGxsAAAAAABOSVRB+b+4AQCqADfZbgAAAABDADoAXABEAG8AYwB1AG0AZQBuAHQAcwAgAGEAbgBkACAAUwBlAHQAdABpAG4AZwBzAFwAawBiAGUAbgB0AG8AbgBcAEwAbwBjAGEAbAAgAFMAZQB0AHQAaQBuAGcAcwBcAEEAcABwAGwAaQBjAGEAdABpAG8AbgAgAEQAYQB0AGEAXABNAGkAYwByAG8AcwBvAGYAdABcAE8AdQB0AGwAbwBvAGsAXABPAHUAdABsAG8AbwBrAC4AcABzAHQAAAAYAAAAAAAAALLH/vR9UMVCgMck3LV+0wHCgAAAGAAAAAAAAACyx/70fVDFQoDHJNy1ftMBhLcgAAAAAAAQAAAAqTDfdQ6dIEawbQUxhNxqVz4AAABSRTogQnVnemlsbGE6IEhhcyBhbnlvbmUgc3VjY2Vzc2Z1bGx5IGNyZWF0ZWQgU3ViLUNvbXBvbmVudHM/AA==@amd.com"
but couldn't find it in a search (RuntimeError)
Is this still a tokenization problem, or...? Any additional debugging
information I can grab to help debug?
Thanks-
Luis
* [sup-talk] Amazon.com messages can't be added to index
2008-02-22 15:46 Luis Villa
@ 2008-02-22 16:25 ` Luis Villa
2008-02-25 5:10 ` William Morgan
0 siblings, 1 reply; 11+ messages in thread
From: Luis Villa @ 2008-02-22 16:25 UTC (permalink / raw)
* [sup-talk] Amazon.com messages can't be added to index
2008-02-22 16:25 ` Luis Villa
@ 2008-02-25 5:10 ` William Morgan
2008-02-25 17:08 ` Christopher Warrington
0 siblings, 1 reply; 11+ messages in thread
From: William Morgan @ 2008-02-25 5:10 UTC (permalink / raw)
Reformatted excerpts from Luis Villa's message of 2008-02-22:
> /usr/lib/ruby/gems/1.8/gems/sup-0.4/lib/sup/index.rb:200:in `sync_message': just added message "!~!UENERkVCMDkAAQACAPYAAAAAAAAAOKG7EAXlEBqhuwgAKypWwgAAbXNwc3QuZGxsAAAAAABOSVRB+b+4AQCqADfZbgAAAABDADoAXABEAG8AYwB1AG0AZQBuAHQAcwAgAGEAbgBkACAAUwBlAHQAdABpAG4AZwBzAFwAawBiAGUAbgB0AG8AbgBcAEwAbwBjAGEAbAAgAFMAZQB0AHQAaQBuAGcAcwBcAEEAcABwAGwAaQBjAGEAdABpAG8AbgAgAEQAYQB0AGEAXABNAGkAYwByAG8AcwBvAGYAdABcAE8AdQB0AGwAbwBvAGsAXABPAHUAdABsAG8AbwBrAC4AcABzAHQAAAAYAAAAAAAAALLH/vR9UMVCgMck3LV+0wHCgAAAGAAAAAAAAACyx/70fVDFQoDHJNy1ftMBhLcgAAAAAAAQAAAAqTDfdQ6dIEawbQUxhNxqVz4AAABSRTogQnVnemlsbGE6IEhhcyBhbnlvbmUgc3VjY2Vzc2Z1bGx5IGNyZWF0ZWQgU3ViLUNvbXBvbmVudHM/AA==@amd.com" but couldn't find it in a search (RuntimeError)
Sigh. Why would anyone generate a message id like that?
There were two problems causing your error. I've fixed them both in git
next. You can probably apply the attached patches to your 0.4 release if
you don't want to use git just yet.
The first problem was tokenization: marking the message_id field as
non-tokenized in Ferret solves all sorts of tokenization problems. So that's in.
The second problem is a Ferret bug, where apparently TermQuery values of
more than 255 characters never match anything. The current workaround
just lops off anything after the 255th character. And that may very well
screw things up if it falsely merges distinct ids that share a prefix.
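
A small sketch of that risk, using made-up ids rather than real ones. `truncate_id` is a hypothetical helper that mirrors the "lop off" idea, not the code in the attached patch.

```ruby
MAX_TERM_LENGTH = 255 # limit described above

# Hypothetical helper: keep only the first 255 characters of an id.
def truncate_id(id)
  id[0, MAX_TERM_LENGTH]
end

# Two invented ids that differ only after the 255th character.
prefix = "x" * 255
a = "#{prefix}AAAA@example.com"
b = "#{prefix}BBBB@example.com"

puts a == b                           # the full ids differ
puts truncate_id(a) == truncate_id(b) # the truncated ids are identical
```

Any two ids that agree on their first 255 characters become indistinguishable after truncation.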
The right long-term answer is probably to take the hex SHA1 of every
message id and just use that instead of the original value. Then all of
these issues will be solved. That will require an index rebuild for
everyone, so I'm going to hold off on that for now.
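
The hashing idea can be sketched with Ruby's standard `digest` library. The helper name `message_id_key` is hypothetical; this only shows the shape of the approach, not an implementation from sup.

```ruby
require "digest/sha1"

# Hypothetical helper: map any message id to a fixed-length key.
# A hex SHA1 digest is always 40 characters of [0-9a-f], so it has no
# spaces or punctuation to tokenize and stays well under 255 characters.
def message_id_key(id)
  Digest::SHA1.hexdigest(id)
end

long_id = "#{'x' * 300}@example.com"
key = message_id_key(long_id)
puts key.length
puts message_id_key("AAA-notification-70925, 4617.1234123412@example.com")
```

The cost is the one noted above: existing index entries are keyed by the original value, so switching would mean rebuilding the index.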
--
William <wmorgan-sup at masanjin.net>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-don-t-tokenize-message_id-field-in-index.patch
Type: application/octet-stream
Size: 945 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/sup-talk/attachments/20080224/94955be9/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-only-use-the-first-255-characters-of-a-message-id-f.patch
Type: application/octet-stream
Size: 1046 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/sup-talk/attachments/20080224/94955be9/attachment-0001.obj
* [sup-talk] Amazon.com messages can't be added to index
2008-02-25 5:10 ` William Morgan
@ 2008-02-25 17:08 ` Christopher Warrington
2008-02-25 17:13 ` William Morgan
0 siblings, 1 reply; 11+ messages in thread
From: Christopher Warrington @ 2008-02-25 17:08 UTC (permalink / raw)
William Morgan @ 2008-2-24 11:10:57 PM
"[sup-talk] Amazon.com messages can't be added to index" <mid:1203915874-sup-5504 at south>
>> /usr/lib/ruby/gems/1.8/gems/sup-0.4/lib/sup/index.rb:200:in `sync_message': just added message "!~!UENERkVCMDkAAQACAPYAAAAAAAAAOKG7EAXlEBqhuwgAKypWwgAAbXNwc3QuZGxsAAAAAABOSVRB+b+4AQCqADfZbgAAAABDADoAXABEAG8AYwB1AG0AZQBuAHQAcwAgAGEAbgBkACAAUwBlAHQAdABpAG4AZwBzAFwAawBiAGUAbgB0AG8AbgBcAEwAbwBjAGEAbAAgAFMAZQB0AHQAaQBuAGcAcwBcAEEAcABwAGwAaQBjAGEAdABpAG8AbgAgAEQAYQB0AGEAXABNAGkAYwByAG8AcwBvAGYAdABcAE8AdQB0AGwAbwBvAGsAXABPAHUAdABsAG8AbwBrAC4AcABzAHQAAAAYAAAAAAAAALLH/vR9UMVCgMck3LV+0wHCgAAAGAAAAAAAAACyx/70fVDFQoDHJNy1ftMBhLcgAAAAAAAQAAAAqTDfdQ6dIEawbQUxhNxqVz4AAABSRTogQnVnemlsbGE6IEhhcyBhbnlvbmUgc3VjY2Vzc2Z1bGx5IGNyZWF0ZWQgU3ViLUNvbXBvbmVudHM/AA==@amd.com" but couldn't find it in a search (RuntimeError)
> Sigh. Why would anyone generate a message id like that?
I've seen some Netscape.com (webmail, I assume) messages with ids
like that. In fact, I just got one today:
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAXMhTt2UCrU6Chc6GzOTGusKAAAAQAAAASrY7ZritsEOjEOY8QXLoDQEAAAAA at netscape.com>
--
Christopher Warrington <chrisw at rice.edu>
"Be careful of reading health books: you might die of a misprint."
-Mark Twain
* [sup-talk] Amazon.com messages can't be added to index
2008-02-25 17:08 ` Christopher Warrington
@ 2008-02-25 17:13 ` William Morgan
0 siblings, 0 replies; 11+ messages in thread
From: William Morgan @ 2008-02-25 17:13 UTC (permalink / raw)
Reformatted excerpts from Christopher Warrington's message of 2008-02-25:
> I've seen some Netscape.com (webmail, I assume) messages with ids
> like that. In fact, I just got one today:
>
> Message-ID:
> <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAXMhTt2UCrU6Chc6GzOTGusKAAAAQAAAASrY7ZritsEOjEOY8QXLoDQEAAAAA at netscape.com>
Hey, at least that one's under 255 characters. :)
Does seem to be the same software, though. !~! my ass.
--
William <wmorgan-sup at masanjin.net>