* Re: [sup-devel] [PATCH] XapianIndex.each_message_in_thread_for yields messages in cronological order
[not found] <1261485246-sup-4236@tilus.net>
@ 2009-12-27 21:37 ` Rich Lane
2009-12-30 2:41 ` Tero Tilus
0 siblings, 1 reply; 6+ messages in thread
From: Rich Lane @ 2009-12-27 21:37 UTC (permalink / raw)
To: Tero Tilus; +Cc: sup-devel
Excerpts from Tero Tilus's message of Tue Dec 22 07:42:52 -0500 2009:
> This way I got rid of a couple of counterintuitive threading results.
> Namely real root of a thread would occasionally not be displayed as a
> root if a message containing the real root in the middle of its
> refs-list (dunno why) would get yielded (to threading algorithm)
> before the real root. Threading algorithm looks like it silently
> expects threaded messages to appear in chronological order.
Hmm. Threading should only depend on refs and reply-tos, not the date.
Could you give a short example (just the relevant headers) of a
situation where this patch helps?
What you describe sounds like a malformed message. What client is
generating them / how common are they?
_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel
* Re: [sup-devel] [PATCH] XapianIndex.each_message_in_thread_for yields messages in cronological order
2009-12-27 21:37 ` [sup-devel] [PATCH] XapianIndex.each_message_in_thread_for yields messages in cronological order Rich Lane
@ 2009-12-30 2:41 ` Tero Tilus
2009-12-30 14:10 ` William Morgan
0 siblings, 1 reply; 6+ messages in thread
From: Tero Tilus @ 2009-12-30 2:41 UTC (permalink / raw)
To: sup-devel
Rich Lane, 2009-12-27 23:37:
> Hmm. Threading should only depend on refs and reply-tos, not the date.
I think threading _should_ depend on date too. Not of course the
parent-connections, but the ordering of siblings. So even this bug(?)
aside the messages should afaik be processed in chronological order
when threading to get siblings ordered by date.
> Could you give a short example (just the relevant headers) of a
> situation where this patch helps?
>
> What you describe sounds like a malformed message. What client is
> generating them / how common are they?
As far as I know, you can trigger this by replying to many messages at
once, which puts a list of ids in the In-Reply-To: header (in whatever
order, of course; the RFC doesn't require any particular order) instead
of one. Then someone replies to that message using an MUA that is bold
enough to form References: with the standard in-reply-to + my-id method,
even though RFC 2822 says "trying to form a References: field for a
reply that has multiple parents is discouraged and how to do so is not
defined in this document". You end up with a References: header that has
a bunch of (thread-wise) random ids in random order instead of the
RFC-specified original, reply, reply-to-reply, etc. chain of ids.
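To make the failure mode concrete, here is a minimal Ruby sketch of how a reply's References: list gets built. The function name, the hash keys and the `strict:` flag are made up for illustration; this is not code from sup or from any MUA. With `strict: true` it follows the RFC 2822 rule (the parent's References:, or a single-id In-Reply-To:, followed by the parent's Message-ID); with `strict: false` it copies a multi-id In-Reply-To: verbatim, like the bold MUA described above.

```ruby
# Build the References: list for a reply to +parent+.
# +parent+ is a hypothetical hash with :message_id, :in_reply_to, :references.
def references_for_reply(parent, strict: true)
  refs = parent[:references] || []
  irt  = parent[:in_reply_to] || []

  base =
    if !refs.empty?
      refs                      # normal case: extend the parent's chain
    elsif irt.size == 1 || !strict
      irt                       # single parent, or the "bold" MUA behaviour
    else
      []                        # multiple parents: RFC 2822 defines nothing here
    end

  base + [parent[:message_id]]
end

# A message that answers two mails at once and carries no References: itself.
third = { message_id: 'third', in_reply_to: %w[second first], references: nil }

p references_for_reply(third, strict: false) # ["second", "first", "third"]
p references_for_reply(third)                # ["third"]
```

The non-strict result lists "second" before "first", so a threader that trusts the order of References: will read First as a reply to Second.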
Workaround is easy. Just process messages sorted by date so the
in-reply-to fields of original messages override the fscked up
references of some later mangled replies, which of course appear
_after_ any of the messages whose threading they could possibly fsck
... they wouldn't be replies if they didn't. ;)
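The workaround could look roughly like this in Ruby. `Msg` and `each_message_chronologically` are hypothetical stand-ins, not the actual XapianIndex code from the patch; the only point is that the threader gets fed the oldest message first.

```ruby
require 'time'

# Hypothetical stand-in for a message; sup's real Message class has more fields.
Msg = Struct.new(:id, :date, :refs, :replytos)

# Yield messages oldest first, so plain replies are seen before any later
# reply that carries a scrambled References: list.
def each_message_chronologically(messages)
  return enum_for(:each_message_chronologically, messages) unless block_given?
  messages.sort_by(&:date).each { |m| yield m }
end

msgs = [
  Msg.new('reply@x', Time.parse('2009-12-18 13:30:10 +0200'), [], ['root@x']),
  Msg.new('root@x',  Time.parse('2009-12-15 12:38:28 +0200'), [], [])
]

p each_message_chronologically(msgs).map(&:id) # ["root@x", "reply@x"]
```

Note that this relies on the Date: headers being sane, which is an assumption of its own.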
This thread was the itch that made me scratch. I haven't really
looked for other twisted threads, but I've got several thousand
mails from these same authors, so I assume this is not a singular case.
User agent headers are also included.
Fscked up threading looks like this (produced by current git next)
+ Person Three, joulu 18 (2 weeks ago)
+ Person Four, joulu 18 (2 weeks ago)
+ Person Four, joulu 17 (2 weeks ago)
+ Person One, joulu 17 (2 weeks ago)
+ Person Five, joulu 17 (2 weeks ago)
+ Person Four, joulu 17 (2 weeks ago)
+ Person Three, joulu 15 (2 weeks ago)
+ Person Two, joulu 15 (2 weeks ago)
+ Person One, joulu 15 (2 weeks ago)
+ Person Four, joulu 18 (2 weeks ago)
+ Person Three, joulu 18 (2 weeks ago)
+ Person Two, joulu 18 (2 weeks ago)
+ Person One, joulu 18 (2 weeks ago)
+ Person One, joulu 19 (2 weeks ago)
Correct like this (produced by current git next + threading and date
format patches, and that's why date formats differ too)
+ Person One, 15. 12:38 (2 weeks ago)
+ Person Two, 15. 14:17 (2 weeks ago)
+ Person Three, 15. 14:35 (2 weeks ago)
+ Person Four, 17. 01:47 (2 weeks ago)
+ Person Five, 17. 02:28 (2 weeks ago)
+ Person One, 17. 09:08 (2 weeks ago)
+ Person Four, 17. 11:26 (2 weeks ago)
+ Person Four, 18. 01:15 (2 weeks ago)
+ Person Three, 18. 10:15 (2 weeks ago)
+ Person Two, 18. 12:16 (2 weeks ago)
+ Person One, 18. 13:30 (2 weeks ago)
+ Person One, 19. 13:43 (2 weeks ago)
+ Person Four, 18. 14:16 (2 weeks ago)
+ Person Three, 18. 14:53 (2 weeks ago)
Here are the headers, in the order the messages appear in the correct threading.
Date: Tue, 15 Dec 2009 12:38:28 +0200
From: Person One
Message-ID: <20091215103828.GA8328@domain-one>
User-Agent: Mutt/1.5.20 (2009-06-14)

Date: Tue, 15 Dec 2009 14:17:38 +0200
From: Person Two
Message-ID: <1260879458.2530.42.camel@havelock>
In-Reply-To: <20091215103828.GA8328@domain-one>
References: <20091215103828.GA8328@domain-one>
X-Mailer: Evolution 2.28.1

Date: Tue, 15 Dec 2009 14:35:01 +0200 (EET)
From: Person Three
Message-ID: <alpine.LRH.1.10.0912151434380.12088@domain-two>
In-Reply-To: <20091215103828.GA8328@domain-one>
References: <20091215103828.GA8328@domain-one>
User-Agent: Alpine 1.10 (LRH 962 2008-03-14)

Date: Thu, 17 Dec 2009 01:47:59 +0200
From: Person Four
Message-ID: <4B2971AF.7060808@domain-three>
In-Reply-To: <20091215103828.GA8328@domain-one>
References: <20091215103828.GA8328@domain-one>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
    rv:1.9.1.5) Gecko/20091204 Thunderbird/3.0

Date: Thu, 17 Dec 2009 02:28:55 +0200 (EET)
From: Person Five
Message-ID: <alpine.DEB.2.00.0912170214460.25488@domain-five>
In-Reply-To: <4B2971AF.7060808@domain-three>
References: <20091215103828.GA8328@domain-one>
    <4B2971AF.7060808@domain-three>
User-Agent: Alpine 2.00 (DEB 1167 2008-08-23)

Date: Thu, 17 Dec 2009 09:08:31 +0200
From: Person One
Message-ID: <20091217070831.GD27029@domain-one>
In-Reply-To: <4B2971AF.7060808@domain-three>
References: <20091215103828.GA8328@domain-one>
    <4B2971AF.7060808@domain-three>
User-Agent: Mutt/1.5.20 (2009-06-14)

Date: Thu, 17 Dec 2009 11:26:15 +0200
From: Person Four
Message-ID: <4B29F937.7080909@domain-four>
In-Reply-To: <20091217070831.GD27029@domain-one>
References: <20091215103828.GA8328@domain-one> <4B2971AF.7060808@domain-three>
    <20091217070831.GD27029@domain-one>
User-Agent: Thunderbird 2.0.0.23 (X11/20090817)

Date: Fri, 18 Dec 2009 01:15:33 +0200
From: Person Four
Message-ID: <4B2ABB95.6010301@domain-three>
In-Reply-To: <20091215103828.GA8328@domain-one>
References: <20091215103828.GA8328@domain-one>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
    rv:1.9.1.5) Gecko/20091204 Thunderbird/3.0

Date: Fri, 18 Dec 2009 10:15:45 +0200 (EET)
From: Person Three
Message-ID: <alpine.LRH.1.10.0912181012570.30704@domain-two>
In-Reply-To: <4B2ABB95.6010301@domain-three>
References: <20091215103828.GA8328@domain-one>
    <4B2ABB95.6010301@domain-three>
User-Agent: Alpine 1.10 (LRH 962 2008-03-14)

Date: Fri, 18 Dec 2009 12:16:57 +0200
From: Person Two
Message-ID: <1261131417.2530.179.camel@havelock>
In-Reply-To: <alpine.LRH.1.10.0912181012570.30704@domain-two>
References: <20091215103828.GA8328@domain-one>
    <4B2ABB95.6010301@domain-three>
    <alpine.LRH.1.10.0912181012570.30704@domain-two>
X-Mailer: Evolution 2.28.1

Date: Fri, 18 Dec 2009 13:30:10 +0200
From: Person One
Message-ID: <20091218113010.GI3160@domain-one>
In-Reply-To: <1261131417.2530.179.camel@havelock>
    <alpine.LRH.1.10.0912181012570.30704@domain-two>
    <4B2ABB95.6010301@domain-three> <4B29F937.7080909@domain-four>
    <20091217070831.GD27029@domain-one>
    <alpine.DEB.2.00.0912170214460.25488@domain-five>
    <4B2971AF.7060808@domain-three>
    <alpine.LRH.1.10.0912151434380.12088@domain-two>
    <1260879458.2530.42.camel@havelock>
    <20091215103828.GA8328@domain-one>
User-Agent: Mutt/1.5.20 (2009-06-14)

Date: Sat, 19 Dec 2009 13:43:14 +0200
From: Person One
Message-ID: <20091219114314.GA15682@domain-six>
In-Reply-To: <20091218113010.GI3160@domain-one>
References: <alpine.LRH.1.10.0912181012570.30704@domain-two>
    <4B2ABB95.6010301@domain-three> <4B29F937.7080909@domain-four>
    <20091217070831.GD27029@domain-one>
    <alpine.DEB.2.00.0912170214460.25488@domain-five>
    <4B2971AF.7060808@domain-three>
    <alpine.LRH.1.10.0912151434380.12088@domain-two>
    <1260879458.2530.42.camel@havelock>
    <20091215103828.GA8328@domain-one>
    <20091218113010.GI3160@domain-one>
User-Agent: Mutt/1.5.18 (2008-05-17)
--
Tero Tilus ## 050 3635 235 ## http://tero.tilus.net/
* Re: [sup-devel] [PATCH] XapianIndex.each_message_in_thread_for yields messages in cronological order
2009-12-30 2:41 ` Tero Tilus
@ 2009-12-30 14:10 ` William Morgan
2009-12-30 17:01 ` Rich Lane
0 siblings, 1 reply; 6+ messages in thread
From: William Morgan @ 2009-12-30 14:10 UTC (permalink / raw)
To: sup-devel
Reformatted excerpts from Tero Tilus's message of 2009-12-29:
> As far as I know, you can trigger this by replying to many messages at
> once, which puts a list of ids in the In-Reply-To: header (in whatever
> order, of course; the RFC doesn't require any particular order) instead
> of one. Then someone replies to that message using an MUA that is bold
> enough to form References: with the standard in-reply-to + my-id
> method, even though RFC 2822 says "trying to form a References: field
> for a reply that has multiple parents is discouraged and how to do so
> is not defined in this document". You end up with a References: header
> that has a bunch of (thread-wise) random ids in random order instead of
> the RFC-specified original, reply, reply-to-reply, etc. chain of ids.
It's worth reading the top bit of http://www.jwz.org/doc/threading.html
for what In-reply-to: and References: look like in practice. (Basically:
a mess, and the references: header in particular can be truncated in any
way that any MUA feels is reasonable.)
The threading used by the Ferret indexer is a pretty faithful
reproduction of the algorithm described on that page. I'm not that
familiar with the one used by the Xapian index, but a cursory
examination suggests it's a little more fragile.
--
William <wmorgan-sup@masanjin.net>
* Re: [sup-devel] [PATCH] XapianIndex.each_message_in_thread_for yields messages in cronological order
2009-12-30 14:10 ` William Morgan
@ 2009-12-30 17:01 ` Rich Lane
2009-12-31 19:41 ` William Morgan
0 siblings, 1 reply; 6+ messages in thread
From: Rich Lane @ 2009-12-30 17:01 UTC (permalink / raw)
To: William Morgan; +Cc: sup-devel
Excerpts from William Morgan's message of Wed Dec 30 09:10:54 -0500 2009:
> Reformatted excerpts from Tero Tilus's message of 2009-12-29:
> > As far as I know, you can trigger this by replying to many messages
> > at once, which puts a list of ids in the In-Reply-To: header (in
> > whatever order, of course; the RFC doesn't require any particular
> > order) instead of one. Then someone replies to that message using an
> > MUA that is bold enough to form References: with the standard
> > in-reply-to + my-id method, even though RFC 2822 says "trying to form
> > a References: field for a reply that has multiple parents is
> > discouraged and how to do so is not defined in this document". You
> > end up with a References: header that has a bunch of (thread-wise)
> > random ids in random order instead of the RFC-specified original,
> > reply, reply-to-reply, etc. chain of ids.
>
> It's worth reading the top bit of http://www.jwz.org/doc/threading.html
> for what In-reply-to: and References: look like in practice. (Basically:
> a mess, and the references: header in particular can be truncated in any
> way that any MUA feels is reasonable.)
>
> The threading used by the Ferret indexer is a pretty faithful
> reproduction of the algorithm described on that page. I'm not that
> familiar with the one used by the Xapian index, but a cursory
> examination suggests it's a little more fragile.
I'm assuming you're talking about each_message_in_thread_for, since
that's the only Index method that deals with threading.
In what order does ThreadSet#add_message expect to get messages?
The answer determines the order Index#each_message_in_thread_for needs
to yield them in. I'd assumed an arbitrary ordering would work because
add_message needs to handle this case anyway to work with messages
arriving out of order from the source (which happens all the time) and
then added to the Inbox threadset. AFAICT JWZ's algorithm should work
regardless of the order in which messages are handed to it.
* Re: [sup-devel] [PATCH] XapianIndex.each_message_in_thread_for yields messages in cronological order
2009-12-30 17:01 ` Rich Lane
@ 2009-12-31 19:41 ` William Morgan
2010-01-01 13:19 ` Tero Tilus
0 siblings, 1 reply; 6+ messages in thread
From: William Morgan @ 2009-12-31 19:41 UTC (permalink / raw)
To: sup-devel
Reformatted excerpts from Rich Lane's message of 2009-12-30:
> I'm assuming you're talking about each_message_in_thread_for, since
> that's the only Index method that deals with threading.
Yes, I suppose. In my mind the Xapian index had replaced the ThreadSet
threading entirely, but perhaps that's not the case.
> In what order does ThreadSet#add_message expect to get messages?
Arbitrary.
> AFAICT JWZ's algorithm should work regardless of the order in which
> messages are handed to it.
That's my understanding too.
I don't like adding date as a component for threading (because it's just
asking for a screwy date to wreak havoc, just as a screwy References:
header wreaks havoc now). I don't like playing around with the threading
algorithm, not least because we don't have a good test harness
that lets us know if we screw something up. So I'm inclined to sit on
this patch.
Out of curiosity, Tero, could the problem also be solved by giving the
in-reply-to header precedence over the references header?
--
William <wmorgan-sup@masanjin.net>
* Re: [sup-devel] [PATCH] XapianIndex.each_message_in_thread_for yields messages in cronological order
2009-12-31 19:41 ` William Morgan
@ 2010-01-01 13:19 ` Tero Tilus
0 siblings, 0 replies; 6+ messages in thread
From: Tero Tilus @ 2010-01-01 13:19 UTC (permalink / raw)
To: sup-devel
William Morgan, 2009-12-31 21:41:
> Out of curiosity, Tero, could the problem also be solved by giving
> the in-reply-to header precedence over the references header?
Well, yes and no. ;)
I think what it needs to do is
  a) consider only the first message in In-reply-to: (like it already
     does),
  b) prioritize In-reply-to: ahead of References: (like it already
     does!) and
  c) if In-reply-to: would create a loop or diamond, resolve by
     dropping another link ("topmost" conflicting?) and keep the one
     from In-reply-to: (currently it drops the link suggested by
     In-reply-to: in favour of another one potentially coming from
     (messed up) References:).
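Rule c) could be sketched like this in Ruby. Everything here is hypothetical: the thread is reduced to a plain hash from message id to parent id, `link_in_reply_to!` is an invented name, "topmost conflicting" is read as "the link directly below the child on the path down to the would-be parent", and only the loop case is handled, not the diamond. It is not how ThreadSet#link works today.

```ruby
# +parents+ maps each message id to its parent id (roots have no entry).
def descendant_of?(parents, id, ancestor)
  node = parents[id]
  while node
    return true if node == ancestor
    node = parents[node]
  end
  false
end

# Make +parent+ the parent of +child+, as In-reply-to: says. If that would
# close a loop, cut one existing link first instead of dropping this one.
def link_in_reply_to!(parents, parent, child)
  return parents if parent == child
  if descendant_of?(parents, parent, child)
    # Walk up from parent to the node hanging directly under child,
    # then detach that node so the loop cannot form.
    node = parent
    node = parents[node] until parents[node] == child
    parents.delete(node)
  end
  parents[child] = parent
  parents
end

# State left behind by a bogus References: Second, First, Third on Fourth.
parents = { 'first' => 'second', 'third' => 'first', 'fourth' => 'third' }

link_in_reply_to!(parents, 'first', 'second')  # Second's In-reply-to: First
link_in_reply_to!(parents, 'second', 'third')  # Third's In-reply-to: Second

p parents
```

After both calls First has no parent again, Second hangs under First, Third under Second and Fourth under Third.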
Lemme speculate on this a bit.
The current threading implementation tries to give In-reply-to: precedence
over References:, but it can still let a References: header that is
malformed in the way described in my previous mail affect the real root
of the thread. By the time we encounter the In-reply-to: headers that
should take precedence over the References:, there could already be
a bogus parent attached to the root.
Say we have
  First (no In-reply-to: or References:)
  +- Second (In-reply-to: First; References: First)
     +- Third (In-reply-to: Second, First; no References:)
        +- Fourth (In-reply-to: Third; References: Second, First, Third)
Suppose Third is a reply to both Second and First (in that order). Then
Fourth might have References: Second, First, Third. If Fourth is the
first message processed when threading, then First is seen as a reply
to Second. Now when Second itself is processed, its In-reply-to: would
create a loop and is discarded (see ThreadSet#link), resulting in
  Second
  +- First
     +- Third
        +- Fourth
which is exactly what the example headers I posted seem to produce
(real root jumps in the middle of one of the branches).
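The order dependence can be reproduced with a toy threader. This is a simplified sketch, not sup's ThreadSet: `Node`, `ToyThreader` and `thread_in_order` are invented names, and the rule that a link is skipped whenever the two nodes are already related in either direction is an assumption of this sketch. It links along References: first and then applies the first In-reply-to: id with overwrite.

```ruby
# One node per Message-ID, linked by parent/children.
class Node
  attr_reader :id, :children
  attr_accessor :parent

  def initialize(id)
    @id = id
    @children = []
    @parent = nil
  end

  def descendant_of?(other)
    node = parent
    while node
      return true if node.equal?(other)
      node = node.parent
    end
    false
  end
end

class ToyThreader
  def initialize
    @nodes = Hash.new { |h, id| h[id] = Node.new(id) }
  end

  # Link kid under par unless the two are already related. An existing
  # parent is only replaced when overwrite is true.
  def link(par, kid, overwrite = false)
    return if par.equal?(kid) || par.descendant_of?(kid) || kid.descendant_of?(par)
    return unless kid.parent.nil? || overwrite
    kid.parent.children.delete(kid) if kid.parent
    kid.parent = par
    par.children << kid
  end

  def add(id, refs: [], replytos: [])
    chain = (refs + [id]).map { |r| @nodes[r] }
    chain.each_cons(2) { |par, kid| link(par, kid) }            # via References:
    link(@nodes[replytos.first], @nodes[id], true) unless replytos.empty?
  end

  def path_to_root(id)
    path = []
    node = @nodes[id]
    while node
      path.unshift(node.id)
      node = node.parent
    end
    path
  end
end

MESSAGES = {
  'first'  => { refs: [],                     replytos: [] },
  'second' => { refs: %w[first],              replytos: %w[first] },
  'third'  => { refs: [],                     replytos: %w[second first] },
  'fourth' => { refs: %w[second first third], replytos: %w[third] }
}

def thread_in_order(ids)
  threader = ToyThreader.new
  ids.each { |id| threader.add(id, **MESSAGES[id]) }
  threader
end

puts thread_in_order(%w[first second third fourth]).path_to_root('fourth').join(' > ')
# first > second > third > fourth
puts thread_in_order(%w[fourth first second third]).path_to_root('fourth').join(' > ')
# second > first > third > fourth
```

Fed in date order, First stays the root; fed Fourth first, Second ends up as the root with First below it.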
--
Tero Tilus ## 050 3635 235 ## http://tero.tilus.net/