* [sup-talk] Non-english outlook block quote regexp
From: Ico Doornekamp @ 2011-01-26 13:12 UTC
To: sup-talk
Hi,
I'm unfortunate enough to have regular correspondence with Dutch
Outlook users, which is mostly annoying because of the way Outlook
handles quoting of original messages.
I found that Sup is able to handle block quotes from English outlook
users where the regexp
^-----\s*Original Message\s*----+$
is used to find out where the quote starts. Unfortunately this fails for
other languages, because the 'Original Message' text seems to be
localized. In Dutch, for example, the text 'Oorspronkelijk Bericht' is
used instead.
Would it be an improvement to change this to a more generic regexp that
matches more languages? I was not able to find a complete list of possible
strings used here, so some heuristics would be necessary.
Any opinions on matching the exact number of dashes, two capitalized words
and again the exact number of dashes, something like:
^-----\s*([A-Z][a-z]+\s*){2}----+$
Would that be safe to do?
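A minimal Ruby sketch of how the two patterns above behave on sample marker
lines. The constant names are made up for illustration, and the Dutch line
assumes the capitalization given in this message; real Outlook output may
differ.

```ruby
# The English-only pattern quoted in this message.
ENGLISH_ONLY = /^-----\s*Original Message\s*----+$/

# The proposed generic pattern: two capitalized ASCII words between dashes.
TWO_WORDS = /^-----\s*([A-Z][a-z]+\s*){2}----+$/

# Sample marker lines (assumed, not taken from real mail).
SAMPLE_LINES = [
  "-----Original Message-----",
  "-----Oorspronkelijk Bericht-----",
]

SAMPLE_LINES.each do |line|
  puts "#{line}: english_only=#{ENGLISH_ONLY.match?(line)} two_words=#{TWO_WORDS.match?(line)}"
end
```

Under these assumptions the generic pattern accepts both lines, while the
English-only one accepts just the first.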
--
:wq
^X^Cy^K^X^C^C^C^C
_______________________________________________
sup-talk mailing list
sup-talk@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-talk
* Re: [sup-talk] Non-english outlook block quote regexp
From: Tero Tilus @ 2011-01-27 13:29 UTC
To: Sup users
Ico Doornekamp, 2011-01-26 15:12:
> Any opinions on matching the exact number of dashes, two capitalized words
> and again the exact number of dashes, something like:
>
> ^-----\s*([A-Z][a-z]+\s*){2}----+$
>
> Would that be safe to do?
Prolly safe, but it misses the German Outlook quote
"-------- Original-Nachricht --------"
and yes, it has a different number of dashes :-O It also misses the Finnish
quote (for two obvious reasons: the non-ASCII letter and the lowercase
second word).
"-----Alkuperäinen viesti-----"
Would ^-----+\s*\S+[ -]\S+\s*-----+$ do the trick and not give false
positives?
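A rough Ruby check of the looser pattern against the marker lines quoted in
this thread. The PEM header line is not from the thread; it is added here as
an assumed example of a non-quote line that has two tokens between dashes.
Constant names are illustrative only.

```ruby
# The looser pattern suggested above, copied verbatim.
LOOSE_MARKER = /^-----+\s*\S+[ -]\S+\s*-----+$/

# Marker lines quoted in this thread.
THREAD_MARKERS = [
  "-----Original Message-----",            # English
  "-------- Original-Nachricht --------",  # German
  "-----Alkuperäinen viesti-----",         # Finnish
]

# Assumption, not from the thread: a PEM header used to probe for
# false positives.
PEM_HEADER = "-----BEGIN CERTIFICATE-----"

(THREAD_MARKERS + [PEM_HEADER]).each do |line|
  verdict = LOOSE_MARKER.match?(line) ? "match   " : "no match"
  puts "#{verdict}  #{line}"
end
```

All three thread markers match, but so does the PEM header, so a pattern
this loose can flag lines that are not quote markers.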
PS. I have occasionally thought of configurable regexen for quotes etc.
Would anybody else use such a feature? Or should we go all the way to
a state-transition hook for a state machine parsing the message body? :)
--
Tero Tilus ## 050 3635 235 ## http://tero.tilus.net/
* Re: [sup-talk] Non-english outlook block quote regexp
From: Michael Stapelberg @ 2011-01-27 15:00 UTC
To: sup-talk
Hi Tero,
Excerpts from Tero Tilus's message of 2011-01-27 14:29:22 +0100:
> Prolly safe, but it misses the German Outlook quote
We also discussed this on IRC. Ico came up with a list of a few more
(localized) messages. I suggested the most pragmatic solution: keeping this
list around with a comment to send patches if anybody stumbles upon a new
localized version of it.
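A sketch of what such a maintained list could look like in Ruby. The
constant and method names are hypothetical, not Sup's actual code, and the
list holds only the strings mentioned in this thread; exact spelling and
capitalization should be checked against real messages.

```ruby
# Hypothetical list of localized marker texts. Only strings quoted in this
# thread are included; send a patch to add a new localization.
ORIGINAL_MESSAGE_TEXTS = [
  "Original Message",        # English
  "Oorspronkelijk Bericht",  # Dutch, as given in this thread
  "Original-Nachricht",      # German
  "Alkuperäinen viesti",     # Finnish
]

# Regexp.union escapes each string and joins the alternatives with "|".
QUOTE_START = /^-----+\s*#{Regexp.union(ORIGINAL_MESSAGE_TEXTS)}\s*-----+$/

# Returns true when the line looks like a known Outlook quote marker.
def quote_start?(line)
  QUOTE_START.match?(line)
end

puts quote_start?("-------- Original-Nachricht --------")
puts quote_start?("-----BEGIN CERTIFICATE-----")
```

Compared with a generic pattern, an explicit list only matches known
strings, at the cost of needing an update for each new language.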
I don’t really think a hook is a good way for this one because the user should
not be the one who has to maintain an accurate and up-to-date list of these
strings…
Best regards,
Michael
* Re: [sup-talk] Non-english outlook block quote regexp
From: Ico Doornekamp @ 2011-01-28 9:24 UTC
To: sup-talk
* On Thu Jan 27 16:00:10 +0100 2011, Michael Stapelberg wrote:
> Excerpts from Tero Tilus's message of 2011-01-27 14:29:22 +0100:
> > Prolly safe, but it misses the German Outlook quote
> We also discussed this on IRC. Ico came up with a list of a few more
> (localized) messages. I suggested the most pragmatic solution: keeping
> this list around with a comment to send patches if anybody stumbles
> upon a new localized version of it.
Yes, that's probably the most pragmatic way to go. It's a shame that MUA
type A is forced to keep a list of possible ramblings of MUA type B to
do its work, but that's the way it is.
--
:wq
^X^Cy^K^X^C^C^C^C