* [sup-devel] [issue114] Better quoted-text / top-post stripping
From: anonymous @ 2010-08-04 12:31 UTC (permalink / raw)
To: sup-devel
[-- Attachment #1: Type: text/plain, Size: 881 bytes --]
New submission from anonymous:
Redwood::Message splits a message into chunks and hides quoted text from a
previous message.
Top-posted responses are supported with BLOCK_QUOTE_PATTERN. This patch i) adds
a new pattern to hide top-posted text from Microsoft Entourage, ii) adds a fix
for Mozilla-based mail users who top post and iii) adds a new array-based config
option "block_quote_patterns" for adding additional patterns to treat as marking
the top of a top-posted response.
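
For illustration only, here is a minimal sketch of how the "block_quote_patterns"
option described above could be combined with the built-in patterns. The names
DEFAULT_BLOCK_QUOTE_PATTERNS, build_block_quote_patterns and block_quote_start?
are hypothetical and do not appear in sup or in the attached patch, and the
config hash is a stand-in for sup's $config. Building a fresh list avoids
reassigning a constant, which Ruby warns about.

```ruby
# Sketch under stated assumptions: every name defined here is hypothetical.
DEFAULT_BLOCK_QUOTE_PATTERNS = [
  /^----+\s*Original (M|m)essage\s*----+/,   # pattern as written in the patch
  /^On \d+\/\d+\/\d+ .+ wrote:/,             # Entourage attribution line
].freeze

# Combine the built-in patterns with user-supplied regex source strings.
def build_block_quote_patterns(config)
  extra = config[:block_quote_patterns] || []
  (DEFAULT_BLOCK_QUOTE_PATTERNS + extra.map { |s| Regexp.new(s) }).freeze
end

# True if the line matches any pattern that marks the start of a block quote.
def block_quote_start?(line, patterns)
  patterns.any? { |pattern| line =~ pattern }
end

# Stand-in for $config; the extra pattern is an invented example.
config = { :block_quote_patterns => ['^Begin forwarded message:'] }
patterns = build_block_quote_patterns(config)

puts block_quote_start?("-----Original Message-----", patterns)
puts block_quote_start?("Begin forwarded message:", patterns)
puts block_quote_start?("Hello there", patterns)
```

Using any? instead of an explicit for/break loop keeps the check to a single
expression; the behaviour is otherwise the same as the loop in the patch.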
----------
files: patch.better-top-post-stripping
messages: 261
nosy: anonymous
priority: feature request
ruby_version: 1.8.7
status: unread
sup_version: 0.11
title: Better quoted-text / top-post stripping
_________________________________________
Sup issue tracker <sup-bugs@masanjin.net>
<http://masanjin.net/sup-bugs/issue114>
_________________________________________
[-- Attachment #2: patch.better-top-post-stripping --]
[-- Type: application/octet-stream, Size: 1483 bytes --]
--- lib/sup/message.rb.orig 2010-08-04 12:05:44.000000000 +0100
+++ lib/sup/message.rb 2010-08-04 13:27:07.000000000 +0100
@@ -25,7 +25,26 @@
end
QUOTE_PATTERN = /^\s{0,4}[>|\}]/
- BLOCK_QUOTE_PATTERN = /^-----\s*Original Message\s*----+$/
+
+ BLOCK_QUOTE_PATTERNS =
+ [
+ # NB: these should preferably not be anchored to line endings ('$') due
+ # to line ending encodings ('=20').
+
+ # At least three dashes. Mozilla mail clients downcase the 'm' in
+ # message.
+ /^----+\s*Original (M|m)essage\s*----+/,
+
+ # Microsoft Entourage doesn't indent quoted text, but it can be spotted
+ # with this line:
+ # On 8/2/10 1:23 PM, "John Doe" <edward@facebook.com> wrote:
+ /^On \d+\/\d+\/\d+ .+ wrote:/,
+ ]
+
+ if ar = $config[:block_quote_patterns]
+ BLOCK_QUOTE_PATTERNS += ar.map{ |s| Regexp.new(s) }
+ end
+
SIG_PATTERN = /(^-- ?$)|(^\s*----------+\s*$)|(^\s*_________+\s*$)|(^\s*--~--~-)|(^\s*--\+\+\*\*==)/
MAX_SIG_DISTANCE = 15 # lines from the end
@@ -540,8 +559,13 @@
newstate = :quote
elsif line =~ SIG_PATTERN && (lines.length - i) < MAX_SIG_DISTANCE
newstate = :sig
- elsif line =~ BLOCK_QUOTE_PATTERN
- newstate = :block_quote
+ else
+ for pattern in BLOCK_QUOTE_PATTERNS
+ if line =~ pattern
+ newstate = :block_quote
+ break
+ end
+ end
end
if newstate
[-- Attachment #3: Type: text/plain, Size: 143 bytes --]
_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel
* Re: [sup-devel] [issue114] Better quoted-text / top-post stripping
From: Alvaro Herrera @ 2010-08-04 14:22 UTC (permalink / raw)
To: anonymous; +Cc: sup-devel
Excerpts from anonymous's message of Wed Aug 04 08:31:13 -0400 2010:
> Redwood::Message splits a message into chunks and hides quoted text from a
> previous message.
>
> Top-posted responses are supported with BLOCK_QUOTE_PATTERN. This patch i) adds
> a new pattern to hide top-posted text from Microsoft Entourage, ii) adds a fix
> for Mozilla-based mail users who top post and iii) adds a new array-based config
> option "block_quote_patterns" for adding additional patterns to treat as marking
> the top of a top-posted response.
+1 for improving the top-posted detection in general.
I think this regex
+ # Microsoft Entourage doesn't indent quoted text, but it can be spotted
+ # with this line:
+ # On 8/2/10 1:23 PM, "John Doe" <edward@facebook.com> wrote:
+ /^On \d+\/\d+\/\d+ .+ wrote:/,
is way too general; it could easily match the attribution line of a
quoted email that is not top-posted. I didn't try it, but I think it would
end up trimming the whole contents of several emails I have in my inboxes,
which is pretty annoying. (I have set SIG_PATTERN to be just "^-- "
to avoid this very problem.)
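
The concern above can be illustrated with a small sketch. The pattern is the
one from the patch; the sample messages are invented, and the helper
unquoted_attribution? is a hypothetical narrowing that was not proposed in the
thread. It assumes that a conventionally quoted reply follows the attribution
line with ">"-prefixed text, whereas Entourage leaves the quoted text
unindented.

```ruby
# QUOTE_PATTERN is taken from the patch context; the rest is a sketch.
QUOTE_PATTERN = /^\s{0,4}[>|\}]/
ATTRIBUTION_PATTERN = /^On \d+\/\d+\/\d+ .+ wrote:/

# Hypothetical helper: treat an attribution line as the start of a
# top-posted block only if the next non-blank line is not already quoted.
def unquoted_attribution?(lines, i)
  return false unless lines[i] =~ ATTRIBUTION_PATTERN
  following = lines[(i + 1)..-1].find { |l| l !~ /^\s*$/ }
  !following.nil? && following !~ QUOTE_PATTERN
end

# Invented sample: an ordinary inline reply with ">" quoting.
inline_reply = [
  'On 8/3/10 9:05 AM, Jane Roe wrote:',
  '> Shall we meet on Friday?',
  'Friday works for me.',
]

# Invented sample: a top-posted reply with unindented quoted text.
top_posted = [
  'Sounds good.',
  '',
  'On 8/2/10 1:23 PM, "John Doe" <edward@facebook.com> wrote:',
  '',
  'Shall we meet on Friday?',
]

# The bare pattern matches both attribution lines ...
puts !!(inline_reply[0] =~ ATTRIBUTION_PATTERN)
puts !!(top_posted[2] =~ ATTRIBUTION_PATTERN)

# ... while the look-ahead check only accepts the unquoted case.
puts unquoted_attribution?(inline_reply, 0)
puts unquoted_attribution?(top_posted, 2)
```

This look-ahead is only a heuristic; whether it is acceptable for sup's chunk
parser would need testing against real mail.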
There's a small bug here:
+ # At least three dashes. Mozilla mail clients downcase the 'm' in
+ # message.
+ /^----+\s*Original (M|m)essage\s*----+/,
Note that it matches only four or more dashes, not three as the comment says.
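
A quick check of the dash count, as a sketch: PATCH_PATTERN is the regex from
the patch, while THREE_DASH_PATTERN is one possible rewrite (an assumption,
not part of the patch) that accepts three or more dashes, as the comment
intends.

```ruby
# Regex exactly as written in the patch: "----+" means three literal
# dashes followed by one or more dashes, so at least four in total.
PATCH_PATTERN = /^----+\s*Original (M|m)essage\s*----+/

# Hypothetical alternative that matches the comment: three or more dashes.
THREE_DASH_PATTERN = /^-{3,}\s*Original [Mm]essage\s*-{3,}/

three_dashes = "---Original Message---"
four_dashes  = "----Original message----"

puts !!(three_dashes =~ PATCH_PATTERN)       # three dashes vs. patch regex
puts !!(four_dashes  =~ PATCH_PATTERN)       # four dashes vs. patch regex
puts !!(three_dashes =~ THREE_DASH_PATTERN)  # three dashes vs. rewrite
puts !!(four_dashes  =~ THREE_DASH_PATTERN)  # four dashes vs. rewrite
```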
--
Álvaro Herrera <alvherre@alvh.no-ip.org>