Archive of RubyForge sup-devel mailing list
* [sup-devel] [issue114] Better quoted-text / top-post stripping
@ 2010-08-04 12:31 anonymous
  2010-08-04 14:22 ` Alvaro Herrera
  0 siblings, 1 reply; 2+ messages in thread
From: anonymous @ 2010-08-04 12:31 UTC (permalink / raw)
  To: sup-devel

[-- Attachment #1: Type: text/plain, Size: 881 bytes --]


New submission from anonymous:

Redwood::Message splits a message into chunks and hides quoted text from a
previous message.

Top-posted responses are supported with BLOCK_QUOTE_PATTERN.  This patch i) adds
a new pattern to hide top-posted text from Microsoft Entourage, ii) adds a fix
for Mozilla-based mail users who top post and iii) adds a new array-based config
option "block_quote_patterns" for adding additional patterns to treat as marking
the top of a top-posted response.
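For readers who want to experiment with the idea outside of sup, here is a minimal sketch of the array-plus-config approach described above. It is not the attached patch: the names DEFAULT_BLOCK_QUOTE_PATTERNS, block_quote_patterns and block_quote_start? are hypothetical, and the config is assumed to be a plain Hash. Building a new array in a method avoids re-assigning a constant with +=, which Ruby normally reports as an "already initialized constant" warning.

```ruby
# Sketch only: helper and constant names are hypothetical, not sup's API.
# The two regexes are copied from the patch in this thread.
DEFAULT_BLOCK_QUOTE_PATTERNS = [
  /^----+\s*Original (M|m)essage\s*----+/,
  /^On \d+\/\d+\/\d+ .+ wrote:/,
].freeze

# Combine the built-in patterns with user-supplied strings.
# `config` is assumed to be a Hash with an optional :block_quote_patterns key.
def block_quote_patterns(config)
  extra = (config[:block_quote_patterns] || []).map { |s| Regexp.new(s) }
  DEFAULT_BLOCK_QUOTE_PATTERNS + extra
end

# True if the line matches any pattern that marks the start of a
# top-posted quote block.
def block_quote_start?(line, patterns)
  patterns.any? { |pattern| line =~ pattern }
end

config   = { :block_quote_patterns => ['^Begin forwarded message:'] }
patterns = block_quote_patterns(config)

p block_quote_start?("Begin forwarded message:", patterns)
p block_quote_start?("Thanks, that works for me.", patterns)
```

Using any? keeps the per-line check to a single expression, which could stand in for the explicit for/break loop if the maintainers prefer that style.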

----------
files: patch.better-top-post-stripping
messages: 261
nosy: anonymous
priority: feature request
ruby_version: 1.8.7
status: unread
sup_version: 0.11
title: Better quoted-text / top-post stripping

_________________________________________
Sup issue tracker <sup-bugs@masanjin.net>
<http://masanjin.net/sup-bugs/issue114>
_________________________________________

[-- Attachment #2: patch.better-top-post-stripping --]
[-- Type: application/octet-stream, Size: 1483 bytes --]

--- lib/sup/message.rb.orig	2010-08-04 12:05:44.000000000 +0100
+++ lib/sup/message.rb	2010-08-04 13:27:07.000000000 +0100
@@ -25,7 +25,26 @@
   end
 
   QUOTE_PATTERN = /^\s{0,4}[>|\}]/
-  BLOCK_QUOTE_PATTERN = /^-----\s*Original Message\s*----+$/
+
+  BLOCK_QUOTE_PATTERNS =
+    [
+      # NB: these should preferably not be anchored to line endings ('$') due
+      # to line ending encodings ('=20').
+
+      # At least three dashes.  Mozilla mail clients downcase the 'm' in
+      # message.
+      /^----+\s*Original (M|m)essage\s*----+/,
+
+      # Microsoft Entourage doesn't indent quoted text, but it can be spotted
+      # with this line:
+      #   On 8/2/10 1:23 PM, "John Doe" <edward@facebook.com> wrote:
+      /^On \d+\/\d+\/\d+ .+ wrote:/,
+    ]
+
+  if ar = $config[:block_quote_patterns]
+    BLOCK_QUOTE_PATTERNS += ar.map{ |s| Regexp.new(s) }
+  end
+
   SIG_PATTERN = /(^-- ?$)|(^\s*----------+\s*$)|(^\s*_________+\s*$)|(^\s*--~--~-)|(^\s*--\+\+\*\*==)/
 
   MAX_SIG_DISTANCE = 15 # lines from the end
@@ -540,8 +559,13 @@
           newstate = :quote
         elsif line =~ SIG_PATTERN && (lines.length - i) < MAX_SIG_DISTANCE
           newstate = :sig
-        elsif line =~ BLOCK_QUOTE_PATTERN
-          newstate = :block_quote
+        else
+          for pattern in BLOCK_QUOTE_PATTERNS
+            if line =~ pattern
+              newstate = :block_quote
+              break
+            end
+          end
         end
 
         if newstate

[-- Attachment #3: Type: text/plain, Size: 143 bytes --]

_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel


* Re: [sup-devel] [issue114] Better quoted-text / top-post stripping
  2010-08-04 12:31 [sup-devel] [issue114] Better quoted-text / top-post stripping anonymous
@ 2010-08-04 14:22 ` Alvaro Herrera
  0 siblings, 0 replies; 2+ messages in thread
From: Alvaro Herrera @ 2010-08-04 14:22 UTC (permalink / raw)
  To: anonymous; +Cc: sup-devel

Excerpts from anonymous's message of Wed Aug 04 08:31:13 -0400 2010:

> Redwood::Message splits a message into chunks and hides quoted text from a
> previous message.
> 
> Top-posted responses are supported with BLOCK_QUOTE_PATTERN.  This patch i) adds
> a new pattern to hide top-posted text from Microsoft Entourage, ii) adds a fix
> for Mozilla-based mail users who top post and iii) adds a new array-based config
> option "block_quote_patterns" for adding additional patterns to treat as marking
> the top of a top-posted response.

+1 for improving the top-posted detection in general.

I think this regex

+      # Microsoft Entourage doesn't indent quoted text, but it can be spotted
+      # with this line:
+      #   On 8/2/10 1:23 PM, "John Doe" <edward@facebook.com> wrote:
+      /^On \d+\/\d+\/\d+ .+ wrote:/,

is way too general; it could easily match the attribution line on a
non-top-posted quoted email.  I didn't try it but I think it would end
up trimming the whole contents of several emails I have in my inboxes
(which is pretty annoying --- I have set SIG_PATTERN to be just "^-- "
to avoid this very problem).
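To make the concern concrete, here is a small sketch. The sample messages are invented, and the helper top_post_attribution? is a hypothetical mitigation that nobody in this thread proposed: it only treats the attribution line as the start of a block quote when the next non-blank line is not itself ">"-quoted. QUOTE_PATTERN and the Entourage regex are copied from the patch.

```ruby
QUOTE_PATTERN     = /^\s{0,4}[>|\}]/
ENTOURAGE_PATTERN = /^On \d+\/\d+\/\d+ .+ wrote:/

# Invented example of an inline (non-top-posted) reply.
INLINE_REPLY = [
  'On 8/2/10 1:23 PM, "John Doe" wrote:',
  "> Does the patch apply cleanly?",
  "Yes, it applies against 0.11.",
]

# Invented example of an Entourage-style top post with unindented quoted text.
TOP_POST = [
  "Sounds good.",
  "",
  'On 8/2/10 1:23 PM, "John Doe" wrote:',
  "",
  "Does the patch apply cleanly?",
]

# Hypothetical helper: accept the attribution line only when the text
# that follows it is not already marked with a quote prefix.
def top_post_attribution?(lines, i)
  return false unless lines[i] =~ ENTOURAGE_PATTERN
  following = lines[(i + 1)..-1].find { |l| l !~ /\A\s*\z/ }
  !following.nil? && following !~ QUOTE_PATTERN
end

# The bare pattern fires on the first line of the inline reply, so
# everything after it would be folded away as a block quote.
p INLINE_REPLY.index { |l| l =~ ENTOURAGE_PATTERN }

p top_post_attribution?(INLINE_REPLY, 0)
p top_post_attribution?(TOP_POST, 2)
```

This is only one possible heuristic; it would still misfire on clients that neither indent quoted text nor top-post.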


There's a small bug here:

+      # At least three dashes.  Mozilla mail clients downcase the 'm' in
+      # message.
+      /^----+\s*Original (M|m)essage\s*----+/,

Note that it matches only four or more dashes, not three as the comment says.
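A quick check of the quantifier, as a sketch: "----+" is three literal dashes followed by one-or-more dashes, so four is the minimum. The relaxed regex below is an assumed fix that would make the code agree with the comment; the constant and helper names are illustrative only.

```ruby
# Only PATCH_PATTERN's regex comes from the patch; the rest is illustrative.
PATCH_PATTERN   = /^----+\s*Original (M|m)essage\s*----+/
RELAXED_PATTERN = /^---+\s*Original [Mm]essage\s*---+/  # assumed fix

THREE_DASHES = "--- Original Message ---"
FOUR_DASHES  = "---- Original message ----"

# Small wrapper so the result is a plain true/false.
def matches?(line, pattern)
  !(line =~ pattern).nil?
end

p matches?(THREE_DASHES, PATCH_PATTERN)    # false
p matches?(FOUR_DASHES,  PATCH_PATTERN)    # true
p matches?(THREE_DASHES, RELAXED_PATTERN)  # true
```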

-- 
Álvaro Herrera <alvherre@alvh.no-ip.org>
