Archive of RubyForge sup-talk mailing list
From: ezyang@MIT.EDU (Edward Z. Yang)
Subject: [sup-talk] Sup is hanging
Date: Sat, 06 Jun 2009 02:20:25 -0400
Message-ID: <1244267416-sup-3720@javelin>
In-Reply-To: <1244238388-sup-760@javelin>

Excerpts from Edward Z. Yang's message of Fri Jun 05 17:47:00 -0400 2009:
> Now that you mention it, the messages that tickle this bug on my side also
> have one extremely long line.  That's very interesting.

Here is the culprit, laid out to bear its full shame:

    /\w.*:$/

I thought this was a suspicious-looking regex; a simple test confirmed my
belief:

    line = ":a" * 10000
    line =~ /\w.*:$/

Ba boom ba boom ba boom.  This is a textbook case of catastrophic backtracking.
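
To watch the cost grow rather than just wait on the 10000-repeat case, a rough
timing sketch along these lines may help. The sizes and the `time_match` helper
are invented for illustration. On a plain backtracking engine each failed
attempt rescans the rest of the line, so the time should grow roughly
quadratically, but newer Ruby regex engines may optimize this case away, so
treat it as an experiment and not a benchmark:

```ruby
require 'benchmark'

PATTERN = /\w.*:$/   # the original QUOTE_START_PATTERN

# Hypothetical helper: time one failed match against ":a" repeated n times.
# The line ends in "a", so the pattern can never match, and the engine has
# to try every "\w" starting point before giving up.
def time_match(n)
  line = ":a" * n
  result = nil
  seconds = Benchmark.realtime { result = (line =~ PATTERN) }
  [result, seconds]
end

# Sizes are arbitrary; raise them gradually to see the slowdown.
RESULTS = [500, 1000, 2000].map do |n|
  result, seconds = time_match(n)
  puts format("n=%-5d match=%-4s %.4fs", n, result.inspect, seconds)
  [n, result]
end
```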

I have two possible fixes. They take about the same time in regular cases,
but the second one is faster for really long strings:

First, the simple one:

diff --git a/lib/sup/message.rb b/lib/sup/message.rb
index 5993729..0ddd3af 100644
--- a/lib/sup/message.rb
+++ b/lib/sup/message.rb
@@ -26,7 +26,7 @@ class Message
 
   QUOTE_PATTERN = /^\s{0,4}[>|\}]/
   BLOCK_QUOTE_PATTERN = /^-----\s*Original Message\s*----+$/
-  QUOTE_START_PATTERN = /\w.*:$/
+  QUOTE_START_PATTERN = /\w\W*:$/
   SIG_PATTERN = /(^-- ?$)|(^\s*----------+\s*$)|(^\s*_________+\s*$)|(^\s*--~--~-)|(^\s*--\+\+\*\*==)/
 
   MAX_SIG_DISTANCE = 15 # lines from the end
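
One way to check that the replacement accepts the same single-line inputs is
to run both patterns over a few sample lines. The samples below are made up
for illustration and do not come from real mail or from the Sup test suite.
The reasoning is that the last word character on a line is followed only by
non-word characters, so `\W*` can reach the final colon directly:

```ruby
OLD_PATTERN = /\w.*:$/
NEW_PATTERN = /\w\W*:$/

# Made-up sample lines (assumptions, not from the Sup test suite).
SAMPLES = [
  "Excerpts from somebody's message of Fri Jun 05:",
  "On Friday, somebody wrote (paraphrasing):",
  "no trailing colon here",
  ":::",            # ends in a colon but has no word character
  ":a" * 50,        # shortened version of the pathological line
]

# Compare match/no-match only; the match offsets legitimately differ.
AGREEMENT = SAMPLES.map do |line|
  old_hit = !(line =~ OLD_PATTERN).nil?
  new_hit = !(line =~ NEW_PATTERN).nil?
  puts "#{old_hit == new_hit ? 'same' : 'DIFF'}  #{line[0, 40].inspect}"
  [old_hit, new_hit]
end
```

Note that `.` does not match a newline while `\W` does, so the two patterns
could behave differently on strings with embedded newlines.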

And the slightly more complicated one (but optimal for large n):

diff --git a/lib/sup/message.rb b/lib/sup/message.rb
index 5993729..c5481a6 100644
--- a/lib/sup/message.rb
+++ b/lib/sup/message.rb
@@ -26,7 +26,6 @@ class Message
 
   QUOTE_PATTERN = /^\s{0,4}[>|\}]/
   BLOCK_QUOTE_PATTERN = /^-----\s*Original Message\s*----+$/
-  QUOTE_START_PATTERN = /\w.*:$/
   SIG_PATTERN = /(^-- ?$)|(^\s*----------+\s*$)|(^\s*_________+\s*$)|(^\s*--~--~-)|
 
   MAX_SIG_DISTANCE = 15 # lines from the end
@@ -449,7 +448,7 @@ private
       when :text
         newstate = nil
 
-        if line =~ QUOTE_PATTERN || (line =~ QUOTE_START_PATTERN && nextline =~ QUO
+        if line =~ QUOTE_PATTERN || (line =~ /:$/ && line =~ /\w/ && nextline =~ QU
           newstate = :quote
         elsif line =~ SIG_PATTERN && (lines.length - i) < MAX_SIG_DISTANCE
           newstate = :sig
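
The idea behind the second patch can be written out as a standalone
predicate. The method name `quote_start?` is hypothetical and not part of
Sup; it only shows the two cheap checks that replace the single backtracking
pattern:

```ruby
# Hypothetical helper, not in lib/sup/message.rb.
# Two independent scans: one for a colon at end of line, one for any
# word character. Neither regex has a ".*" to backtrack through.
def quote_start?(line)
  !!(line =~ /:$/ && line =~ /\w/)
end

long_line = ":a" * 10_000
puts quote_start?("Somebody wrote:")   # true
puts quote_start?(long_line)           # false
```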

There are a number of micro-optimizations that could be made to message
parsing, but this will basically fix the egregious problem.

Cheers,
Edward

