From: ezyang@MIT.EDU (Edward Z. Yang)
Subject: [sup-talk] Sup is hanging
Date: Sat, 06 Jun 2009 02:20:25 -0400
Message-ID: <1244267416-sup-3720@javelin>
In-Reply-To: <1244238388-sup-760@javelin>
Excerpts from Edward Z. Yang's message of Fri Jun 05 17:47:00 -0400 2009:
> Now that you mention it, the messages that tickle this bug on my side also
> have one extremely long line. That's very interesting.
Here is the culprit, laid out to bear its full shame:
/\w.*:$/
I thought this was a suspicious-looking regex; a simple test confirmed my
belief:
line = ":a" * 10000
line =~ /\w.*:$/
Ba boom ba boom ba boom. This is a textbook case of catastrophic backtracking.
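To see how the cost grows, here is a minimal timing sketch built around the test above. The helper names (pathological_line, time_match) are made up for illustration. The reasoning it tries to show: every "a" is a candidate start for \w, and from each one .* runs to the end of the line and then backs up one character at a time looking for a final colon that is not there, so the work should grow roughly quadratically with the line length. Timings depend heavily on the Ruby version and its regex engine, and newer interpreters may optimize this case away, so no expected numbers are given.

```ruby
require 'benchmark'

# The pattern from the message above.
PATTERN = /\w.*:$/

# Build the pathological input: alternating ":" and "a", ending in "a",
# so there is no line-final colon and the match has to fail.
def pathological_line(n)
  ":a" * n
end

# Time one failed match for a given repeat count (returns seconds).
def time_match(n)
  line = pathological_line(n)
  Benchmark.realtime { line =~ PATTERN }
end

if __FILE__ == $PROGRAM_NAME
  # Doubling n shows the growth rate; sizes are arbitrary.
  [1_000, 2_000, 4_000, 8_000].each do |n|
    printf("n=%-6d %.4fs\n", n, time_match(n))
  end
end
```

If doubling n roughly quadruples the time, the regex engine is doing the quadratic scan described above.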
I have two possible fixes. They take about the same time in regular
cases, but the second one is faster for really long strings:
First, the simple one:
diff --git a/lib/sup/message.rb b/lib/sup/message.rb
index 5993729..0ddd3af 100644
--- a/lib/sup/message.rb
+++ b/lib/sup/message.rb
@@ -26,7 +26,7 @@ class Message
QUOTE_PATTERN = /^\s{0,4}[>|\}]/
BLOCK_QUOTE_PATTERN = /^-----\s*Original Message\s*----+$/
- QUOTE_START_PATTERN = /\w.*:$/
+ QUOTE_START_PATTERN = /\w\W*:$/
SIG_PATTERN = /(^-- ?$)|(^\s*----------+\s*$)|(^\s*_________+\s*$)|(^\s*--~--~-)|(^\s*--\+\+\*\*==)/
MAX_SIG_DISTANCE = 15 # lines from the end
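Why this should accept the same lines: if a single line ends in a colon and has a word character somewhere before it, then the last such word character is followed only by non-word characters up to that colon, which is exactly what \w\W*:$ asks for. A rough check, assuming a handful of made-up sample lines (none are from real mail):

```ruby
OLD_PATTERN = /\w.*:$/
NEW_PATTERN = /\w\W*:$/

# Hypothetical sample lines for illustration only.
SAMPLES = [
  "Excerpts from Edward's message of Fri Jun 05:", # attribution line
  "On Fri, Jun 5, 2009, someone wrote:",
  "foo (bar) :",
  ":",                      # colon only, no word character
  "::: :",
  "no colon at the end",
  "colon: in the middle",
  ":a" * 50,                # small version of the pathological line
]

# True when both patterns agree on whether the line matches.
def same_verdict?(line)
  !!(line =~ OLD_PATTERN) == !!(line =~ NEW_PATTERN)
end

SAMPLES.each do |line|
  puts format("%-5s %s", same_verdict?(line), line[0, 40].inspect)
end
```

One caveat: \W matches a newline while . does not, so the two patterns can disagree on a string that spans several lines (for example "a\n:"). That should not matter if the parser hands over one line at a time, which is what the surrounding code appears to do.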
And the slightly more complicated one (but optimal for large n):
diff --git a/lib/sup/message.rb b/lib/sup/message.rb
index 5993729..c5481a6 100644
--- a/lib/sup/message.rb
+++ b/lib/sup/message.rb
@@ -26,7 +26,6 @@ class Message
QUOTE_PATTERN = /^\s{0,4}[>|\}]/
BLOCK_QUOTE_PATTERN = /^-----\s*Original Message\s*----+$/
- QUOTE_START_PATTERN = /\w.*:$/
SIG_PATTERN = /(^-- ?$)|(^\s*----------+\s*$)|(^\s*_________+\s*$)|(^\s*--~--~-)|
MAX_SIG_DISTANCE = 15 # lines from the end
@@ -449,7 +448,7 @@ private
when :text
newstate = nil
- if line =~ QUOTE_PATTERN || (line =~ QUOTE_START_PATTERN && nextline =~ QUO
+ if line =~ QUOTE_PATTERN || (line =~ /:$/ && line =~ /\w/ && nextline =~ QU
newstate = :quote
elsif line =~ SIG_PATTERN && (lines.length - i) < MAX_SIG_DISTANCE
newstate = :sig
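The idea in this second patch can be sketched on its own: split the one backtracking pattern into two independent scans, neither of which has anything to back up into. The method name quote_start? is hypothetical; the patch inlines the condition rather than defining a helper.

```ruby
# Hypothetical helper; the patch writes this condition inline.
def quote_start?(line)
  # Two separate passes over the line:
  #   /:$/  looks for a colon at the end of the line
  #   /\w/  looks for any word character at all
  # On a single line, any word character necessarily sits before the
  # final colon, so together they accept what /\w.*:$/ accepts.
  !!(line =~ /:$/ && line =~ /\w/)
end

puts quote_start?("Excerpts from someone's message:")
puts quote_start?(":")
puts quote_start?(":a" * 10_000)
```

As with the first fix, this assumes the input is a single line; on a multi-line string /:$/ would match a colon at the end of any line in it.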
There are a number of micro-optimizations that could be made to message
parsing, but this will basically fix the egregious problem.
Cheers,
Edward