Archive of RubyForge sup-talk mailing list
 help / color / mirror / Atom feed
* [sup-talk] Sup is hanging
@ 2009-06-03 17:39 Edward Z. Yang
  2009-06-03 18:11 ` William Morgan
  2009-06-03 18:21 ` Edward Z. Yang
  0 siblings, 2 replies; 19+ messages in thread
From: Edward Z. Yang @ 2009-06-03 17:39 UTC (permalink / raw)


Hey all,

Somehow, I messed something up really bad, and now, whenever Sup
attempts to poll for new messages, it hang completely, spinning the
CPU.  I've let it run for an appreciable amount of time, and haven't
seen any change.  I was wondering how I could fix this, or go about
debugging it.  There also seems to be a single mail source at fault,
but I can't remove it without having Sup complain at me.  Right now,
I'm running sup as -no.

Thanks,
Edward


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
  2009-06-03 17:39 [sup-talk] Sup is hanging Edward Z. Yang
@ 2009-06-03 18:11 ` William Morgan
  2009-06-03 18:26   ` Edward Z. Yang
  2009-06-03 18:21 ` Edward Z. Yang
  1 sibling, 1 reply; 19+ messages in thread
From: William Morgan @ 2009-06-03 18:11 UTC (permalink / raw)


Reformatted excerpts from Edward Z. Yang's message of 2009-06-03:
> Somehow, I messed something up really bad, and now, whenever Sup
> attempts to poll for new messages, it hang completely, spinning the
> CPU.  I've let it run for an appreciable amount of time, and haven't
> seen any change.  I was wondering how I could fix this, or go about
> debugging it.

Is the curses interface hung? If not, can you see if there's activity in
the poll mode or the log mode buffers? (Hit ; if you're running from
git, or B if you're running 0.7.) It might be that you've somehow reset
the recorded offset for a very large source, so Sup is polling a lot of
stuff in the background.

If it's completely hung (which would be weird), try first disabling your
hooks (move them from ~/.sup/hooks to another directory) and see if that
helps. If not, please describe the behavior of running sup-sync.

-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
  2009-06-03 17:39 [sup-talk] Sup is hanging Edward Z. Yang
  2009-06-03 18:11 ` William Morgan
@ 2009-06-03 18:21 ` Edward Z. Yang
  2009-06-03 18:45   ` Edward Z. Yang
  2009-06-03 21:36   ` William Morgan
  1 sibling, 2 replies; 19+ messages in thread
From: Edward Z. Yang @ 2009-06-03 18:21 UTC (permalink / raw)


The current working theory (based on strace'ing and lsof) is that Sup
is hanging on a select() call that doesn't have any timeout.  The reason
why this is hanging is because between opening the connection and
closing it, Sup does some ridiculously slow message parsing code (that
really thrashes the CPU) and by the time it's done the server has
closed the connection (but we don't know about it).  The fix would probably
be to always set a timeout, but I don't know what code needs to be fixed
for that.

Cheers,
Edward


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
  2009-06-03 18:11 ` William Morgan
@ 2009-06-03 18:26   ` Edward Z. Yang
  0 siblings, 0 replies; 19+ messages in thread
From: Edward Z. Yang @ 2009-06-03 18:26 UTC (permalink / raw)


On Wed, 3 Jun 2009, William Morgan wrote:
> Is the curses interface hung?>

Yep.

> If it's completely hung (which would be weird), try first disabling your
> hooks (move them from ~/.sup/hooks to another directory) and see if that
> helps. If not, please describe the behavior of running sup-sync.

I can try that. I've described the behavior of sup-sync in my other
email.

Cheers,
Edward


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
  2009-06-03 18:21 ` Edward Z. Yang
@ 2009-06-03 18:45   ` Edward Z. Yang
  2009-06-03 21:36   ` William Morgan
  1 sibling, 0 replies; 19+ messages in thread
From: Edward Z. Yang @ 2009-06-03 18:45 UTC (permalink / raw)


Also, if it helps, here's the backtrace for when it's hanging on the
select:

     /usr/lib/ruby/1.8/monitor.rb:102:in `stop': Interrupt
     from /usr/lib/ruby/1.8/monitor.rb:102:in `wait'
     from /usr/lib/ruby/1.8/net/imap.rb:972:in `get_tagged_response'
     from /usr/lib/ruby/1.8/net/imap.rb:1032:in `send_command'
     from /usr/lib/ruby/1.8/monitor.rb:242:in `synchronize'
     from /usr/lib/ruby/1.8/net/imap.rb:1017:in `send_command'
     from /usr/lib/ruby/1.8/net/imap.rb:409:in `examine'
     from /usr/lib/ruby/1.8/monitor.rb:242:in `synchronize'
     from /usr/lib/ruby/1.8/net/imap.rb:407:in `examine'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/imap.rb:145:in `unsynchronized_scan_mailbox'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/imap.rb:322:in `safely'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/imap.rb:144:in `unsynchronized_scan_mailbox'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/imap.rb:192:in `unsynchronized_start_offset'
     from (eval):3:in `start_offset'
     from (eval):3:in `synchronize'
     from (eval):3:in `start_offset'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/util.rb:544:in `send'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/util.rb:544:in `__pass'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/util.rb:531:in `method_missing'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/bin/sup-sync:147
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/poll.rb:160:in `add_messages_from'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/imap.rb:187:in `each'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/imap.rb:175:in `upto'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/imap.rb:175:in `each'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/util.rb:544:in `send'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/util.rb:544:in `__pass'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/util.rb:531:in `method_missing'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/poll.rb:141:in `add_messages_from'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/util.rb:505:in `send'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/util.rb:505:in `method_missing'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/bin/sup-sync:140
     from /usr/lib/ruby/gems/1.8/gems/sup-999/bin/sup-sync:135:in `each'
     from /usr/lib/ruby/gems/1.8/gems/sup-999/bin/sup-sync:135
     from /var/lib/gems/1.8/bin/sup-sync:19:in `load'
     from /var/lib/gems/1.8/bin/sup-sync:19

Cheers,
Edward


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
  2009-06-03 18:21 ` Edward Z. Yang
  2009-06-03 18:45   ` Edward Z. Yang
@ 2009-06-03 21:36   ` William Morgan
  2009-06-03 21:48     ` Edward Z. Yang
  2009-06-03 22:00     ` [sup-talk] Sup is hangingy Edward Z. Yang
  1 sibling, 2 replies; 19+ messages in thread
From: William Morgan @ 2009-06-03 21:36 UTC (permalink / raw)


Reformatted excerpts from Edward Z. Yang's message of 2009-06-03:
> The current working theory (based on strace'ing and lsof) is that Sup
> is hanging on a select() call that doesn't have any timeout.  The
> reason why this is hanging is because between opening the connection
> and closing it, Sup does some ridiculously slow message parsing code
> (that really thrashes the CPU) and by the time it's done the server
> has closed the connection (but we don't know about it).

That may all be true, but I'd be surprised if a) Sup didn't know that
the server had closed the connection, and b) that this would cause the
whole thing to hang.

When you don't use -n, does Sup still lock up?
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
  2009-06-03 21:36   ` William Morgan
@ 2009-06-03 21:48     ` Edward Z. Yang
  2009-06-04  2:11       ` William Morgan
  2009-06-03 22:00     ` [sup-talk] Sup is hangingy Edward Z. Yang
  1 sibling, 1 reply; 19+ messages in thread
From: Edward Z. Yang @ 2009-06-03 21:48 UTC (permalink / raw)


On Wed, 3 Jun 2009, William Morgan wrote:
> Reformatted excerpts from Edward Z. Yang's message of 2009-06-03:
>> The current working theory (based on strace'ing and lsof) is that Sup
>> is hanging on a select() call that doesn't have any timeout.  The
>> reason why this is hanging is because between opening the connection
>> and closing it, Sup does some ridiculously slow message parsing code
>> (that really thrashes the CPU) and by the time it's done the server
>> has closed the connection (but we don't know about it).
>
> That may all be true, but I'd be surprised if a) Sup didn't know that
> the server had closed the connection, and

It doesn't have anything to do with Sup: the system select() call is
hanging; Sup (and Ruby for that matter) is out of the equation. This
might be IMAP server weirdness, but that shouldn't cause Sup to
freeze.

> b) that this would cause the
> whole thing to hang.

That's what I find surprising, since Sup is multithreaded. I suppose
all of the threads are blocking on the select.

Sup hangs without -n. (it always hangs after it's "done fetching IMAP
headers"). "done fetching IMAP headers" is the last message we get.

The behavior seems to be:

1. Log reports "done fetching IMAP headers"

2. Sup begins malloc'ing like crazy. sup-sync still is able to catch
    signals and if you kill it you find that it's somewhere in
    lib/sup/message.rb, message_to_chunks. sup doesn't respond to Ctrl+C

3. After some minutes, CPU usage dies off, but Sup doesn't unfreeze.
    strace indicates that sup/sup-sync is waiting on a select(), and that
    the descriptors are TCP connections to the IMAP server.

Cheers,
Edward



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hangingy
  2009-06-03 21:36   ` William Morgan
  2009-06-03 21:48     ` Edward Z. Yang
@ 2009-06-03 22:00     ` Edward Z. Yang
  2009-06-04  1:26       ` Edward Z. Yang
                         ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Edward Z. Yang @ 2009-06-03 22:00 UTC (permalink / raw)


Oh hey, I managed to kill Sup:

ezyang at javelin:~$ sup
[Wed Jun 03 17:42:43 -0400 2009] using character set encoding "UTF-8"
[Wed Jun 03 17:42:44 -0400 2009] dynamically loading setlocale() from
libc.so.6
[Wed Jun 03 17:42:44 -0400 2009] setting locale...
[Wed Jun 03 17:42:44 -0400 2009] locking /home/ezyang/.sup/lock...
[Wed Jun 03 17:42:44 -0400 2009] crypto: detected gpg binary in
/usr/bin/gpg
[Wed Jun 03 17:42:44 -0400 2009] loading index...
[Wed Jun 03 17:42:44 -0400 2009] loaded index of 20188 messages
[Wed Jun 03 17:42:44 -0400 2009] starting curses
[Wed Jun 03 17:54:47 -0400 2009] stopped cursing
[Wed Jun 03 17:54:47 -0400 2009] oh crap, an exception
[Wed Jun 03 17:54:47 -0400 2009] unlocking /home/ezyang/.sup/lock...
----------------------------------------------------------------
I'm very sorry. It seems that an error occurred in Sup. Please
accept my sincere apologies. If you don't mind, please send the
contents of ~/.sup/exception-log.txt and a brief report of the
circumstances to sup-talk at rubyforge dot orgs so that I might
address this problem. Thank you!

Sincerely,
William
----------------------------------------------------------------
--- Interrupt from thread: main

/usr/lib/ruby/gems/1.8/gems/ncurses-0.9.1/lib/ncurses.rb:71:in
`method_missing'
/usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/buffer.rb:109:in `write'
/usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/modes/scroll-mode.rb:245:in
`draw_line_from_string'
/usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/modes/scroll-mode.rb:182:in
`draw_line'
/usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/modes/scroll-mode.rb:48:in
`draw'
/usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/modes/scroll-mode.rb:48:in
`each'
/usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/modes/scroll-mode.rb:48:in
`draw'
/usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/buffer.rb:99:in `draw'
/usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/buffer.rb:303:in
`draw_screen'
/usr/lib/ruby/gems/1.8/gems/sup-999/bin/sup:308
/var/lib/gems/1.8/bin/sup:19:in `load'
/var/lib/gems/1.8/bin/sup:19

This seems... unrelated.

Cheers,
Edward


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hangingy
  2009-06-03 22:00     ` [sup-talk] Sup is hangingy Edward Z. Yang
@ 2009-06-04  1:26       ` Edward Z. Yang
  2009-06-04  1:53         ` [sup-talk] Sup is hangingyy Edward Z. Yang
  2009-06-04  2:12       ` [sup-talk] Sup is hangingy William Morgan
  2009-06-04  2:13       ` William Morgan
  2 siblings, 1 reply; 19+ messages in thread
From: Edward Z. Yang @ 2009-06-04  1:26 UTC (permalink / raw)


Ok, after a bit of fiddling, I've managed to figure out that this is
two problems, not one.

The first is related to a naughty IMAP server which simply stops responding
after a certain number of seconds.  Unfortunately, this is my workplace's
IMAP server, so I can't give you a test environment.  Furthermore, the code
that would be appropriate to change is in Net::Imap, and thus is not
easily accessible by Sup.  I've worked around this by making it an unusual
source.

The second is a large amount of CPU thrashing as Sup parses messages. This
is something we can fix, and I hope to do some more poking to help make
Sup run faster in this respect. The basic behavior is Sup repeatedly
allocates some amount of memory, reallocates it twice (quadrupling it
in size to about 4MB), deallocates, and then does it again. I'm running
ruby-prof in hopes of tickling this again.  Unlike the hung IMAP server,
this eventually finishes, but it is extremely obnoxious when it happens
and blocks everything else. It commonly occurs when I open an email message.

Cheers,
Edward


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hangingyy
  2009-06-04  1:26       ` Edward Z. Yang
@ 2009-06-04  1:53         ` Edward Z. Yang
  2009-06-04 16:09           ` [sup-talk] Sup is hanging William Morgan
  0 siblings, 1 reply; 19+ messages in thread
From: Edward Z. Yang @ 2009-06-04  1:53 UTC (permalink / raw)


On Wed, 3 Jun 2009, Edward Z. Yang wrote:
> The second is a large amount of CPU thrashing as Sup parses messages. This
> is something we can fix, and I hope to do some more poking to help make
> Sup run faster in this respect. The basic behavior is Sup repeatedly
> allocates some amount of memory, reallocates it twice (quadrupling it
> in size to about 4MB), deallocates, and then does it again. I'm running
> ruby-prof in hopes of tickling this again.  Unlike the hung IMAP server,
> this eventually finishes, but it is extremely obnoxious when it happens
> and blocks everything else. It commonly occurs when I open an email message.

Here we go:

http://web.mit.edu/~ezyang/Public/sup-performance.png

Look at String::=~. Definitely not acceptable.

Cheers,
Edward



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
  2009-06-03 21:48     ` Edward Z. Yang
@ 2009-06-04  2:11       ` William Morgan
  0 siblings, 0 replies; 19+ messages in thread
From: William Morgan @ 2009-06-04  2:11 UTC (permalink / raw)


[resent to list]

Reformatted excerpts from Edward Z. Yang's message of 2009-06-03:
> That's what I find surprising, since Sup is multithreaded. I suppose
> all of the threads are blocking on the select.

It's possible... Ruby does use green threads and a GIL, so there's
always the possibility that something in C land will block all Ruby
threads, but my impression was that that was an issue mainly with
third-party libraries. I believe that Net::IMAP is pure Ruby, thus using
the core Ruby IO mechanisms, and I thought those were pretty good about
being preemptible. Curious!
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hangingy
  2009-06-03 22:00     ` [sup-talk] Sup is hangingy Edward Z. Yang
  2009-06-04  1:26       ` Edward Z. Yang
@ 2009-06-04  2:12       ` William Morgan
  2009-06-04  2:13       ` William Morgan
  2 siblings, 0 replies; 19+ messages in thread
From: William Morgan @ 2009-06-04  2:12 UTC (permalink / raw)


Reformatted excerpts from Edward Z. Yang's message of 2009-06-03:
> Oh hey, I managed to kill Sup:
> 
> ezyang at javelin:~$ sup
> [Wed Jun 03 17:42:43 -0400 2009] using character set encoding "UTF-8"
> [Wed Jun 03 17:42:44 -0400 2009] dynamically loading setlocale() from
> libc.so.6
> [Wed Jun 03 17:42:44 -0400 2009] setting locale...
> [Wed Jun 03 17:42:44 -0400 2009] locking /home/ezyang/.sup/lock...
> [Wed Jun 03 17:42:44 -0400 2009] crypto: detected gpg binary in
> /usr/bin/gpg
> [Wed Jun 03 17:42:44 -0400 2009] loading index...
> [Wed Jun 03 17:42:44 -0400 2009] loaded index of 20188 messages
> [Wed Jun 03 17:42:44 -0400 2009] starting curses
> [Wed Jun 03 17:54:47 -0400 2009] stopped cursing
> [Wed Jun 03 17:54:47 -0400 2009] oh crap, an exception
> [Wed Jun 03 17:54:47 -0400 2009] unlocking /home/ezyang/.sup/lock...
> ----------------------------------------------------------------
> I'm very sorry. It seems that an error occurred in Sup. Please
> accept my sincere apologies. If you don't mind, please send the
> contents of ~/.sup/exception-log.txt and a brief report of the
> circumstances to sup-talk at rubyforge dot orgs so that I might
> address this problem. Thank you!
> 
> Sincerely,
> William
> ----------------------------------------------------------------
> --- Interrupt from thread: main
> 
> /usr/lib/ruby/gems/1.8/gems/ncurses-0.9.1/lib/ncurses.rb:71:in
> `method_missing'
> /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/buffer.rb:109:in `write'
> /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/modes/scroll-mode.rb:245:in
> `draw_line_from_string'
> /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/modes/scroll-mode.rb:182:in
> `draw_line'
> /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/modes/scroll-mode.rb:48:in
> `draw'
> /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/modes/scroll-mode.rb:48:in
> `each'
> /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/modes/scroll-mode.rb:48:in
> `draw'
> /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/buffer.rb:99:in `draw'
> /usr/lib/ruby/gems/1.8/gems/sup-999/lib/sup/buffer.rb:303:in
> `draw_screen'
> /usr/lib/ruby/gems/1.8/gems/sup-999/bin/sup:308
> /var/lib/gems/1.8/bin/sup:19:in `load'
> /var/lib/gems/1.8/bin/sup:19
> 
> This seems... unrelated.
> 
> Cheers,
> Edward
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hangingy
  2009-06-03 22:00     ` [sup-talk] Sup is hangingy Edward Z. Yang
  2009-06-04  1:26       ` Edward Z. Yang
  2009-06-04  2:12       ` [sup-talk] Sup is hangingy William Morgan
@ 2009-06-04  2:13       ` William Morgan
  2 siblings, 0 replies; 19+ messages in thread
From: William Morgan @ 2009-06-04  2:13 UTC (permalink / raw)


[sorry about that last one. really not doing too well with the ol' email
today.]

Reformatted excerpts from Edward Z. Yang's message of 2009-06-03:
> --- Interrupt from thread: main
> 
> /usr/lib/ruby/gems/1.8/gems/ncurses-0.9.1/lib/ncurses.rb:71:in `method_missing'
[snip]
> 
> This seems... unrelated.

This looks like a ctrl-c that got caught by an ncurses thread. I agree,
seems unrelated.
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
  2009-06-04  1:53         ` [sup-talk] Sup is hangingyy Edward Z. Yang
@ 2009-06-04 16:09           ` William Morgan
  2009-06-05  5:08             ` Edward Z. Yang
  0 siblings, 1 reply; 19+ messages in thread
From: William Morgan @ 2009-06-04 16:09 UTC (permalink / raw)


Reformatted excerpts from Edward Z. Yang's message of 2009-06-03:
> http://web.mit.edu/~ezyang/Public/sup-performance.png
> 
> Look at String::=~. Definitely not acceptable.

Doing some profiling on my end, it looks like the majority of IMAP
syncing time is spent in these five methods:

Redwood::Index#load_entry_for_id (22%)
Redwood::IMAP#load_message (25%)
Redwood::Message#message_to_chunks (16.5%)
Redwood::IMAP#load_header (14%)
Redwood::Index#sync_message (13%)

Four of those are essentially wrappers around IMAP or Ferret methods.
The Sup-specific one is Message#message_to_chunks. But message_to_chunks
and its callee text_to_chunks doesn't seem to have a major culprits.
String#=~ only takes 2.3% of the CPU time, and a chunk of that is coming
from RubyMail.

Just for good measure, I "manually" measured the individual regexen in
text_to_chunks. After parsing 100 messages from an IMAP server, which
took 1m27s for me, I got:

           time (s)   calls
     bqp = 0.00854 /  1789 = 209411.2 calls/second
      n1 = 0.00218 /   313 = 143709.8 calls/second
     qsp = 0.03212 /  1923 =  59873.0 calls/second
      n2 = 0.00202 /   312 = 154226.4 calls/second
   empty = 0.00061 /    90 = 146341.5 calls/second
      sp = 0.02570 /  1927 =  74995.1 calls/second
      qp = 0.03392 /  4459 = 131452.5 calls/second

The names are abbreviations for the various regexen in that method. You
can see that the cumulative time spent on any one regex is at most .034
seconds (qp, which is QUERY_PATTERN), and that the slowest one is qsp,
QUERY_START_PATTERN.

Incidentally, I can parse IMAP mailboxes at a little over 1 message/s,
and mbox files at ~50 messages/second, which also suggests that the IMAP
libraries are the biggest time sink here.

This is all with ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]), Sup
git next.

Now that's all CPU stuff. There might be memory blowup issues. If
nothing else, Sup leaks memory over time, but fixing that involves the
getting into the hellish world of Ruby<->C land or the even more hellish
internals of MRI, so I've been loathe to start down that path.

You might be able to speed up sup-sync runs on IMAP by threading the
network access and the parsing. But the IMAP connection seems pretty
CPU-heavy so who knows.

None of this answers the original question of why all Ruby threads block
when Sup waits for a response from IMAP. Since I'm pretty sure the IMAP
libraries are all Ruby (why they're so slow!), and that core Ruby IO
should be nonblocking, this might be an interpreter bug. Are you able to
pinpoint what part of MRI is blocking?
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
  2009-06-04 16:09           ` [sup-talk] Sup is hanging William Morgan
@ 2009-06-05  5:08             ` Edward Z. Yang
  2009-06-05 13:23               ` William Morgan
       [not found]               ` <1244227108-sup-3123@cabinet>
  0 siblings, 2 replies; 19+ messages in thread
From: Edward Z. Yang @ 2009-06-05  5:08 UTC (permalink / raw)


Hi William,

I tickled the bug and took a more careful look at the message that had
triggered it, and noticed that it was an auto-generated commit message
that was really long.  So one possible stop-gap would be just to detect
if a message is long and disable quoting if that was the case (a long
message would be, say, one over 100KB).  There is also a possibility
that the commit message had some pathological backtracking built into
it.

Excerpts from William Morgan's message of Thu Jun 04 12:09:40 -0400 2009:
> Redwood::Index#load_entry_for_id (22%)
> Redwood::IMAP#load_message (25%)
> Redwood::Message#message_to_chunks (16.5%)
> Redwood::IMAP#load_header (14%)
> Redwood::Index#sync_message (13%)

Ok, so this isn't the same spread as when I managed to make Sup hang,
so this isn't quite the same.

> Now that's all CPU stuff. There might be memory blowup issues. If
> nothing else, Sup leaks memory over time, but fixing that involves the
> getting into the hellish world of Ruby<->C land or the even more hellish
> internals of MRI, so I've been loathe to start down that path.

Sup memory leakage has not been a problem; Sup will hang on a poll
asynchronously (as in, UI is still responsive, but I get no more
new messages) often enough that I have to quit and start sup up again.

> None of this answers the original question of why all Ruby threads block
> when Sup waits for a response from IMAP. Since I'm pretty sure the IMAP
> libraries are all Ruby (why they're so slow!), and that core Ruby IO
> should be nonblocking, this might be an interpreter bug. Are you able to
> pinpoint what part of MRI is blocking?

Not yet. I don't know Ruby threads well enough to know how to get more
fine-grained information on thread mutex/blocking interaction.

To summarize:

1. If I thrash message_to_chunks() with a very long message, I cause
   sup to hang.

   * If this case is only caused by long messages, implement a hueristic
     to prevent quoting from happening to long messages.

   * If there is something specific about the message that causes massive
     back-tracking, fix the regex.

   In either case, message_to_chunks() should not block the rest of the
   interface (although if we fix either or the two sub-points, this
   may not be a noticeable problem).

2. If an IMAP connection hangs, it occasionally causes all of Sup to
   block (this is rare, and comes from a pathological IMAP server. I
   think the ops administering the naughty IMAP server fixed it, so
   I am no longer seeing this hang).

3. Under less pathological cases, an IMAP connection can hang, and
   asynchronously blocks any further polling from taking place, resulting
   in no new messages.  This happens commonly for me.

Cheers,
Edward


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
  2009-06-05  5:08             ` Edward Z. Yang
@ 2009-06-05 13:23               ` William Morgan
       [not found]               ` <1244227108-sup-3123@cabinet>
  1 sibling, 0 replies; 19+ messages in thread
From: William Morgan @ 2009-06-05 13:23 UTC (permalink / raw)


Reformatted excerpts from Edward Z. Yang's message of 2009-06-04:
> I tickled the bug and took a more careful look at the message that had
> triggered it

This is good stuff. I think we're getting somewhere. I was hoping to
have this this fixed by 0.8 but I think it's going to have to wait till
0.9.

> and noticed that it was an auto-generated commit message that was
> really long. So one possible stop-gap would be just to detect if a
> message is long and disable quoting if that was the case (a long
> message would be, say, one over 100KB).

Yep, I bet that's the problem.

Not parsing long messages would be fine as far as I'm concerned. Heck
even 25k might be a reasonable limit (per MIME part).

> There is also a possibility that the commit message had some
> pathological backtracking built into it.

It's possible... but I'm willing to bet that it's the sheer size. If
we're still seeing this performance on smaller messages (e.g.  when you
start sending me emails carefully constructed to induce worst-case
performance in Sup's regexen) we can wrap a timeout around the whole
parse as well.

> Excerpts from William Morgan's message of Thu Jun 04 12:09:40 -0400 2009:
> > Redwood::Index#load_entry_for_id (22%)
> > Redwood::IMAP#load_message (25%)
> > Redwood::Message#message_to_chunks (16.5%)
> > Redwood::IMAP#load_header (14%)
> > Redwood::Index#sync_message (13%)
> 
> Ok, so this isn't the same spread as when I managed to make Sup hang,
> so this isn't quite the same.

That makes sense.

> 1. If I thrash message_to_chunks() with a very long message, I cause
> sup to hang.

This is still weird. It only makes sense if Ruby regexen block the whole
interpreter when evaluating. I guess that might be the case if they're
they haven't taken the (what I imagine must be trivial) step of allowing
preemption during the DFA execution (or however the hell they're
implemented). I guess an experiment would show this.

> 2. If an IMAP connection hangs, it occasionally causes all of Sup to
>    block (this is rare, and comes from a pathological IMAP server. I
>    think the ops administering the naughty IMAP server fixed it, so
>    I am no longer seeing this hang).

This is also still weird, and I wonder if it's not just problems 1 and 3
combining. If it comes back we will have to do some more investigation.

> 3. Under less pathological cases, an IMAP connection can hang, and
> asynchronously blocks any further polling from taking place, resulting
> in no new messages.  This happens commonly for me.

This one we can fix. (Though it's something the IMAP libraries should've
done for us.) If we put a timeout block around that #examine call, we
should be able to reset the connection if it hangs.
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
       [not found]               ` <1244227108-sup-3123@cabinet>
@ 2009-06-05 21:47                 ` Edward Z. Yang
  2009-06-06  6:20                   ` Edward Z. Yang
  0 siblings, 1 reply; 19+ messages in thread
From: Edward Z. Yang @ 2009-06-05 21:47 UTC (permalink / raw)


Excerpts from Marc Hartstein's message of Fri Jun 05 14:43:59 -0400 2009:
> I noticed that one feature of the messages was that they contained one
> extremely long line.  Not sure if this information will in any way help
> figure out what's going on there.

Now that you mention it, the messages that tickle this bug on my side also
have one extremely long line.  That's very interesting.

Cheers,
Edward


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
  2009-06-05 21:47                 ` Edward Z. Yang
@ 2009-06-06  6:20                   ` Edward Z. Yang
  2009-06-08 18:09                     ` William Morgan
  0 siblings, 1 reply; 19+ messages in thread
From: Edward Z. Yang @ 2009-06-06  6:20 UTC (permalink / raw)


Excerpts from Edward Z. Yang's message of Fri Jun 05 17:47:00 -0400 2009:
> Now that you mention it, the messages that tickle this bug on my side also
> have one extremely long line.  That's very interesting.

Here is the culprit, laid out to bear its full shame:

    /\w.*:$/

I thought this was a suspicious looking regexen; a simple test confirmed my
belief:

    line = ":a" * 10000
    line =~ /\w.*:$/

Ba boom ba boom ba boom.  This is a textbook case of catastrophic backtracking.

I have two possible fixes, they end up being about the same time for regular
cases, but the second one is more optimal for really long strings:

First, the simple one:

diff --git a/lib/sup/message.rb b/lib/sup/message.rb
index 5993729..0ddd3af 100644
--- a/lib/sup/message.rb
+++ b/lib/sup/message.rb
@@ -26,7 +26,7 @@ class Message
 
   QUOTE_PATTERN = /^\s{0,4}[>|\}]/
   BLOCK_QUOTE_PATTERN = /^-----\s*Original Message\s*----+$/
-  QUOTE_START_PATTERN = /\w.*:$/
+  QUOTE_START_PATTERN = /\w\W*:$/
   SIG_PATTERN = /(^-- ?$)|(^\s*----------+\s*$)|(^\s*_________+\s*$)|(^\s*--~--~-)|(^\s*--\+\+\*\*==)/
 
   MAX_SIG_DISTANCE = 15 # lines from the end

And the slightly more complicated one (but optimal for large n):

diff --git a/lib/sup/message.rb b/lib/sup/message.rb
index 5993729..c5481a6 100644
--- a/lib/sup/message.rb
+++ b/lib/sup/message.rb
@@ -26,7 +26,6 @@ class Message
 
   QUOTE_PATTERN = /^\s{0,4}[>|\}]/
   BLOCK_QUOTE_PATTERN = /^-----\s*Original Message\s*----+$/
-  QUOTE_START_PATTERN = /\w.*:$/
   SIG_PATTERN = /(^-- ?$)|(^\s*----------+\s*$)|(^\s*_________+\s*$)|(^\s*--~--~-)|
 
   MAX_SIG_DISTANCE = 15 # lines from the end
@@ -449,7 +448,7 @@ private
       when :text
         newstate = nil
 
-        if line =~ QUOTE_PATTERN || (line =~ QUOTE_START_PATTERN && nextline =~ QUO
+        if line =~ QUOTE_PATTERN || (line =~ /:$/ && line =~ /\w/ && nextline =~ QU
           newstate = :quote
         elsif line =~ SIG_PATTERN && (lines.length - i) < MAX_SIG_DISTANCE
           newstate = :sig

There are number of micro-optimizations that could be made to message
parsing, but this will basically fix the egregious problem.

Cheers,
Edward


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [sup-talk] Sup is hanging
  2009-06-06  6:20                   ` Edward Z. Yang
@ 2009-06-08 18:09                     ` William Morgan
  0 siblings, 0 replies; 19+ messages in thread
From: William Morgan @ 2009-06-08 18:09 UTC (permalink / raw)


Reformatted excerpts from Edward Z. Yang's message of 2009-06-05:
> -        if line =~ QUOTE_PATTERN || (line =~ QUOTE_START_PATTERN && nextline  =~ QUO
> +        if line =~ QUOTE_PATTERN || (line =~ /:$/ && line =~ /\w/ && nextline =~ QU

Seems fine to me. I've applied that change directly to master. Not only
are there millions of optimiziations for this stuff, there are millions
of tweaks to improve detection precision and recall.

Thanks for your help with this, Edward!
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2009-06-08 18:09 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-03 17:39 [sup-talk] Sup is hanging Edward Z. Yang
2009-06-03 18:11 ` William Morgan
2009-06-03 18:26   ` Edward Z. Yang
2009-06-03 18:21 ` Edward Z. Yang
2009-06-03 18:45   ` Edward Z. Yang
2009-06-03 21:36   ` William Morgan
2009-06-03 21:48     ` Edward Z. Yang
2009-06-04  2:11       ` William Morgan
2009-06-03 22:00     ` [sup-talk] Sup is hangingy Edward Z. Yang
2009-06-04  1:26       ` Edward Z. Yang
2009-06-04  1:53         ` [sup-talk] Sup is hangingyy Edward Z. Yang
2009-06-04 16:09           ` [sup-talk] Sup is hanging William Morgan
2009-06-05  5:08             ` Edward Z. Yang
2009-06-05 13:23               ` William Morgan
     [not found]               ` <1244227108-sup-3123@cabinet>
2009-06-05 21:47                 ` Edward Z. Yang
2009-06-06  6:20                   ` Edward Z. Yang
2009-06-08 18:09                     ` William Morgan
2009-06-04  2:12       ` [sup-talk] Sup is hangingy William Morgan
2009-06-04  2:13       ` William Morgan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox