Archive of RubyForge sup-talk mailing list
From: wmorgan-sup@masanjin.net (William Morgan)
Subject: [sup-talk] Sup is hanging
Date: Thu, 04 Jun 2009 09:09:40 -0700
Message-ID: <1244130722-sup-4496@entry>
In-Reply-To: <alpine.DEB.2.00.0906032152200.6894@javelin>

Reformatted excerpts from Edward Z. Yang's message of 2009-06-03:
> http://web.mit.edu/~ezyang/Public/sup-performance.png
> 
> Look at String::=~. Definitely not acceptable.

Doing some profiling on my end, it looks like the majority of IMAP
syncing time is spent in these five methods:

Redwood::Index#load_entry_for_id (22%)
Redwood::IMAP#load_message (25%)
Redwood::Message#message_to_chunks (16.5%)
Redwood::IMAP#load_header (14%)
Redwood::Index#sync_message (13%)

Four of those are essentially wrappers around IMAP or Ferret methods.
The Sup-specific one is Message#message_to_chunks. But message_to_chunks
and its callee text_to_chunks don't seem to have any major culprits.
String#=~ only takes 2.3% of the CPU time, and a chunk of that is coming
from RubyMail.

Just for good measure, I "manually" measured the individual regexen in
text_to_chunks. After parsing 100 messages from an IMAP server, which
took 1m27s for me, I got:

           time (s)   calls
     bqp = 0.00854 /  1789 = 209411.2 calls/second
      n1 = 0.00218 /   313 = 143709.8 calls/second
     qsp = 0.03212 /  1923 =  59873.0 calls/second
      n2 = 0.00202 /   312 = 154226.4 calls/second
   empty = 0.00061 /    90 = 146341.5 calls/second
      sp = 0.02570 /  1927 =  74995.1 calls/second
      qp = 0.03392 /  4459 = 131452.5 calls/second

The names are abbreviations for the various regexen in that method. You
can see that the cumulative time spent on any one regex is at most 0.034
seconds (qp, which is QUERY_PATTERN), out of a run of 87 seconds, and
that the slowest one per call is qsp, QUERY_START_PATTERN.
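
For anyone who wants to repeat this kind of "manual" measurement, here
is a minimal sketch of one way to do it. It is not the code I ran: the
RegexTimer class and the two patterns are hypothetical stand-ins, not
Sup's real regexen. It wraps each match in Benchmark.realtime and keeps
a running time and call count per name.

```ruby
require 'benchmark'

# Hypothetical stand-in patterns; these are NOT Sup's actual regexen.
PATTERNS = {
  :quote => /^\s{0,4}[>|]/,
  :sig   => /^-- ?$/,
}

# Accumulates wall-clock time and call count per named regex.
class RegexTimer
  attr_reader :stats

  def initialize
    @stats = Hash.new { |h, k| h[k] = { :time => 0.0, :calls => 0 } }
  end

  # Runs `line =~ regex`, records the elapsed time, returns the match result.
  def match(name, regex, line)
    result = nil
    elapsed = Benchmark.realtime { result = (line =~ regex) }
    s = @stats[name]
    s[:time] += elapsed
    s[:calls] += 1
    result
  end

  # One formatted line per regex: time / calls = calls per second.
  def report
    @stats.map do |name, s|
      rate = s[:time] > 0 ? s[:calls] / s[:time] : 0.0
      format("%8s = %.5f / %5d = %10.1f calls/second",
             name, s[:time], s[:calls], rate)
    end
  end
end

timer = RegexTimer.new
lines = ["> quoted text", "plain text", "-- ", "  > nested"]
lines.each do |line|
  PATTERNS.each { |name, re| timer.match(name, re, line) }
end
puts timer.report
```

The timer overhead is included in each sample, so the numbers are an
upper bound on the regex cost rather than an exact figure.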

Incidentally, I can parse IMAP mailboxes at a little over 1 message/s,
and mbox files at ~50 messages/second, which also suggests that the IMAP
libraries are the biggest time sink here.

This is all with ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux],
Sup git next.

Now that's all CPU stuff. There might be memory blowup issues. If
nothing else, Sup leaks memory over time, but fixing that involves
getting into the hellish world of Ruby<->C land or the even more hellish
internals of MRI, so I've been loath to start down that path.

You might be able to speed up sup-sync runs on IMAP by threading the
network access and the parsing. But the IMAP connection seems pretty
CPU-heavy so who knows.
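
A rough sketch of what that threading could look like, assuming a
simple producer/consumer split over a bounded queue. The fetch_raw and
parse methods are hypothetical placeholders, not Sup or Net::IMAP calls;
only the Thread and SizedQueue usage is the point here.

```ruby
require 'thread'

# Hypothetical stand-in for the network fetch of one message.
def fetch_raw(id)
  "Subject: message #{id}\n\nbody #{id}"
end

# Hypothetical stand-in for the parsing step.
def parse(raw)
  header, body = raw.split("\n\n", 2)
  { :subject => header.sub(/\ASubject: /, ""), :body => body }
end

# One thread fetches, another parses; the bounded queue keeps the
# fetcher from running arbitrarily far ahead of the parser.
def sync(ids, queue_size = 10)
  queue = SizedQueue.new(queue_size)
  parsed = []

  fetcher = Thread.new do
    ids.each { |id| queue.push(fetch_raw(id)) }
    queue.push(:done)  # sentinel: nothing more to parse
  end

  parser = Thread.new do
    while (raw = queue.pop) != :done
      parsed << parse(raw)
    end
  end

  [fetcher, parser].each { |t| t.join }
  parsed
end

messages = sync(1..5)
puts messages.map { |m| m[:subject] }.inspect
```

On 1.8 the threads are green threads, so this can only overlap time
spent waiting on the network with parsing. It will not buy anything if
the IMAP side is itself CPU-bound.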

None of this answers the original question of why all Ruby threads block
when Sup waits for a response from IMAP. Since I'm pretty sure the IMAP
libraries are all Ruby (which is why they're so slow!), and that core
Ruby IO should be nonblocking, this might be an interpreter bug. Are you
able to pinpoint what part of MRI is blocking?
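
One possible way to narrow it down, sketched under the assumption that a
stalled interpreter shows up as a gap in a background "heartbeat"
thread: run the suspect call on the main thread and look at the largest
gap between ticks. The max_tick_gap helper is hypothetical, and the
sleep below is only a placeholder for the real IMAP call.

```ruby
# Returns the largest gap (in seconds) between consecutive ticks of a
# background thread while the given block runs on the calling thread.
def max_tick_gap(interval = 0.01)
  ticks = [Time.now]
  running = true
  ticker = Thread.new do
    while running
      sleep interval
      ticks << Time.now
    end
  end
  yield
  running = false
  ticker.join
  ticks << Time.now
  ticks.each_cons(2).map { |a, b| b - a }.max
end

# Placeholder workload: sleep lets other threads run, so the gap
# should stay close to the tick interval.
gap = max_tick_gap { sleep 0.2 }
puts "largest gap: %.3fs" % gap
```

If the gap comes out close to the full duration of the wrapped call, the
ticker thread was not being scheduled while that call ran, which would
point at whatever the call was doing at the time.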
-- 
William <wmorgan-sup at masanjin.net>


