Archive of RubyForge sup-talk mailing list
* [sup-talk] Using DTrace to profile sup-sync
@ 2007-12-15 15:05 Jjgod Jiang
  2007-12-15 19:25 ` William Morgan
  2007-12-15 21:04 ` William Morgan
  0 siblings, 2 replies; 5+ messages in thread
From: Jjgod Jiang @ 2007-12-15 15:05 UTC


Hi,

When I tried to build the index for my GMail account (which has 28525
messages in the inbox) with sup-sync, the estimated finish time (24
hours) was too long to accept. So I decided to use DTrace to profile
sup-sync, and here is the result.

The methods that account for the most time are:

___ OVERLAP TIMES: ___                                    ______ ELAPSED ______
CLASS                    METHOD                   COUNT    AVG(us)      SUM(us)

Net::IMAP::ResponseParse next_token              570993        282    161345198
Net::IMAP::ResponseParse lookahead               770871        283    218826568
Class                    _parse                   28612       8051    230376494
Class                    parse                    28902       9105    263159874
Net::IMAP::ResponseParse match                   485344        563    273536106
Redwood::IMAP            make_id                  28534       9985    284928748
Net::IMAP::ResponseParse msg_att                  28534      11423    325954315
Array                    each                     86883       3819    331861324
Net::IMAP::ResponseParse numeric_response         28538      15239    434895984
Net::IMAP::ResponseParse response_untagged        28551      17003    485460410
Net::IMAP::ResponseParse response                 28566      19149    547032919
Net::IMAP::ResponseParse parse                    28566      19233    549418654
Net::IMAP                get_response             28551      19922    568811230
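
To put the SUM(us) column into more familiar units, here is a small Ruby
sketch that converts two rows from the table above into total seconds and
average milliseconds per call. The row selection and the `summarize` helper
are illustrative only and are not part of Sup or the DTrace script. The
times are labeled as overlapping, so a caller's time presumably includes its
callees and the rows should not be added together.

```ruby
# Rows copied from the profile above: [class, method, count, sum_us].
ROWS = [
  ["Net::IMAP",     "get_response", 28_551, 568_811_230],
  ["Redwood::IMAP", "make_id",      28_534, 284_928_748],
].freeze

# Hypothetical helper: convert microsecond totals into seconds and
# per-call milliseconds.
def summarize(rows)
  rows.map do |klass, meth, count, sum_us|
    {
      name:    "#{klass}##{meth}",
      total_s: (sum_us / 1e6).round(1),
      avg_ms:  (sum_us / count.to_f / 1000).round(2),
    }
  end
end

summarize(ROWS).each do |row|
  puts format("%-28s %8.1f s total %8.2f ms/call",
              row[:name], row[:total_s], row[:avg_ms])
end
```

For the get_response row this works out to roughly 569 seconds in total, or
about 19.9 ms per call.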

I can send the complete result if needed. HTH.

BTW: it only took mutt half an hour to fetch all the mail headers.

- Jiang



* [sup-talk] Using DTrace to profile sup-sync
  2007-12-15 15:05 [sup-talk] Using DTrace to profile sup-sync Jjgod Jiang
@ 2007-12-15 19:25 ` William Morgan
  2007-12-15 19:38   ` William Morgan
  2007-12-15 21:04 ` William Morgan
  1 sibling, 1 reply; 5+ messages in thread
From: William Morgan @ 2007-12-15 19:25 UTC


Excerpts from Jjgod Jiang's message of Sat Dec 15 07:05:04 -0800 2007:
> When I was trying to build index for my GMail account (which has 28525
> mails in inbox) with sup-sync, I found the estimated finish time is
> too long (24 hours) to accept. So I decided to use DTrace to do some
> profiling work on sup-sync, and here is the result.

Thanks for the analysis! I am very interested in speeding up this sort
of thing. However, it looks like the culprit here is the Ruby IMAP
library, which presumably is so much slower than Mutt because it's Ruby.

So probably the only significant way to speed this up would be to use a
different IMAP library. If there's a C IMAP library with a Ruby bridge,
I'll consider hooking it in, but I don't know of one...

-- 
William <wmorgan-sup at masanjin.net>



* [sup-talk] Using DTrace to profile sup-sync
  2007-12-15 19:25 ` William Morgan
@ 2007-12-15 19:38   ` William Morgan
  0 siblings, 0 replies; 5+ messages in thread
From: William Morgan @ 2007-12-15 19:38 UTC


Oh, the other difference between what sup-sync does and what Mutt does
is that sup-sync downloads the entire content of the message, parses it
for text regions, and stores those in the Ferret index. That's
significantly more work than just downloading the headers, though I
don't know how much that accounts for the difference in speed.
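
The difference described above can be sketched with the `fetch` call of
Ruby's Net::IMAP, which takes a message set and a list of data items. This
is only an illustration, not Sup's actual code: `fetch_attrs` is a
hypothetical helper, and which data items Sup really requests is an
assumption. `RFC822.HEADER` and `RFC822` are the standard IMAP fetch items
for the headers alone and for the full message.

```ruby
# Standard IMAP fetch data items; the exact sets used here are an assumption.
HEADER_ITEMS = ["RFC822.SIZE", "INTERNALDATE", "RFC822.HEADER"].freeze
FULL_ITEMS   = ["RFC822.SIZE", "INTERNALDATE", "RFC822"].freeze

# Hypothetical helper. `imap` is anything that responds to fetch(set, items)
# the way Net::IMAP#fetch does: it returns objects with an `attr` hash, or
# nil when nothing matched.
def fetch_attrs(imap, set, headers_only: true)
  items = headers_only ? HEADER_ITEMS : FULL_ITEMS
  (imap.fetch(set, items) || []).map(&:attr)
end
```

With a connected and authenticated Net::IMAP instance, `fetch_attrs(imap,
1..100)` would pull only the headers, while passing `headers_only: false`
would pull whole messages, which is closer to what sup-sync does.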

-- 
William <wmorgan-sup at masanjin.net>



* [sup-talk] Using DTrace to profile sup-sync
  2007-12-15 15:05 [sup-talk] Using DTrace to profile sup-sync Jjgod Jiang
  2007-12-15 19:25 ` William Morgan
@ 2007-12-15 21:04 ` William Morgan
  2007-12-19  4:38   ` William Morgan
  1 sibling, 1 reply; 5+ messages in thread
From: William Morgan @ 2007-12-15 21:04 UTC


Excerpts from Jjgod Jiang's message of Sat Dec 15 07:05:04 -0800 2007:
> When I was trying to build index for my GMail account (which has 28525
> mails in inbox) with sup-sync, I found the estimated finish time is
> too long (24 hours) to accept.

Actually, large IMAP folders will always present a significant problem
for Sup, because IMAP sucks balls. Short story: Sup has to *always*
download the headers of the entire mailbox in order to function
correctly. So every time you start up Sup you're going to be sitting
there, waiting for that to happen, before you can do anything with your
IMAP messages.

Long story: This is because IMAP provides no way of getting a
consistent, cross-session identifier for a single message, which is
what Sup needs to retrieve message content when you view a thread. You
can read more about why IMAP's UID is useless.

(Actually, what Sup really needs is a unique, cross-session identifier
for messages, which increments as messages get newer, so that it can
tell when new messages have appeared in the inbox. In case you were
wondering, the IMAP \Recent flag is useless for this.)

So, to get around this, Sup has to construct its own unique message id,
based on the size and the internal date of the message. It then maps
this internal id to the server's message id, which lets it both
retrieve any message in the index from the server and check for new
messages since the last time it connected to the server.
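
As a rough illustration of the scheme described above, an id can be built
from the message's internal date and size so that it is stable across
sessions and grows as messages get newer. This is a minimal sketch, not
necessarily how Sup's make_id is written: the `synthetic_id` name, the
digit layout, and the seven-digit size cap are all assumptions.

```ruby
require "time"

# IMAP INTERNALDATE strings look like "15-Dec-2007 15:05:00 +0000".
INTERNALDATE_FORMAT = "%d-%b-%Y %H:%M:%S %z"

# Hypothetical helper: epoch seconds go in the high digits and the size
# (capped to seven digits) in the low digits, so a message with a later
# internal date always gets a larger id.
def synthetic_id(internal_date, size)
  epoch = Time.strptime(internal_date, INTERNALDATE_FORMAT).to_i
  format("%d%07d", epoch, size % 10_000_000).to_i
end
```

Two messages with the same internal date and the same size would collide
under this sketch, so it is only as unique as that pair of values.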

So, if you really have a large IMAP account that you want to index with
Sup, the solution is to convert it to mbox. That will solve all your
problems. Don't blame Sup, blame IMAP.

-- 
William <wmorgan-sup at masanjin.net>



* [sup-talk] Using DTrace to profile sup-sync
  2007-12-15 21:04 ` William Morgan
@ 2007-12-19  4:38   ` William Morgan
  0 siblings, 0 replies; 5+ messages in thread
From: William Morgan @ 2007-12-19  4:38 UTC


Excerpts from William Morgan's message of Sat Dec 15 13:04:10 -0800 2007:
> Actually, large IMAP folders will always present a significant problem
> for Sup, because IMAP sucks balls. Short story: Sup has to *always*
> download the headers of the entire mailbox in order to function
> correctly. So every time you start up Sup you're going to be sitting
> there, waiting for that to happen, before you can do anything with
> your IMAP messages.

It might be possible to avoid this for well-behaved IMAP servers, i.e.
those that don't change UIDVALIDITY often. If I store the UIDs directly,
then no header pull would be necessary, but a change in UIDVALIDITY
would require a full rescan a la sup-sync --changed.
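
The idea above can be sketched as a small decision function: keep the
folder's UIDVALIDITY and the highest UID seen, and fall back to a full
rescan only when UIDVALIDITY changes. This is a hypothetical sketch, not
Sup's implementation. `FolderState` and `sync_plan` are made-up names, and
the UIDVALIDITY and UIDNEXT values are assumed to come from the server when
the folder is selected.

```ruby
# Hypothetical per-folder state persisted between sessions.
FolderState = Struct.new(:uidvalidity, :last_uid)

# Decide what to do given the saved state and the server's current values.
# Returns [:full_rescan, nil], [:up_to_date, nil], or [:fetch_new, uid_range].
def sync_plan(saved, server_uidvalidity, server_uidnext)
  # No saved state, or the server invalidated the old UIDs: stored UIDs
  # can no longer be trusted.
  if saved.nil? || saved.uidvalidity != server_uidvalidity
    return [:full_rescan, nil]
  end

  first_new = saved.last_uid + 1
  return [:up_to_date, nil] if first_new >= server_uidnext

  [:fetch_new, first_new..(server_uidnext - 1)]
end
```

For example, `sync_plan(FolderState.new(42, 100), 42, 105)` asks for UIDs
101 through 104 only. Messages deleted on the server are not handled here.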

-- 
William <wmorgan-sup at masanjin.net>

