* [sup-talk] Using DTrace to profile sup-sync
@ 2007-12-15 15:05 Jjgod Jiang
  2007-12-15 19:25 ` William Morgan
  2007-12-15 21:04 ` William Morgan
  0 siblings, 2 replies; 5+ messages in thread
From: Jjgod Jiang @ 2007-12-15 15:05 UTC

Hi,

When I was trying to build index for my GMail account (which has 28525
mails in inbox) with sup-sync, I found the estimated finish time is
too long (24 hours) to accept. So I decided to use DTrace to do some
profiling work on sup-sync, and here is the result.

The methods that took the most time are:

___ OVERLAP TIMES: ___                           ______ ELAPSED ______
CLASS                    METHOD              COUNT  AVG(us)    SUM(us)
Net::IMAP::ResponseParse next_token         570993      282  161345198
Net::IMAP::ResponseParse lookahead          770871      283  218826568
Class                    _parse              28612     8051  230376494
Class                    parse               28902     9105  263159874
Net::IMAP::ResponseParse match              485344      563  273536106
Redwood::IMAP            make_id             28534     9985  284928748
Net::IMAP::ResponseParse msg_att             28534    11423  325954315
Array                    each                86883     3819  331861324
Net::IMAP::ResponseParse numeric_response    28538    15239  434895984
Net::IMAP::ResponseParse response_untagged   28551    17003  485460410
Net::IMAP::ResponseParse response            28566    19149  547032919
Net::IMAP::ResponseParse parse               28566    19233  549418654
Net::IMAP                get_response        28551    19922  568811230

I can send the complete result if needed. HTH.

BTW: it only took mutt half an hour to fetch all the mail headers.

- Jiang
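As a cross-check on the profile above, the following sketch (Python, not part of the original message) recomputes the per-call average from COUNT and SUM(us) and ranks the rows by total elapsed time. The figures are copied from the profile; the helper names `per_call_us` and `slowest` are made up for illustration. The "OVERLAP TIMES" header suggests that the elapsed figures include time spent in callees, so the rows nest inside one another and should not be added together.

```python
# Rows copied from the profile: (class, method, count, avg_us, sum_us).
ROWS = [
    ("Net::IMAP::ResponseParse", "next_token", 570993, 282, 161345198),
    ("Net::IMAP::ResponseParse", "lookahead", 770871, 283, 218826568),
    ("Class", "_parse", 28612, 8051, 230376494),
    ("Class", "parse", 28902, 9105, 263159874),
    ("Net::IMAP::ResponseParse", "match", 485344, 563, 273536106),
    ("Redwood::IMAP", "make_id", 28534, 9985, 284928748),
    ("Net::IMAP::ResponseParse", "msg_att", 28534, 11423, 325954315),
    ("Array", "each", 86883, 3819, 331861324),
    ("Net::IMAP::ResponseParse", "numeric_response", 28538, 15239, 434895984),
    ("Net::IMAP::ResponseParse", "response_untagged", 28551, 17003, 485460410),
    ("Net::IMAP::ResponseParse", "response", 28566, 19149, 547032919),
    ("Net::IMAP::ResponseParse", "parse", 28566, 19233, 549418654),
    ("Net::IMAP", "get_response", 28551, 19922, 568811230),
]


def per_call_us(count, sum_us):
    """Average elapsed microseconds per call, truncated to an integer."""
    return sum_us // count


def slowest(rows, n=3):
    """Return the n rows with the largest total elapsed time."""
    return sorted(rows, key=lambda row: row[4], reverse=True)[:n]


for cls, method, count, avg_us, sum_us in slowest(ROWS):
    total_s = sum_us / 1e6
    per_call_ms = sum_us / count / 1000
    print(f"{cls}#{method}: {total_s:.1f} s total, {per_call_ms:.1f} ms per call")
```

Read this way, the roughly 28.5k `get_response` calls account for about 569 seconds of elapsed time in total, or about 20 ms per call.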
* [sup-talk] Using DTrace to profile sup-sync
  2007-12-15 15:05 [sup-talk] Using DTrace to profile sup-sync Jjgod Jiang
@ 2007-12-15 19:25 ` William Morgan
  2007-12-15 19:38   ` William Morgan
  2007-12-15 21:04 ` William Morgan
  1 sibling, 1 reply; 5+ messages in thread
From: William Morgan @ 2007-12-15 19:25 UTC

Excerpts from Jjgod Jiang's message of Sat Dec 15 07:05:04 -0800 2007:
> When I was trying to build index for my GMail account (which has 28525
> mails in inbox) with sup-sync, I found the estimated finish time is
> too long (24 hours) to accept. So I decided to use DTrace to do some
> profiling work on sup-sync, and here is the result.

Thanks for the analysis! I am very interested in speeding up this sort
of thing.

However, it looks like the culprit here is the Ruby IMAP library, which
presumably is so much slower than Mutt because it's Ruby. So probably
the only significant way to speed this up would be to use a different
IMAP library.

If there's a C IMAP library with a Ruby bridge, I'll consider hooking it
in, but I don't know of one...

--
William <wmorgan-sup at masanjin.net>
* [sup-talk] Using DTrace to profile sup-sync
  2007-12-15 19:25 ` William Morgan
@ 2007-12-15 19:38   ` William Morgan
  0 siblings, 0 replies; 5+ messages in thread
From: William Morgan @ 2007-12-15 19:38 UTC

Oh, the other difference between what sup-sync does and what Mutt does
is that sup-sync downloads the entire content of the message, parses it
for text regions, and stores those in the Ferret index. That's
significantly more work than just downloading the headers, though I
don't know how much that accounts for the difference in speed.

--
William <wmorgan-sup at masanjin.net>
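To illustrate the extra work described in the message above, here is a minimal sketch in Python of pulling the plain-text regions out of a full message. It uses the standard `email` package rather than Sup's own parser, the function name `text_regions` is hypothetical, and the Ferret indexing step is left out entirely.

```python
from email import message_from_string


def text_regions(raw):
    """Return the decoded text/plain parts of a raw RFC 822 message string.

    Assumption: only text/plain parts count as "text regions"; Sup's real
    rules may differ.
    """
    msg = message_from_string(raw)
    regions = []
    for part in msg.walk():
        # Multipart containers and non-text attachments are skipped.
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        charset = part.get_content_charset() or "utf-8"
        try:
            regions.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to UTF-8 with replacement.
            regions.append(payload.decode("utf-8", errors="replace"))
    return regions
```

A headers-only client can skip all of this, which is one reason the two timings in the thread are not directly comparable.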
* [sup-talk] Using DTrace to profile sup-sync
  2007-12-15 15:05 [sup-talk] Using DTrace to profile sup-sync Jjgod Jiang
  2007-12-15 19:25 ` William Morgan
@ 2007-12-15 21:04 ` William Morgan
  2007-12-19  4:38   ` William Morgan
  1 sibling, 1 reply; 5+ messages in thread
From: William Morgan @ 2007-12-15 21:04 UTC

Excerpts from Jjgod Jiang's message of Sat Dec 15 07:05:04 -0800 2007:
> When I was trying to build index for my GMail account (which has 28525
> mails in inbox) with sup-sync, I found the estimated finish time is
> too long (24 hours) to accept.

Actually, large IMAP folders will always present a significant problem
for Sup, because IMAP sucks balls.

Short story: Sup has to *always* download the headers of the entire
mailbox in order to function correctly. So every time you start up Sup
you're going to be sitting there, waiting for that to happen, before you
can do anything with your IMAP messages.

Long story: This is because IMAP provides no way of getting a
consistent, cross-session identifier for a single message, which is what
Sup needs to retrieve message content when you view a thread. You can
read more about why IMAP's UID is useless.

(Actually, what Sup really needs is a unique, cross-session identifier
for messages, which increments as messages get newer, so that it can
tell when new messages have appeared in the inbox. In case you were
wondering, the IMAP \Recent flag is useless for this.)

So, to get around this, Sup has to construct its own unique message id,
based on the size and the internal date of the message. It then maps
this internal id into the server's message id, and can finally both
retrieve any message in the index from the server, and can check for new
messages since the last time it connected to the server.

So, if you really have a large IMAP account that you want to index with
Sup, the solution is to convert it to mbox. That will solve all your
problems.

Don't blame Sup, blame IMAP.

--
William <wmorgan-sup at masanjin.net>
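The scheme described in the message above can be sketched roughly as follows. This is an illustration in Python, not Sup's actual `make_id` (which is Ruby and is not shown in this thread). The way the date and size are combined, the `SIZE_DIGITS` constant, and the helper names `build_id_map` and `new_ids` are all assumptions made for the example.

```python
from datetime import datetime, timezone

SIZE_DIGITS = 7  # hypothetical: low-order digits reserved for the size


def make_id(internal_date, size):
    """Combine the internal date and the size into one integer id.

    The date is the high-order part, so ids grow as messages get newer.
    Two messages with the same second and the same size (mod 10**7)
    would collide; this sketch does not handle that case.
    """
    return int(internal_date.timestamp()) * 10**SIZE_DIGITS + size % 10**SIZE_DIGITS


def build_id_map(listing):
    """Map each synthetic id to the server's id for this session.

    `listing` yields (server_id, internal_date, size) tuples, as they
    would be obtained by fetching metadata for the whole mailbox.
    """
    return {make_id(date, size): server_id for server_id, date, size in listing}


def new_ids(id_map, last_seen_id):
    """Synthetic ids newer than the newest one seen on the last connection."""
    return sorted(i for i in id_map if i > last_seen_id)
```

Building the map requires metadata for every message in the mailbox on each connection, which is the startup cost the message describes.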
* [sup-talk] Using DTrace to profile sup-sync
  2007-12-15 21:04 ` William Morgan
@ 2007-12-19  4:38   ` William Morgan
  0 siblings, 0 replies; 5+ messages in thread
From: William Morgan @ 2007-12-19 4:38 UTC

Excerpts from William Morgan's message of Sat Dec 15 13:04:10 -0800 2007:
> Actually, large IMAP folders will always present a significant problem
> for Sup, because IMAP sucks balls. Short story: Sup has to *always*
> download the headers of the entire mailbox in order to function
> correctly. So every time you start up Sup you're going to be sitting
> there, waiting for that to happen, before you can do anything with
> your IMAP messages.

It might be possible for this not to be the case for well-behaved IMAP
servers, i.e. those that don't change UIDVALIDITY often. If I store the
UIDs directly, then no header pull would be necessary, but a change in
UIDVALIDITY would require a full rescan a la sup-sync --changed.

--
William <wmorgan-sup at masanjin.net>
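The decision described in the message above could look something like the sketch below. It is a hypothetical Python illustration, not code from Sup: `plan_sync` and `drop_known` are invented names, and the only IMAP facts relied on are that UIDs ascend within a mailbox and that a changed UIDVALIDITY invalidates every stored UID.

```python
def plan_sync(stored_validity, stored_max_uid, server_validity):
    """Decide how to resync a mailbox from stored UID state.

    Returns ("full_rescan", None) when the stored UIDs cannot be trusted,
    otherwise ("fetch_new", uid_range), where uid_range is a UID set
    string covering only messages above the highest stored UID.
    """
    if stored_validity is None or stored_validity != server_validity:
        # First sync, or the server reset its UIDs: start over.
        return ("full_rescan", None)
    return ("fetch_new", f"{stored_max_uid + 1}:*")


def drop_known(uids, stored_max_uid):
    """Filter out UIDs that are already stored.

    A range ending in "*" can match the highest existing message even
    when nothing is new, so the result of the fetch is filtered again.
    """
    return [uid for uid in uids if uid > stored_max_uid]
```

With this approach the full header pull happens only on the first sync or after a UIDVALIDITY change, rather than on every startup.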