* [sup-talk] header cache @ 2008-01-15 12:54 Giorgio Lando 2008-01-16 3:24 ` William Morgan 0 siblings, 1 reply; 15+ messages in thread From: Giorgio Lando @ 2008-01-15 12:54 UTC (permalink / raw) Hi, after ten days of successful use of sup, I would like to say that it is very useful in many circumstances (I have tried maildir, mbox, IMAP folders, and a mix of them). When using it with gmail imap, the only thing I miss is the header cache. I access the "All mail" folder directly, since everything else in gmail is actually a duplicate of the mails in All mail; moreover, this way any change made by sup does not interfere with archiving from the gmail web interface, and archiving from the web interface does not prevent sup from seeing the message. My "All mail" includes approximately 40000 messages so the fetching of IMAP headers is highly time-consuming (also the initial sup-sync took about 8 hours, but this happens only one time). I see that in TODO the header cache is scheduled for the 0.4 release. So am I correct to expect it quite soon? Thanks in advance for any info Giorgio Lando ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-15 12:54 [sup-talk] header cache Giorgio Lando @ 2008-01-16 3:24 ` William Morgan 2008-01-16 20:26 ` Giorgio Lando 0 siblings, 1 reply; 15+ messages in thread From: William Morgan @ 2008-01-16 3:24 UTC (permalink / raw) Reformatted excerpts from Giorgio Lando's message of 2008-01-15: > My "All mail" includes approximately 40000 messages so the fetching of > IMAP headers is highly time-consuming (also the initial sup-sync took > about 8 hours, but this happens only one time). > > I see that in TODO the header cache is scheduled for the 0.4 release. > So am I correct to expect it quite soon? Yes, I have a stub implementation in the works that should be ready for testing soon. I'll focus more attention on it now that I have a customer. :) -- William <wmorgan-sup at masanjin.net> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-16 3:24 ` William Morgan @ 2008-01-16 20:26 ` Giorgio Lando 2008-01-16 20:48 ` William Morgan 0 siblings, 1 reply; 15+ messages in thread From: Giorgio Lando @ 2008-01-16 20:26 UTC (permalink / raw) > Yes, I have a stub implementation in the works that should be ready for > testing soon. I'll focus more attention on it now that I have a > customer. :) Fine, ready for eventual testing, although I have finally decided to sync locally the 40000 messages with another tool, so to have them indexed and opened by sup in a faster way. Anyway I guess that a cache helps only relatively with such a monster (or at least also mutt's header cache helps only a bit with 40000 messages in an IMAP folder), so I will be able to make tests with other, much smaller imap folders. Giorgio ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-16 20:26 ` Giorgio Lando @ 2008-01-16 20:48 ` William Morgan 2008-01-16 22:04 ` Giorgio Lando 0 siblings, 1 reply; 15+ messages in thread From: William Morgan @ 2008-01-16 20:48 UTC (permalink / raw) Reformatted excerpts from Giorgio Lando's message of 2008-01-16: > Anyway I guess that a cache helps only relatively with such a monster > (or at least also mutt's header cache helps only a bit with 40000 > messages in an IMAP folder), so I will be able to make tests with > other, much smaller imap folders. If you look at the log messages, there's one emitted at the beginning of loading the headers, and one at the end, so you can see exactly how much time adding a cache will save you (minus however long it takes to load things in from disk). I'm curious what this number is for 40000 headers. It seems to be about 10 seconds for my 5000-message IMAP folder, so I can imagine it being quite significant. -- William <wmorgan-sup at masanjin.net> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-16 20:48 ` William Morgan @ 2008-01-16 22:04 ` Giorgio Lando 2008-01-16 22:58 ` Alec Berryman 2008-01-17 8:13 ` Nicolas Pouillard 0 siblings, 2 replies; 15+ messages in thread From: Giorgio Lando @ 2008-01-16 22:04 UTC (permalink / raw) > If you look at the log messages, there's one emitted at the > beginning of loading the headers, and one at the end, so you can see > exactly how much time adding a cache will save you (minus however long > it takes to load things in from disk). > > I'm curious what this number is for 40000 headers. It seems to be about > 10 seconds for my 5000-message IMAP folder, so I can imagine it being > quite significant. I guess you are referring to mutt: in this case I can confirm that the cache (mutt is compiled with gdbm) saves about 80 secs. What I meant is that it is nonetheless very slow, and that this is why one SHOULD NOT ordinarily access an IMAP folder with 40000 headers. In fact I do not do this in mutt, where instead I enter INBOX and change folder only when needed (so I only seldom change to All Mail). I cannot do exactly the same with sup: either I pull mails only from the INBOX, but then everything already archived in gmail through the web interface will not be seen by sup, or I go with All Mail, but then I have to load 40000 headers each time, which is extremely slow (and would be very slow even with a header cache). Finally, there would be no point in pulling mails from all the other folders/labels, because they would include only duplicates of mails in All Mail and would not enrich sup's index at all. I also have the feeling that sup's philosophy is not very consistent with IMAP's philosophy, which is to access the same structure of folders from different interfaces, syncing any operation to the server immediately, so that folders and labels are exactly the same from any interface. 
Sup follows its own philosophy: thus, it does not deal with folders at all, and (as you often admit) does not aim to play well with other ways of checking email. Gmail's imap is no exception to this conflict between sup's philosophy and imap's philosophy, because gmail's labels are, when you access gmail through IMAP, nothing but folders! So I have finally decided to go back to POP3, download all mails (including mails sent from the web interface, in the case of gmail) locally and let sup index and open them locally. The mails are anyway kept on the server, so I can also access them through a traditional IMAP client with folders (such as mutt) or through the web interface when I am around the world without my laptop. In this way, I can use sup quickly without any bad interplay with other ways of checking email. This is very important for me because I travel a lot and it is not possible for me to access email exclusively from my own computers with sup installed :) Thus I do not think that I will go back to accessing gmail through IMAP with sup. Anyway I hope to be able to do some testing of the IMAP header cache with other small IMAP servers I have access to. Giorgio ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-16 22:04 ` Giorgio Lando @ 2008-01-16 22:58 ` Alec Berryman 2008-01-16 23:36 ` Giorgio Lando 2008-01-17 8:13 ` Nicolas Pouillard 1 sibling, 1 reply; 15+ messages in thread From: Alec Berryman @ 2008-01-16 22:58 UTC (permalink / raw) Giorgio Lando on 2008-01-16 23:04:26 +0100: > So I have finally decided to go back to POP3, download all mails (including > mails sent from the web interface, in the case of gmail) locally and let sup > index and open them locally. The mails are anyway kept on the server, so > I can also access them through a traditional IMAP client with folders > (such as mutt) or through the web interface when I am around the world > without my laptop. Have you looked at offlineimap? I started using it with mutt because IMAP, even with header caching, was just way too slow. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-16 22:58 ` Alec Berryman @ 2008-01-16 23:36 ` Giorgio Lando 0 siblings, 0 replies; 15+ messages in thread From: Giorgio Lando @ 2008-01-16 23:36 UTC (permalink / raw) Excerpts from Alec Berryman's message of Wed Jan 16 23:58:17 +0100 2008: > Have you looked at offlineimap? I started using it with mutt because > IMAP, even with header caching, was just way too slow. Yes, I have used it and isync: you are right, they solve the problem of the slowness, since all the syncing happens in the background when using mutt or sup. But I still think that, when one does not care about folders and about synchronization of different interfaces, the intrinsic advantages of IMAP over POP3 tend to disappear. On the contrary, there is the risk of strange interplay (such as sup changing the flags in a way that offlineimap syncs to an IMAP server which does not understand them or does strange things); a solution is to use isync as a push-only synchronizer (only sync the changes from the server to the local machine and not the opposite), but then you really lose all the advantages of IMAP, and I prefer to use well-tested and powerful fetchers for POP3, such as fdm and getmail. Obviously, if for some reason you are forced to use IMAP, then offlineimap and isync are the best solution. Giorgio ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-16 22:04 ` Giorgio Lando 2008-01-16 22:58 ` Alec Berryman @ 2008-01-17 8:13 ` Nicolas Pouillard 2008-01-17 11:40 ` Giorgio Lando 1 sibling, 1 reply; 15+ messages in thread From: Nicolas Pouillard @ 2008-01-17 8:13 UTC (permalink / raw) Excerpts from Giorgio Lando's message of Wed Jan 16 23:04:26 +0100 2008: > [...] > So I have finally decided to go back to POP3, download all mails (including > mails sent from the web interface, in the case of gmail) locally and let sup > index and open them locally. The mails are anyway kept on the server, so > I can also access them through a traditional IMAP client with folders > (such as mutt) or through the web interface when I am around the world > without my laptop. I use this setup too, using mbox and it's quite fast. -- Nicolas Pouillard aka Ertai ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-17 8:13 ` Nicolas Pouillard @ 2008-01-17 11:40 ` Giorgio Lando 2008-01-17 14:56 ` Nicolas Pouillard 0 siblings, 1 reply; 15+ messages in thread From: Giorgio Lando @ 2008-01-17 11:40 UTC (permalink / raw) Excerpts from nicolas.pouillard's message of Thu Jan 17 09:13:21 +0100 2008: > Excerpts from Giorgio Lando's message of Wed Jan 16 23:04:26 +0100 2008: > > > > [...] > > > So I have finally decided to go back to POP3, download all mails (including > > mails sent from the web interface, in the case of gmail) locally and let sup > > index and open them locally. The mails are anyway kept on the server, so > > I can also access them through a traditional IMAP client with folders > > (such as mutt) or through the web interface when I am around the world > > without my laptop. > > I use this setup too, using mbox and it's quite fast. Yes, it is fast also with maildir! Quite satisfied Giorgio -- Giorgio Lando <patroclo7 at gmail dot com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-17 11:40 ` Giorgio Lando @ 2008-01-17 14:56 ` Nicolas Pouillard 2008-01-22 2:36 ` William Morgan 0 siblings, 1 reply; 15+ messages in thread From: Nicolas Pouillard @ 2008-01-17 14:56 UTC (permalink / raw) Excerpts from Giorgio Lando's message of Thu Jan 17 12:40:47 +0100 2008: > Excerpts from nicolas.pouillard's message of Thu Jan 17 09:13:21 +0100 2008: > > Excerpts from Giorgio Lando's message of Wed Jan 16 23:04:26 +0100 2008: > > > > > > > [...] > > > > > So I have finally decided to go back to POP3, download all mails (including > > > mails sent from the web interface, in the case of gmail) locally and let sup > > > index and open them locally. The mails are anyway kept on the server, so > > > I can also access them through a traditional IMAP client with folders > > > (such as mutt) or through the web interface when I am around the world > > > without my laptop. > > > > I use this setup too, using mbox and it's quite fast. > > Yes, it is fast also with maildir! Quite satisfied I've also tried maildir for some time, and found it slower. That's because Sup needs to compute a hash from file names to message ids. -- Nicolas Pouillard aka Ertai ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-17 14:56 ` Nicolas Pouillard @ 2008-01-22 2:36 ` William Morgan 2008-01-22 8:40 ` Nicolas Pouillard 0 siblings, 1 reply; 15+ messages in thread From: William Morgan @ 2008-01-22 2:36 UTC (permalink / raw) Reformatted excerpts from nicolas.pouillard's message of 2008-01-17: > That's because Sup needs to compute a hash from file names to message > ids. Like IMAP headers, this is a one-time cost upon startup and could easily be cached. Maildir shouldn't be significantly slower than mbox except for this. -- William <wmorgan-sup at masanjin.net> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-22 2:36 ` William Morgan @ 2008-01-22 8:40 ` Nicolas Pouillard 2008-01-23 20:33 ` Gabriel Sean Farrell [not found] ` <1201071069-sup-6394@south> 0 siblings, 2 replies; 15+ messages in thread From: Nicolas Pouillard @ 2008-01-22 8:40 UTC (permalink / raw) Excerpts from William Morgan's message of Tue Jan 22 03:36:48 +0100 2008: > Reformatted excerpts from nicolas.pouillard's message of 2008-01-17: > > That's because Sup needs to compute a hash from file names to message > > ids. > > Like IMAP headers, this is a one-time cost upon startup and could easily > be cached. Maildir shouldn't be significantly slower than mbox except > for this. Looking at the code 'scan_mailbox' seems to be called quite often (but not more than every 30 seconds). I'm wondering if peeking a file in a very large directory is as fast as seeking to a particular offset in a large file? -- Nicolas Pouillard aka Ertai ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-22 8:40 ` Nicolas Pouillard @ 2008-01-23 20:33 ` Gabriel Sean Farrell [not found] ` <1201071069-sup-6394@south> 1 sibling, 0 replies; 15+ messages in thread From: Gabriel Sean Farrell @ 2008-01-23 20:33 UTC (permalink / raw) On Tue, Jan 22, 2008 at 09:40:05AM +0100, Nicolas Pouillard wrote: > Excerpts from William Morgan's message of Tue Jan 22 03:36:48 +0100 2008: > > Reformatted excerpts from nicolas.pouillard's message of 2008-01-17: > > > That's because Sup needs to compute a hash from file names to message > > > ids. > > > > Like IMAP headers, this is a one-time cost upon startup and could easily > > be cached. Maildir shouldn't be significantly slower than mbox except > > for this. > > Looking at the code 'scan_mailbox' seems to be called quite often (but not > more than every 30 seconds). > > I'm wondering if peeking a file in a very large directory is as fast as > seeking to a particular offset in a large file? If you're really interested in speed comparisons between mbox and maildir, http://www.courier-mta.org/mbox-vs-maildir/ is a good read. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache [not found] ` <1201079068-sup-438@ausone.inria.fr> @ 2008-01-25 3:28 ` William Morgan 2008-01-25 8:50 ` Nicolas Pouillard 0 siblings, 1 reply; 15+ messages in thread From: William Morgan @ 2008-01-25 3:28 UTC (permalink / raw) [Sending to list. We accidentally fell off.] Reformatted excerpts from nicolas.pouillard's message of 2008-01-23: > Excerpts from William Morgan's message of Wed Jan 23 08:04:27 +0100 2008: > > Reformatted excerpts from nicolas.pouillard's message of 2008-01-22: > > > Looking at the code 'scan_mailbox' seems to be called quite > > > often (but not more than every 30 seconds). > > > > Oops, you're exactly right. > > > > > I'm wondering if peeking a file in a very large directory > > > is as fast as seeking to a particular offset in a large file? > > > > Actually, we could short-circuit this check trivially by comparing > > the directories' mtimes. Then this really shouldn't be a slowdown. > > You will still scan it when receiving new mails... That's true. But some kind of scan is unavoidable. We need to know which files are new, and one way or another that will involve pulling in the whole list of files. I think we can do a couple of easy things to speed up Maildir dramatically. Skip the scan when the mtime is unchanged, maybe increase the delay, stop repeating make_id calls for messages we've already processed... basically the current implementation is a little naive and I'm not ready to give up on Maildir quite yet. > What about having a custom mail box format for sup? > > For instance I think the only drawback (in the context of using sup) > of mbox is when you often back up your mails. Why not have a bunch of > mbox using dates inbox-12-2008.mbox? Or a directory where file names > are sup message ids? What about mbox makes it difficult to do backups? The amount of work for a custom format seems really high. (Look at all the design considerations that went into Maildir!) 
And ultimately your MTA has to be able to write that format, or there has to be a conversion script, which has to be run every 30 seconds, and has to be fast, and then we have the same set of problems we have now but in a different place. -- William <wmorgan-sup at masanjin.net> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [sup-talk] header cache 2008-01-25 3:28 ` William Morgan @ 2008-01-25 8:50 ` Nicolas Pouillard 0 siblings, 0 replies; 15+ messages in thread From: Nicolas Pouillard @ 2008-01-25 8:50 UTC (permalink / raw) Excerpts from William Morgan's message of Fri Jan 25 04:28:07 +0100 2008: > [Sending to list. We accidentally fell off.] > > Reformatted excerpts from nicolas.pouillard's message of 2008-01-23: > > Excerpts from William Morgan's message of Wed Jan 23 08:04:27 +0100 2008: > > > Reformatted excerpts from nicolas.pouillard's message of 2008-01-22: > > > > Looking at the code 'scan_mailbox' seems to be called quite > > > > often (but not more than every 30 seconds). > > > > > > Oops, you're exactly right. > > > > > > > I'm wondering if peeking a file in a very large directory > > > > is as fast as seeking to a particular offset in a large file? > > > > > > Actually, we could short-circuit this check trivially by comparing > > > the directories' mtimes. Then this really shouldn't be a slowdown. > > > > You will still scan it when receiving new mails... > > That's true. But some kind of scan is unavoidable. We need to know which > files are new, and one way or another that will involve pulling in the > whole list of files. > > I think we can do a couple of easy things to speed up Maildir dramatically. > Skip the scan when the mtime is unchanged, maybe increase the delay, > stop repeating make_id calls for messages we've already processed... > basically the current implementation is a little naive and I'm not ready > to give up on Maildir quite yet. What about having two directories? new and cur for instance :) You always have to scan new and move them to cur when they are in the index. > > What about having a custom mail box format for sup? > > > > For instance I think the only drawback (in the context of using sup) > > of mbox is when you often back up your mails. Why not have a bunch of > > mbox using dates inbox-12-2008.mbox? 
Or a directory where file names > > are sup message ids? > > What about mbox makes it difficult to do backups? Every hour, if my backup drive is plugged in, my OS makes a new tree of my entire disk by hard-linking what hasn't changed and copying the rest. However, this behaves particularly badly with files that just keep growing every hour (~400MB as of today). That's why splitting my big mailbox into small ones, say one per month, would behave better w.r.t. backups. > The amount of work for a custom format seems really high. (Look at all > the design considerations that went into Maildir!) And ultimately your > MTA has to be able to write that format, or there has to be a conversion > script, which has to be run every 30 seconds, and has to be fast, and > then we have the same set of problems we have now but in a different > place. Your MTA calls an MDA like maildrop, and making a specific maildrop in ruby is pretty easy. Best regards, -- Nicolas Pouillard aka Ertai ^ permalink raw reply [flat|nested] 15+ messages in thread