Archive of RubyForge sup-talk mailing list
* [sup-talk] header cache
@ 2008-01-15 12:54 Giorgio Lando
  2008-01-16  3:24 ` William Morgan
  0 siblings, 1 reply; 15+ messages in thread
From: Giorgio Lando @ 2008-01-15 12:54 UTC (permalink / raw)


Hi,
after ten days of successful use of sup, I would like to say that it
is very useful in many circumstances (I have tried maildir, mbox, IMAP
folders, and a mix of them).

When using it with gmail imap, the only thing I miss is the header
cache. I access the "All mail" folder directly (since
anything else in gmail is actually a duplicate of the mails in All
mail; moreover, this way no change made by sup interferes with
archiving from the gmail web interface, and archiving
from the web interface does not prevent sup from seeing the message).

My "All mail" includes approximately 40000 messages so the fetching of
IMAP headers is highly time-consuming (also the initial sup-sync took
about 8 hours, but this happens only one time).

I see that in TODO the header cache is scheduled for the 0.4 release.
So am I correct to expect it quite soon?

Thanks in advance for any info
Giorgio Lando


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-15 12:54 [sup-talk] header cache Giorgio Lando
@ 2008-01-16  3:24 ` William Morgan
  2008-01-16 20:26   ` Giorgio Lando
  0 siblings, 1 reply; 15+ messages in thread
From: William Morgan @ 2008-01-16  3:24 UTC (permalink / raw)


Reformatted excerpts from Giorgio Lando's message of 2008-01-15:
> My "All mail" includes approximately 40000 messages so the fetching of
> IMAP headers is highly time-consuming (also the initial sup-sync took
> about 8 hours, but this happens only one time).
> 
> I see that in TODO the header cache is scheduled for the 0.4 release.
> So am I correct to expect it quite soon?

Yes, I have a stub implementation in the works that should be ready for
testing soon. I'll focus more attention on it now that I have a
customer. :)

-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-16  3:24 ` William Morgan
@ 2008-01-16 20:26   ` Giorgio Lando
  2008-01-16 20:48     ` William Morgan
  0 siblings, 1 reply; 15+ messages in thread
From: Giorgio Lando @ 2008-01-16 20:26 UTC (permalink / raw)


> Yes, I have a stub implementation in the works that should be ready for
> testing soon. I'll focus more attention on it now that I have a
> customer. :)

Fine, I am ready to help with testing, although I have finally decided to
sync the 40000 messages locally with another tool, so that sup can
index and open them faster. Anyway, I guess that a cache
helps only to a degree with such a monster (at least, mutt's
header cache helps only a bit with 40000 messages in an IMAP folder),
so I will be able to run tests with other, much smaller imap folders.
Giorgio


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-16 20:26   ` Giorgio Lando
@ 2008-01-16 20:48     ` William Morgan
  2008-01-16 22:04       ` Giorgio Lando
  0 siblings, 1 reply; 15+ messages in thread
From: William Morgan @ 2008-01-16 20:48 UTC (permalink / raw)


Reformatted excerpts from Giorgio Lando's message of 2008-01-16:
> Anyway I guess that a cache helps only relatively with such a monster
> (or at least also mutt's header cache helps only a bit with 40000
> messages in an IMAP folder), so I will be able to make tests with
> other, much smaller imap folders.

If you look at the log messages, there's one emitted at the
beginning of loading the headers, and one at the end, so you can see
exactly how much time adding a cache will save you (minus however long
it takes to load things in from disk).

I'm curious what this number is for 40000 headers. It seems to be about
10 seconds for my 5000-message IMAP folder, so I can imagine it being
quite significant.
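
As a rough back-of-the-envelope sketch (in Python, not Sup's own code), the
5000-message figure can be scaled up to a 40000-message folder. This assumes
load time grows linearly with message count, which is only an assumption and
may not hold for a real IMAP server; the function name is made up for
illustration.

```python
def estimate_load_seconds(measured_seconds, measured_count, target_count):
    """Linearly scale a measured header-load time to another folder size."""
    return measured_seconds * target_count / measured_count


# 10 seconds for 5000 headers, scaled to 40000 headers.
print(estimate_load_seconds(10, 5000, 40000))  # 80.0
```

The real number is best read off the two log messages; this only gives an
order of magnitude to compare against.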

-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-16 20:48     ` William Morgan
@ 2008-01-16 22:04       ` Giorgio Lando
  2008-01-16 22:58         ` Alec Berryman
  2008-01-17  8:13         ` Nicolas Pouillard
  0 siblings, 2 replies; 15+ messages in thread
From: Giorgio Lando @ 2008-01-16 22:04 UTC (permalink / raw)


> If you look at the log messages, there's one emitted at the
> beginning of loading the headers, and one at the end, so you can see
> exactly how much time adding a cache will save you (minus however long
> it takes to load things in from disk).
> 
> I'm curious what this number is for 40000 headers. It seems to be about
> 10 seconds for my 5000-message IMAP folder, so I can imagine it being
> quite significant.

I guess you are referring to mutt: in this case I can confirm that the
cache (mutt is compiled with gdbm) saves about 80 secs.

What I meant is that it is nonetheless very slow, and that this is why
one SHOULD NOT ordinarily access an IMAP folder with 40000 headers. In
fact I do not do this in mutt, where I enter INBOX and change folder
only when needed (so I only seldom change to All Mail). I cannot do
exactly the same with sup: either I pull mails only from the INBOX, but
then everything already archived in gmail through the web interface will
not be seen by sup, or I go with All Mail, but then I have to load
40000 headers each time, which is extra slow (and would be very slow
even with a header cache). Finally, there would not be any point in
pulling mails from all the other folders/labels, because they would
include only duplicates of mails in All Mail and would not enrich
sup's index at all.

I also have the feeling that sup's philosophy is not very consistent
with IMAP's philosophy, which is to access the same structure of folders
from different interfaces, syncing any operation to the server immediately,
so that folders and labels are exactly the same from any interface.
Sup follows its own philosophy: it does not deal with folders at all,
and (as you often admit) does not aim to play well with other ways of
checking email. Gmail's imap is no exception to this conflict between
sup's philosophy and imap's philosophy, because gmail's labels are, when
you access gmail through IMAP, nothing other than folders!

So I have finally decided to go back to POP3, download all mails (including
mails sent from the web interface, in the case of gmail) locally and let sup
index and open them locally. The mails are anyway kept on the server, so
I can also access them through a traditional IMAP client with folders
(such as mutt) or through the web interface when I am around the world
without my laptop.
In this way, I can use sup in a fast way without any bad interplay with
other ways of checking emails. This is very important for me because I
travel a lot and it is not possible for me to access emails exclusively
from my own computers with sup installed :) .

Thus I do not think that I will go back to accessing gmail through IMAP
with sup. Anyway, I hope to be able to do some testing of the IMAP header
cache with other small IMAP servers I have access to.

Giorgio



 




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-16 22:04       ` Giorgio Lando
@ 2008-01-16 22:58         ` Alec Berryman
  2008-01-16 23:36           ` Giorgio Lando
  2008-01-17  8:13         ` Nicolas Pouillard
  1 sibling, 1 reply; 15+ messages in thread
From: Alec Berryman @ 2008-01-16 22:58 UTC (permalink / raw)


Giorgio Lando on 2008-01-16 23:04:26 +0100:

> So I have finally decided to go back to POP3, download all mails (including
> mails sent from the web interface, in the case of gmail) locally and let sup
> index and open them locally. The mails are anyway kept on the server, so
> I can also access them through a traditional IMAP client with folders
> (such as mutt) or through the web interface when I am around the world
> without my laptop.

Have you looked at offlineimap?  I started using it with mutt because
IMAP, even with header caching, was just way too slow.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-16 22:58         ` Alec Berryman
@ 2008-01-16 23:36           ` Giorgio Lando
  0 siblings, 0 replies; 15+ messages in thread
From: Giorgio Lando @ 2008-01-16 23:36 UTC (permalink / raw)


Excerpts from Alec Berryman's message of Wed Jan 16 23:58:17 +0100 2008:
> Have you looked at offlineimap?  I started using it with mutt because
> IMAP, even with header caching, was just way too slow.

Yes, I have used it and isync: you are right, they solve the problem of
slowness, since all the syncing happens in the background while using
mutt or sup.

But still I think that, when one does not care about folders and about
synchronization of different interfaces, the intrinsic advantages of
IMAP over POP3 tend to disappear. On the contrary, there is the risk of
strange interplay (such as sup changing the flags in a way that
offlineimap syncs to an IMAP server which does not understand them or
does strange things); a solution is to use isync as a pull-only
synchronizer (only sync the changes from the server to the local machine
and not the opposite), but really in this way you lose all the advantages
of IMAP and I prefer to use well-tested and powerful fetchers for POP3,
such as fdm and getmail.

Obviously, if on the contrary you are for some reason forced to use IMAP,
then offlineimap and isync are the best solution.

Giorgio


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-16 22:04       ` Giorgio Lando
  2008-01-16 22:58         ` Alec Berryman
@ 2008-01-17  8:13         ` Nicolas Pouillard
  2008-01-17 11:40           ` Giorgio Lando
  1 sibling, 1 reply; 15+ messages in thread
From: Nicolas Pouillard @ 2008-01-17  8:13 UTC (permalink / raw)


Excerpts from Giorgio Lando's message of Wed Jan 16 23:04:26 +0100 2008:
> 

[...]

> So I have finally decided to go back to POP3, download all mails (including
> mails sent from the web interface, in the case of gmail) locally and let sup
> index and open them locally. The mails are anyway kept on the server, so
> I can also access them through a traditional IMAP client with folders
> (such as mutt) or through the web interface when I am around the world
> without my laptop.

I use this setup too, using mbox and it's quite fast.

-- 
Nicolas Pouillard aka Ertai


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-17  8:13         ` Nicolas Pouillard
@ 2008-01-17 11:40           ` Giorgio Lando
  2008-01-17 14:56             ` Nicolas Pouillard
  0 siblings, 1 reply; 15+ messages in thread
From: Giorgio Lando @ 2008-01-17 11:40 UTC (permalink / raw)


Excerpts from nicolas.pouillard's message of Thu Jan 17 09:13:21 +0100 2008:
> Excerpts from Giorgio Lando's message of Wed Jan 16 23:04:26 +0100 2008:
> > 
> 
> [...]
> 
> > So I have finally decided to go back to POP3, download all mails (including
> > mails sent from the web interface, in the case of gmail) locally and let sup
> > index and open them locally. The mails are anyway kept on the server, so
> > I can also access them through a traditional IMAP client with folders
> > (such as mutt) or through the web interface when I am around the world
> > without my laptop.
> 
> I use this setup too, using mbox and it's quite fast.

Yes, it is fast also with maildir! Quite satisfied
Giorgio

-- 
Giorgio Lando <patroclo7 at gmail dot com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-17 11:40           ` Giorgio Lando
@ 2008-01-17 14:56             ` Nicolas Pouillard
  2008-01-22  2:36               ` William Morgan
  0 siblings, 1 reply; 15+ messages in thread
From: Nicolas Pouillard @ 2008-01-17 14:56 UTC (permalink / raw)


Excerpts from Giorgio Lando's message of Thu Jan 17 12:40:47 +0100 2008:
> Excerpts from nicolas.pouillard's message of Thu Jan 17 09:13:21 +0100 2008:
> > Excerpts from Giorgio Lando's message of Wed Jan 16 23:04:26 +0100 2008:
> > > 
> > 
> > [...]
> > 
> > > So I have finally decided to go back to POP3, download all mails (including
> > > mails sent from the web interface, in the case of gmail) locally and let sup
> > > index and open them locally. The mails are anyway kept on the server, so
> > > I can also access them through a traditional IMAP client with folders
> > > (such as mutt) or through the web interface when I am around the world
> > > without my laptop.
> > 
> > I use this setup too, using mbox and it's quite fast.
> 
> Yes, it is fast also with maildir! Quite satisfied

I've also tried maildir for some time, and found it slower.

That's because Sup needs to compute a hash from file names to message ids.

-- 
Nicolas Pouillard aka Ertai


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-17 14:56             ` Nicolas Pouillard
@ 2008-01-22  2:36               ` William Morgan
  2008-01-22  8:40                 ` Nicolas Pouillard
  0 siblings, 1 reply; 15+ messages in thread
From: William Morgan @ 2008-01-22  2:36 UTC (permalink / raw)


Reformatted excerpts from nicolas.pouillard's message of 2008-01-17:
> That's because Sup needs to compute a hash from file names to message
> ids.

Like IMAP headers, this is a one-time cost upon startup and could easily
be cached. Maildir shouldn't be significantly slower than mbox except
for this.
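
A minimal sketch of what such a cache could look like, written in Python
rather than Sup's Ruby and not taken from Sup's code. The JSON cache file,
the function names, and the `make_id` callback are all assumptions for
illustration; the only point is that ids are computed on cache misses alone.

```python
import json
import os


def load_id_cache(cache_path):
    """Return the cached {filename: id} map, or an empty dict if none exists."""
    try:
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def build_id_map(maildir_subdir, cache, make_id):
    """Map every file name in maildir_subdir to an id.

    make_id (a hypothetical callback) is called only for names that are
    missing from the cache, so a warm cache avoids the startup cost.
    """
    ids = {}
    for name in os.listdir(maildir_subdir):
        ids[name] = cache[name] if name in cache else make_id(name)
    return ids


def save_id_cache(cache_path, ids):
    """Persist the {filename: id} map so the next startup can reuse it."""
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(ids, f)
```

On startup one would load the cache, build the map, and save it again;
entries for files that have since disappeared drop out because the map is
rebuilt from the directory listing.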

-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-22  2:36               ` William Morgan
@ 2008-01-22  8:40                 ` Nicolas Pouillard
  2008-01-23 20:33                   ` Gabriel Sean Farrell
       [not found]                   ` <1201071069-sup-6394@south>
  0 siblings, 2 replies; 15+ messages in thread
From: Nicolas Pouillard @ 2008-01-22  8:40 UTC (permalink / raw)


Excerpts from William Morgan's message of Tue Jan 22 03:36:48 +0100 2008:
> Reformatted excerpts from nicolas.pouillard's message of 2008-01-17:
> > That's because Sup needs to compute a hash from file names to message
> > ids.
> 
> Like IMAP headers, this is a one-time cost upon startup and could easily
> be cached. Maildir shouldn't be significantly slower than mbox except
> for this.

Looking at the code, 'scan_mailbox' seems to be called quite often (but not
more than every 30 seconds).

I'm wondering if peeking at a file in a very large directory is as fast as
seeking to a particular offset in a large file?

-- 
Nicolas Pouillard aka Ertai


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-22  8:40                 ` Nicolas Pouillard
@ 2008-01-23 20:33                   ` Gabriel Sean Farrell
       [not found]                   ` <1201071069-sup-6394@south>
  1 sibling, 0 replies; 15+ messages in thread
From: Gabriel Sean Farrell @ 2008-01-23 20:33 UTC (permalink / raw)


On Tue, Jan 22, 2008 at 09:40:05AM +0100, Nicolas Pouillard wrote:
> Excerpts from William Morgan's message of Tue Jan 22 03:36:48 +0100 2008:
> > Reformatted excerpts from nicolas.pouillard's message of 2008-01-17:
> > > That's because Sup needs to compute a hash from file names to message
> > > ids.
> > 
> > Like IMAP headers, this is a one-time cost upon startup and could easily
> > be cached. Maildir shouldn't be significantly slower than mbox except
> > for this.
> 
> Looking  at  the  code  'scan_mailbox' seems to be called quite often (but not
> more than every 30 seconds).
> 
> I'm  wondering  if  peeking  a  file  in  a very large directory is as fast as
> seeking to a particular offset in a large file?

If you're really interested in speed comparisons between mbox and
maildir, http://www.courier-mta.org/mbox-vs-maildir/ is a good read.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
       [not found]                     ` <1201079068-sup-438@ausone.inria.fr>
@ 2008-01-25  3:28                       ` William Morgan
  2008-01-25  8:50                         ` Nicolas Pouillard
  0 siblings, 1 reply; 15+ messages in thread
From: William Morgan @ 2008-01-25  3:28 UTC (permalink / raw)


[Sending to list. We accidentally fell off.]

Reformatted excerpts from nicolas.pouillard's message of 2008-01-23:
> Excerpts from William Morgan's message of Wed Jan 23 08:04:27 +0100 2008:
> > Reformatted excerpts from nicolas.pouillard's message of 2008-01-22:
> > > Looking  at  the  code  'scan_mailbox' seems to be called quite
> > > often (but not more than every 30 seconds).
> > 
> > Oops, you're exactly right.
> > 
> > > I'm  wondering  if  peeking  a  file  in  a very large directory
> > > is as fast as seeking to a particular offset in a large file?
> > 
> > Actually, we could short-circuit this check trivially by comparing
> > the directories' mtimes. Then this really shouldn't be a slowdown.
> 
> > You will still scan it when receiving new mail...

That's true. But some kind of scan is unavoidable. We need to know which
files are new, and one way or another that will involve pulling in the
whole list of files.

I think we can do a couple easy things to speed up Maildir dramatically.
Skip the scan when the mtime is unchanged, maybe increase the delay,
stop repeating make_id calls for messages we've already processed...
basically the current implementation is a little naive and I'm not ready
to give up on Maildir quite yet.
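
Roughly, the first and third ideas could be sketched like this. It is
Python, not the actual Sup code; the class name, `poll`, and the `make_id`
callback are placeholders for illustration. It also assumes the directory's
mtime changes whenever a file is added, which depends on the filesystem's
timestamp granularity.

```python
import os


class MaildirScanner:
    """Poll one Maildir subdirectory, skipping work when nothing changed."""

    def __init__(self, path, make_id):
        self.path = path
        self.make_id = make_id   # hypothetical id function, called once per file
        self.last_mtime = None   # directory mtime seen at the last real scan
        self.ids = {}            # file name -> id for files already processed

    def poll(self):
        """Return the names of newly seen files.

        The directory is only listed when its mtime differs from the last
        scan, and make_id is never repeated for an already-known name.
        """
        # Stat before listing: a file that arrives in between is picked up
        # by this listing, and the changed mtime just triggers one extra scan.
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self.last_mtime:
            return []
        self.last_mtime = mtime
        new_names = [n for n in os.listdir(self.path) if n not in self.ids]
        for name in new_names:
            self.ids[name] = self.make_id(name)
        return new_names
```

Calling `poll()` every 30 seconds then costs one `stat` when no mail has
arrived. The sketch never forgets removed files; a real version would need
to prune `ids` as well.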

> What about having a custom mailbox format for sup?
> 
> For instance, I think the only drawback of mbox (in the context of
> using sup) is when you often back up your mails. Why not have a bunch
> of mboxes named by date, like inbox-12-2008.mbox? Or a directory where
> file names are sup message ids?

What about mbox makes it difficult to do backups?

The amount of work for a custom format seems really high. (Look at all
the design considerations that went into Maildir!) And ultimately your
MTA has to be able to write that format, or there has to be a conversion
script, which has to be run every 30 seconds, and has to be fast, and
then we have the same set of problems we have now but in a different
place.

-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [sup-talk] header cache
  2008-01-25  3:28                       ` William Morgan
@ 2008-01-25  8:50                         ` Nicolas Pouillard
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pouillard @ 2008-01-25  8:50 UTC (permalink / raw)


Excerpts from William Morgan's message of Fri Jan 25 04:28:07 +0100 2008:
> [Sending to list. We accidentally fell off.]
> 
> Reformatted excerpts from nicolas.pouillard's message of 2008-01-23:
> > Excerpts from William Morgan's message of Wed Jan 23 08:04:27 +0100 2008:
> > > Reformatted excerpts from nicolas.pouillard's message of 2008-01-22:
> > > > Looking  at  the  code  'scan_mailbox' seems to be called quite
> > > > often (but not more than every 30 seconds).
> > > 
> > > Oops, you're exactly right.
> > > 
> > > > I'm  wondering  if  peeking  a  file  in  a very large directory
> > > > is as fast as seeking to a particular offset in a large file?
> > > 
> > > Actually, we could short-circuit this check trivially by comparing
> > > the directories' mtimes. Then this really shouldn't be a slowdown.
> > 
> > You will still scan it at when receiving new mails...
> 
> That's true. But some kind of scan is unavoidable. We need to know which
> files are new, and one way or another that will involve pulling in the
> whole list of files.
> 
> I think we can do a couple easy things to speed up Maildir dramatically.
> Skip the scan when the mtime is unchanged, maybe increase the delay,
> stop repeating make_id calls for messages we've already processed...
> basically the current implementation is a little naive and I'm not ready
> to give up on Maildir quite yet.

What about having two directories? new and cur for instance :)
You always have to scan new and move them to cur when they are in the index.
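
To make the idea concrete, here is a small sketch in Python (not Sup code).
The function name and the `index_message` callback are made up for
illustration; the ":2," suffix is the usual Maildir convention for file
names in cur/.

```python
import os


def index_new_messages(maildir, index_message):
    """Index each file in new/, then move it to cur/; return the new names."""
    new_dir = os.path.join(maildir, "new")
    cur_dir = os.path.join(maildir, "cur")
    moved = []
    for name in sorted(os.listdir(new_dir)):
        src = os.path.join(new_dir, name)
        index_message(src)  # hypothetical callback that adds the file to the index
        # Files in cur/ carry an info suffix; ":2," means "no flags set yet".
        dest_name = name if ":2," in name else name + ":2,"
        os.rename(src, os.path.join(cur_dir, dest_name))
        moved.append(dest_name)
    return moved
```

With this, only new/ has to be listed on every poll, and it stays small
because indexed messages leave it immediately.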

> > What about having a custom mail box format for sup?
> > 
> > For  instance  I think the only drawback (in the context of using sup)
> > of mbox is when you often backup you mails. Why not having a bunch of
> > mbox using dates inbox-12-2008.mbox? Or a directory where file names
> > are sup message ids?
> 
> What about mbox makes it difficult to do backups?

Every hour, if my backup drive is plugged in, my OS makes a new tree of my
entire disk by hard-linking what hasn't changed and copying the rest.
However, this behaves particularly badly with files that just keep growing
every hour (~400MB as of today).

That's why splitting my big mailbox into small ones, let's say one per
month, would behave better w.r.t. backups.
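
A sketch of such a split, in Python with its standard `mailbox` module (not
something Sup provides). The function names and the "undated" fallback file
are assumptions; the file naming follows the inbox-12-2008.mbox example,
read as month then year.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime


def monthly_mbox_name(date, prefix="inbox"):
    """Build a name like inbox-12-2008.mbox from a datetime."""
    return f"{prefix}-{date.month:02d}-{date.year}.mbox"


def split_mbox_by_month(src_path, dest_dir, prefix="inbox"):
    """Copy each message of one big mbox into a per-month mbox.

    Returns {file name: number of messages written}. Messages without a
    parseable Date header go to a separate "undated" file.
    """
    counts = {}
    dests = {}
    src = mailbox.mbox(src_path, create=False)
    try:
        for msg in src:
            try:
                name = monthly_mbox_name(parsedate_to_datetime(msg["Date"]), prefix)
            except (TypeError, ValueError):
                name = f"{prefix}-undated.mbox"
            if name not in dests:
                dests[name] = mailbox.mbox(os.path.join(dest_dir, name))
            dests[name].add(msg)
            counts[name] = counts.get(name, 0) + 1
    finally:
        for dest in dests.values():
            dest.close()  # close() also flushes pending messages to disk
        src.close()
    return counts
```

Once a month is over its file no longer changes, so an hourly hard-link
backup only has to copy the current month's mbox.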

> The amount of work for a custom format seems really high. (Look at all
> the design considerations that went into Maildir!) And ultimately your
> MTA has to be able to write that format, or there has to be a conversion
> script, which has to be run every 30 seconds, and has to be fast, and
> then we have the same set of problems we have now but in a different
> place.

Your MTA calls an MDA like maildrop, and writing a specific maildrop-like
MDA in Ruby is pretty easy.
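
For illustration only, here is the core of such a delivery agent sketched in
Python rather than Ruby. The `deliver` function and the monthly file naming
are assumptions, not an existing tool; how the MTA hands over the message
(for example on standard input) depends on its configuration.

```python
import mailbox
import os
from datetime import datetime


def deliver(raw_message, dest_dir, now=None):
    """Append one raw message (bytes) to this month's mbox; return its path."""
    now = now or datetime.now()
    path = os.path.join(dest_dir, f"inbox-{now.month:02d}-{now.year}.mbox")
    box = mailbox.mbox(path)  # created on first delivery of the month
    box.lock()                # keep concurrent deliveries from interleaving
    try:
        box.add(raw_message)
    finally:
        box.close()           # flushes the new message and releases the lock
    return path
```

A wrapper script would read the message bytes from wherever the MTA
provides them and call `deliver(data, "/path/to/mail")`, where the directory
is a placeholder.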

Best regards,

-- 
Nicolas Pouillard aka Ertai


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2008-01-25  8:50 UTC | newest]

Thread overview: 15+ messages
2008-01-15 12:54 [sup-talk] header cache Giorgio Lando
2008-01-16  3:24 ` William Morgan
2008-01-16 20:26   ` Giorgio Lando
2008-01-16 20:48     ` William Morgan
2008-01-16 22:04       ` Giorgio Lando
2008-01-16 22:58         ` Alec Berryman
2008-01-16 23:36           ` Giorgio Lando
2008-01-17  8:13         ` Nicolas Pouillard
2008-01-17 11:40           ` Giorgio Lando
2008-01-17 14:56             ` Nicolas Pouillard
2008-01-22  2:36               ` William Morgan
2008-01-22  8:40                 ` Nicolas Pouillard
2008-01-23 20:33                   ` Gabriel Sean Farrell
     [not found]                   ` <1201071069-sup-6394@south>
     [not found]                     ` <1201079068-sup-438@ausone.inria.fr>
2008-01-25  3:28                       ` William Morgan
2008-01-25  8:50                         ` Nicolas Pouillard
