From: wmorgan-sup@masanjin.net (William Morgan)
Subject: [sup-talk] Possible problem with maildir ID generation
Date: Wed, 29 Apr 2009 15:31:52 -0700
Message-ID: <1241038730-sup-2939@entry>
In-Reply-To: <a412e2a70904281629n6171bbb6i4c6330e375f8c6ad@mail.gmail.com>
Reformatted excerpts from Mark Alexander's message of 2009-04-28:
> Maildir filenames are unique, but they would need to be ordered by
> time, since sup depends on that ordering (look in maildir.rb for where
> it uses sort). I'm not sure if mail delivery programs (I use
> procmail) guarantee that the filenames are ordered that way.
That's correct; the name alone is not sufficient as an id, because Sup
needs a single pointer into the Maildir as a marker for what it has
already processed, so we have to use something ordinal. But it can't be
just any old ordinal; it has to correspond to the order in which
messages are written to the Maildir, so that newer messages can be
divided from older ones.
A timestamp is the obvious choice, but messages can have the same
timestamp, so then what do you do? The current approach is to sort by
another arbitrary field (in this case, message size), which gives a
unique ordering, but doesn't match up with the order in which messages
are written to the Maildir.
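To make the problem with a size tie-breaker concrete, here is a minimal
Ruby sketch. It is not Sup's code: the Msg struct, sort_key, and the
sample timestamps and sizes are all hypothetical, chosen only to show
how a message delivered later in the same second can sort before the
marker and be skipped.

```ruby
# Hypothetical stand-in for a Maildir entry; not Sup's actual data structure.
Msg = Struct.new(:name, :mtime, :size)

# Order by timestamp, breaking ties by message size.
def sort_key(m)
  [m.mtime, m.size]
end

# First poll: two messages share a timestamp; the marker is the largest key seen.
first_scan = [Msg.new("a", 100, 500), Msg.new("b", 100, 900)]
marker = first_scan.map { |m| sort_key(m) }.max

# A third message is delivered in the same second, but it is smaller.
late = Msg.new("c", 100, 200)
second_scan = first_scan + [late]

# Polling for "everything past the marker" does not return the late arrival.
new_msgs = second_scan.select { |m| (sort_key(m) <=> marker) > 0 }
missed = second_scan - first_scan - new_msgs
```

Here new_msgs comes back empty and missed holds message "c": the
ordering is unique, but it does not reflect delivery order.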
(All this rigamarole about ordinals and blah blah blah is necessary
because I don't want Sup to rescan the entire Maildir unless absolutely
necessary. One day I'll convert my mbox to a Maildir with 250k files in
it, and a rescan will kill me, especially at Ruby speed.)
> I will say that the patch I sent out for maildir.rb has made my life a
> lot happier, but it's still not ideal because of the race condition I
> mentioned.
>
> William was talking about using some other scheme to generate IDs. We
> should see what he has to say about this.
Well I haven't quite started on it yet, but my plan is to:
a) Sort files by timestamp, and then by something else (maybe name), and
use the position in that array instead of the timestamp. This doesn't
solve anything, but it will make the ids prettier, and removes the
hideous "%7d" thing.
b) When polling, if the current "offset" is N, return all messages that
have a timestamp >= the Nth message. So this will mean that we'll
rescan messages on occasion, but we shouldn't miss any.
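The two steps above might look roughly like this in Ruby. This is only
a sketch under assumptions, not the eventual maildir.rb change: Entry,
ordered, and poll are hypothetical names, the offset is taken to be a
0-based index of the last processed message, and the Nth message is
looked up in the current listing.

```ruby
# Hypothetical stand-in for a Maildir entry; not Sup's actual data structure.
Entry = Struct.new(:name, :mtime)

# (a) Sort by timestamp, then by name; a message's id is its position here.
def ordered(entries)
  entries.sort_by { |e| [e.mtime, e.name] }
end

# (b) Return every message whose timestamp is >= that of the Nth message.
# Assumption: offset is a 0-based index, or nil if nothing has been processed.
def poll(entries, offset)
  sorted = ordered(entries)
  return sorted if offset.nil? || sorted[offset].nil?
  cutoff = sorted[offset].mtime
  sorted.select { |e| e.mtime >= cutoff }
end

# Three messages already processed; the last one ("b") sits at offset 2.
seen = [Entry.new("old", 50), Entry.new("a", 100), Entry.new("b", 100)]
offset = 2

# Two more arrive, one of them in the same second as "a" and "b".
now = seen + [Entry.new("c", 100), Entry.new("d", 101)]
result = poll(now, offset).map(&:name)
```

In this example the poll returns "a" and "b" again along with "c" and
"d", so some messages are rescanned, but the same-second arrival is not
missed and "old" is left alone.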
Any obvious flaws?
--
William <wmorgan-sup at masanjin.net>