* [sup-devel] new branch: maildir
From: Rich Lane @ 2010-03-25 7:12 UTC (permalink / raw)
To: sup-devel
This branch makes some drastic changes to how mbox and maildir sources
work. There's no longer any state associated with a source between Sup
runs: no cur_offset or mtimes in sources.yaml. Instead, the source
queries the index to find out which messages it has already seen and which
are new. This enables a much more robust maildir implementation that
detects the addition or deletion of any message.
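A minimal sketch of the stateless idea, assuming the index can hand back the set of filenames it already holds for a source. `classify` and `indexed_names` are hypothetical names for illustration, not code from the branch. Nothing is carried between runs; additions and deletions both fall out of comparing the two sets.

```ruby
require 'set'

# Hypothetical sketch: classify files against what the index already knows.
# No per-source state (no cur_offset, no mtimes) is needed between runs.
# `indexed_names` stands in for whatever the index returns for this source.
def classify(current_names, indexed_names)
  current = current_names.to_set
  indexed = indexed_names.to_set
  {
    new:     (current - indexed).to_a.sort, # on disk, not yet indexed
    deleted: (indexed - current).to_a.sort  # indexed, no longer on disk
  }
end
```

For example, `classify(%w[a b c], %w[b c d])` reports `"a"` as new and `"d"` as deleted.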
It's not totally done yet. It'll detect that a maildir message has been
deleted, but it doesn't yet remove the old location from the index, so
renames are unlikely to work. There needs to be UI code to handle the
case where a message matches a search but has been deleted from all
sources, and probably a utility to remove such messages from the index.
I expect sup-sync-back to be broken.
Keeping track of multiple locations per message requires an index format
change. The upgrade process is trivial and done automatically, but you
won't be able to use the upgraded index with an older Sup. For now, if you
want to try this out, I suggest using a different SUP_BASE.
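To illustrate what "multiple locations per message" could look like, here is a sketch with hypothetical field names (`:source_id`, `:source_info`, `:locations`); it is not Sup's actual index schema. An old single-location entry is upgraded in place, and removing one location leaves the others intact, so a message only becomes "deleted from all sources" once its list is empty.

```ruby
# Hypothetical index entry layouts; field names are illustrative only.
OLD_ENTRY = { message_id: 'x@example.com', source_id: 1, source_info: 'cur/123' }

# Upgrade a single-location entry to a list of [source_id, source_info] pairs.
# Entries already in the new layout are returned unchanged, so the upgrade is
# idempotent and safe to run automatically on load.
def upgrade_entry(entry)
  return entry if entry.key?(:locations)
  upgraded = entry.reject { |k, _| k == :source_id || k == :source_info }
  upgraded[:locations] = [[entry[:source_id], entry[:source_info]]]
  upgraded
end

# Drop one location (e.g. a deleted maildir file); other locations survive.
def remove_location(entry, source_id, source_info)
  entry.merge(locations: entry[:locations] - [[source_id, source_info]])
end
```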
I'd appreciate any comments about the code or general approach. If
anyone would like to contribute an email corpus for the not-yet-written
test suite, or sketch out some test cases in pseudocode, that would be
very helpful too.
_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel
* Re: [sup-devel] new branch: maildir
From: Mark Alexander @ 2010-03-25 11:24 UTC (permalink / raw)
To: Rich Lane; +Cc: sup-devel
Excerpts from Rich Lane's message of Thu Mar 25 03:12:57 -0400 2010:
> This branch makes some drastic changes to how mbox and maildir sources
> work.
Thanks for attacking this problem!
I just took a quick look at the diffs, and I have some concerns
about this line in maildir.rb:
Dir[File.join(subdir, '*')].map do |fn|
I'm worried about the memory usage with some of my maildirs that have
tens of thousands of files. Would it be more memory-efficient to
use Dir.open and Dir.each? You'd have to filter out "." and "..",
of course.
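A sketch of the Dir.open approach suggested above (`each_maildir_file` is a hypothetical helper name). Note that the `'*'` glob skips every dotfile, not just "." and "..", so this skips any name starting with "." to keep the same behaviour.

```ruby
# Stream the entries of one maildir subdirectory (e.g. cur/ or new/) instead
# of building the full Array that Dir[File.join(subdir, '*')] returns.
def each_maildir_file(subdir)
  return enum_for(:each_maildir_file, subdir) unless block_given?
  Dir.open(subdir) do |dir|
    dir.each do |name|
      # Skip ".", "..", and other dotfiles, matching what the '*' glob excludes.
      next if name.start_with?('.')
      yield File.join(subdir, name)
    end
  end
end
```

The saving only materialises if the caller processes one name at a time, as in `each_maildir_file(subdir) { |fn| ... }`; calling `.map` or `.to_a` on the returned enumerator builds the whole array again.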
* Re: [sup-devel] new branch: maildir
From: Ben Walton @ 2010-03-25 13:30 UTC (permalink / raw)
To: sup-devel
Excerpts from Mark Alexander's message of Thu Mar 25 07:24:59 -0400 2010:
> I'm worried about the memory usage with some of my maildirs that
> have tens of thousands of files. Would it be more memory-efficient
> to use Dir.open and Dir.each? You'd have to filter out "." and
> "..", of course.
Agreed:
bwalton @ pinkfloyd : ~/Maildir/new
$ ls -1 | wc -l
35058
I do like the overall idea though. I'll try to give this a spin
tonight.
Thanks
-Ben
--
Ben Walton
Systems Programmer - CHASS
University of Toronto
C:416.407.5610 | W:416.978.4302
* Re: [sup-devel] new branch: maildir
From: Rich Lane @ 2010-03-25 16:11 UTC (permalink / raw)
To: Mark Alexander; +Cc: sup-devel
Excerpts from Mark Alexander's message of 2010-03-25 07:24:59 -0400:
> Excerpts from Rich Lane's message of Thu Mar 25 03:12:57 -0400 2010:
> > This branch makes some drastic changes to how mbox and maildir sources
> > work.
>
> Thanks for attacking this problem!
>
> I just took a quick look at the diffs, and I have some concern
> about this line in maildir.rb:
>
> Dir[File.join(subdir, '*')].map do |fn|
>
> I'm worried about the memory usage with some of my maildirs that have
> tens of thousands of files. Would it be more memory-efficient to
> use Dir.open and Dir.each? You'd have to filter out "." and "..",
> of course.
Hence the "XXX use less memory" comment :). I've been doing my testing on
a 30k-message maildir, which works fine. My scalability target for Sup is
a million messages, and memory becomes a concern there: a maildir filename
is about 30 characters, so a million of them is roughly 30 MB of raw
characters before any Ruby per-object overhead.
The primitives we have are:
- Iterate through the filenames in a directory, in arbitrary (?) order.
- Check whether a single file exists in a directory.
- Iterate through the filenames with a given prefix stored in the index,
  in lexicographical order.
Any more?
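A sketch of the two one-directional scans that can be built from these primitives. The index is modelled here as a plain Array of names (a hypothetical stand-in for its prefix iteration); the filesystem side uses real `Dir`/`File` calls. Each scan sees only one kind of change, which is why catching both additions and deletions this way takes two full passes.

```ruby
require 'set'

# Iterate the index and check each file's existence: catches deletions,
# but can never see a file the index does not already list.
def deleted_messages(dir, index_names)
  index_names.reject { |name| File.exist?(File.join(dir, name)) }
end

# Iterate the directory and check each name against the index: catches
# additions, but can never see an indexed name whose file is gone.
def new_messages(dir, index_names)
  indexed = index_names.to_set
  Dir.foreach(dir).reject { |name| name.start_with?('.') || indexed.include?(name) }
end
```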
Right now I've taken the easiest route, which loads both the filesystem
and the indexed filenames into arrays and diffs them. Iterating over the
index and checking each file's existence won't detect new messages;
iterating over the filesystem and checking for each name in the index
won't detect deleted messages. Doing both would work, but that seems
expensive. It would be good to optimize for the common case where most
of the maildir's messages have already been indexed.
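One way to "do both" in a single pass is a sorted merge-join. This is a standard technique swapped in for illustration, not what the branch does. It assumes both sides arrive in the same sort order (`String#<=>`, i.e. bytewise); the index already iterates lexicographically, but the directory listing comes back in arbitrary order, so the filesystem side would still have to be sorted first, and that is where the memory cost would land.

```ruby
# Merge-join two sorted name streams in one pass, yielding [:added, name]
# for names only on disk and [:deleted, name] for names only in the index.
def merge_diff(fs_sorted, index_sorted)
  fs = fs_sorted.each    # external enumerators, so either side could stream
  ix = index_sorted.each
  f = next_or_nil(fs)
  i = next_or_nil(ix)
  while f || i
    if i.nil? || (f && f < i)
      yield :added, f
      f = next_or_nil(fs)
    elsif f.nil? || i < f
      yield :deleted, i
      i = next_or_nil(ix)
    else # same name on both sides: already indexed and still on disk
      f = next_or_nil(fs)
      i = next_or_nil(ix)
    end
  end
end

# Advance an enumerator, returning nil once it is exhausted.
def next_or_nil(enum)
  enum.next
rescue StopIteration
  nil
end
```

In the common case where almost everything is already indexed, the loop mostly takes the "same name" branch and emits nothing.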