From: Rich Lane
To: Mark Alexander
Cc: sup-devel
Date: Thu, 25 Mar 2010 12:11:57 -0400
Subject: Re: [sup-devel] new branch: maildir
Message-Id: <1269532152-sup-1158@zyrg.net>
In-reply-to: <1269516077-sup-4573@r61>
References: <1269499582-sup-2593@zyrg.net> <1269516077-sup-4573@r61>
User-Agent: Sup/git

Excerpts from Mark Alexander's message of 2010-03-25 07:24:59 -0400:
> Excerpts from Rich Lane's message of Thu Mar 25 03:12:57 -0400 2010:
> > This branch makes some drastic changes to how mbox and maildir sources
> > work.
>
> Thanks for attacking this problem!
>
> I just took a quick look at the diffs, and I have some concern
> about this line in maildir.rb:
>
>     Dir[File.join(subdir, '*')].map do |fn|
>
> I'm worried about the memory usage with some of my maildirs that have
> tens of thousands of files. Would it be more memory-efficient to
> use Dir.open and Dir.each? You'd have to filter out "." and "..",
> of course.

Hence the "XXX use less memory" comment :). I've been testing on a
30k-message maildir, which works fine. My scalability target for Sup is
a million messages, and memory does become a concern at that size. A
maildir filename is about 30 characters plus any Ruby overhead, so a
million filenames is roughly 30 MB of characters before that overhead.

The primitives we have are:

- Iterate through the filenames in a directory, in arbitrary (?) order.
- Check whether a single file exists in a directory.
- Iterate, in lexicographic order, through the filenames stored in the
  index under a given prefix.

Any more?

Right now I've taken the easiest route: load both the filesystem and the
indexed filenames into arrays and diff them. Iterating over the index and
checking each file's existence won't detect new messages. Iterating over
the filesystem and checking each name against the index won't detect
deleted messages. Doing both would work, but it seems expensive. Ideally
we'd optimize for the common case where most of the messages in the
maildir have already been indexed.

_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel
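One way to "do both" without a separate lookup per file is a single merge pass over two sorted streams. The following is a minimal Ruby sketch under stated assumptions, not Sup's code: `maildir_diff` and `peek_or_nil` are hypothetical names, and the `indexed` argument stands in for the index's lexicographic prefix iteration (any Enumerable yielding sorted basenames). `Dir.foreach` avoids the array of full paths that `Dir[File.join(subdir, '*')]` builds, but because directory order is not guaranteed, the on-disk names are still collected and sorted, so only the index side is actually streamed.

```ruby
require "tmpdir"

# Hypothetical helper: peek at the next element of an external
# enumerator, returning nil instead of raising StopIteration at the end.
def peek_or_nil(enum)
  enum.peek
rescue StopIteration
  nil
end

# Hypothetical sketch, not Sup's API. Diffs one maildir subdirectory
# against indexed filenames in a single merge pass.
# Assumption: `indexed` yields the indexed basenames for this subdir in
# lexicographic order, as an index prefix scan would.
# Returns [added, deleted]: names only on disk, and names only in the index.
def maildir_diff(subdir, indexed)
  on_disk = []
  Dir.foreach(subdir) do |fn|
    # Skip ".", "..", and other dotfiles, matching what the '*' glob returns.
    next if fn.start_with?(".")
    on_disk << fn
  end
  on_disk.sort! # directory order is arbitrary, so sort to match the index

  disk = on_disk.each
  idx = indexed.each
  added = []
  deleted = []

  loop do
    d = peek_or_nil(disk)
    i = peek_or_nil(idx)
    break if d.nil? && i.nil?

    if i.nil? || (!d.nil? && d < i)
      added << disk.next   # on disk but not indexed: a new message
    elsif d.nil? || i < d
      deleted << idx.next  # indexed but gone from disk: a deleted message
    else
      disk.next            # in both: already indexed, advance both sides
      idx.next
    end
  end

  [added, deleted]
end

# Usage: three files on disk, one of them not yet indexed, and an index
# entry whose file has been removed.
Dir.mktmpdir do |cur|
  %w[1269516077.M1 1269516078.M2 1269516079.M3].each do |fn|
    File.write(File.join(cur, fn), "")
  end
  added, deleted = maildir_diff(cur, %w[1269516070.M0 1269516077.M1 1269516078.M2])
  p added   # => ["1269516079.M3"]
  p deleted # => ["1269516070.M0"]
end
```

After the sort, the diff is one linear pass over both lists, and in the mostly-indexed case nearly every step takes the "in both" branch. The remaining memory cost is the sorted array of on-disk basenames; a truly bounded-memory version would also need the filesystem side in sorted order, which plain directory iteration does not provide.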