From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by 10.224.21.196 with SMTP id k4csp28988qab; Sat, 1 Jun 2013 19:51:23 -0700 (PDT)
X-Received: by 10.236.115.164 with SMTP id e24mr6190417yhh.182.1370141483025; Sat, 01 Jun 2013 19:51:23 -0700 (PDT)
Return-Path:
Received: from rubyforge.org ([50.56.192.79]) by mx.google.com with ESMTP id d28si47096233yhn.1.2013.06.01.19.51.22 for ; Sat, 01 Jun 2013 19:51:23 -0700 (PDT)
Received-SPF: pass (google.com: domain of sup-devel-bounces@rubyforge.org designates 50.56.192.79 as permitted sender) client-ip=50.56.192.79;
Authentication-Results: mx.google.com; spf=pass (google.com: domain of sup-devel-bounces@rubyforge.org designates 50.56.192.79 as permitted sender) smtp.mail=sup-devel-bounces@rubyforge.org; dkim=neutral (bad format) header.i=@gmail.com
Received: from localhost.localdomain (localhost [127.0.0.1]) by rubyforge.org (Postfix) with ESMTP id 8F7D42E158; Sun, 2 Jun 2013 02:51:22 +0000 (UTC)
Received: from mail-ob0-f170.google.com (mail-ob0-f170.google.com [209.85.214.170]) by rubyforge.org (Postfix) with ESMTP id 1BF902E151 for ; Sun, 2 Jun 2013 02:45:48 +0000 (UTC)
Received: by mail-ob0-f170.google.com with SMTP id ef5so5387207obb.15 for ; Sat, 01 Jun 2013 19:45:48 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=TmmvPJaVD2rZk+xmBzQ3inxw1Q4Rvv/dvZEd/iB1Vfc=; b=DqM70R8BuoGmT58a+5fKHYgm9DR98eKhsbLtrwXaoZ9u1NUKcBLs/b6y89twz4y7Ju u0yRgxrsWb8jCUZKZ456ST+apLH89yE64GcMnbhy1exqhxAvcGF7HaK0ewMq69kFRV/N SOEsTGWvhO6liErSaXg23D4DpQeKjwWRRCXC/Dy5SzaASMGVVvU/Xax5zlHzE1nB8nZq +bCn63QulnJg/l+VI5xFr7u1IUpu2onQXky16jk8825KcoNzhCk2EE3R69Z1MpPJN7u5 voze+islzheU3WBdT6cRxi8ZYsSEhi+6aRiwMawIsrYh8gFvDLhWGmj1tUKb4HmTFvaC 0NRw==
MIME-Version: 1.0
X-Received: by 10.60.136.234 with SMTP id qd10mr8334434oeb.15.1370141147577; Sat, 01 Jun 2013 19:45:47 -0700 (PDT)
Received: by 10.182.80.228 with HTTP; Sat, 1 Jun 2013
19:45:47 -0700 (PDT)
In-Reply-To: <1369172802-sup-2003@kpad>
References: <518E1A2B.2080903@gaute.vetsj.com> <1369172802-sup-2003@kpad>
Date: Sun, 2 Jun 2013 11:45:47 +0900
Message-ID:
From: Horacio Sanson
To: Sup developer discussion
Subject: Re: [sup-devel] Experimental Gmail Source
X-BeenThere: sup-devel@rubyforge.org
X-Mailman-Version: 2.1.12
Precedence: list
Reply-To: Sup developer discussion
List-Id: Sup developer discussion
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Content-Type: multipart/mixed; boundary="===============7887868365697878334=="
Sender: sup-devel-bounces@rubyforge.org
Errors-To: sup-devel-bounces@rubyforge.org

--===============7887868365697878334==
Content-Type: multipart/alternative; boundary=047d7b33c98264c20904de22d87e

--047d7b33c98264c20904de22d87e
Content-Type: text/plain; charset=ISO-8859-1

Thanks for checking the source and sorry for the late response... I can only
look into this on rare free weekends.

On Wed, May 22, 2013 at 6:47 AM, Matthieu Rakotojaona
<matthieu.rakotojaona@gmail.com> wrote:

> Hey Horacio,
>
> I took a stab at your gmail_source branch, and made a few
> fixes/improvements [0]:
>
> - Add configuration option in sup-add
> - Dump the LevelDB path in the sources.yaml
> - Add a load_from_yaml method for a source to initialize its working
>   values (for instance, the @db cannot be serialized, it needs to be
>   reconstructed)
> - Fixed the msg_att monkey-patch for imap.rb
>

Great, I will add these changes to my branch....

> All in all, the gmail source seems to work. I tested it on my usual
> gmail account, I haven't tried to download it all, but I did download a
> few dozen emails without a problem. I'd like to warn users about
> LevelDB though: it's sad to say, but like wmorgan's other stuff, it looks
> abandoned. There are at least 2 bugs you will encounter if you try it: a
> problem in configuration (fixed in [1]) and you need the `snappy` gem to
> make it work if your db is more than 4MB large [2]. There are some
> up-to-date forks, though.
>
> I see LevelDB is used mostly for storing messages and mailboxes
> uid{validity/last}, but if we are to use gmail (it's the only IMAP
> provider that makes sense for sup), I believe we would stick to the All
> Mail label, right? So, no need for storing this in db, rather in the
> sources.yaml file. Also, if leveldb-ruby is unreliable (I did encounter
> some issues way back about something with glibc...), and we want to use
> it for caching messages, I think we can salvage heliotrope's zmbox [3]
> because it's so simple to use yet far better than simple mbox.

Using zmbox, mbox, maildir or any other mail storage (mix?) means I need to
keep track of three indexes to allow two-way sync between the Gmail source
and the Sup index. I would need the Sup index id, the store id (e.g. zmbox
file index) and the Gmail X-GM-MSGID. That complicates things a lot.

Using key/value stores like LevelDB allows me to directly store the messages
and associate them directly with the Gmail X-GM-MSGID. Also LevelDB comes
with high compression for text data, perfect for emails, and high
performance [1]. The issues you mention seem to be in the Ruby library
rather than LevelDB itself and they are fixable. If there are no bigger
issues (e.g. data corruption/loss) I will stick with LevelDB.

> Regarding your questions about ids, if you want to access sup's messages
> from the gmail source, you could use the mail's Message-ID header and
> apply the same logic as in Message.sanitize_message_id. Caution,
> however: I've already encountered the case where multiple messages in
> GMail (i.e. multiple X-GM-MSGID) have the same Message-ID, so they would
> be considered the same in sup/heliotrope... yeah, that's annoying as
> hell, and I don't know how we can solve this in the case of multiple
> sources.
>

Thanks, this comment put me on track and I found a way to get the emails
from the index using the message id provided by the source. All I need to do
is call Message.build_from_source(source, info) where info is the message id
provided by the source. In my case this would be the X-GM-MSGID string.

> If you want to sync-back, maybe sup can call a source-level "sync_back"
> method with the current known state? Speaking of which, for general
> synchronization we could reuse offlineimap's elegant sync algorithm
> [4]. The idea is basic: have each source class store a snapshot of the
> state. When a message is modified on the source, diff the change with
> the known status and propagate to sup; when a message is modified in
> sup, diff with the known status and propagate to the source.
>

Interesting and simple algorithm. Let me study it a little more and see how
it is applicable to Sup.

[1] http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html

regards,
Horacio

> Just a brain dump.
>
> [0] https://github.com/rakoo/sup/tree/gmail_source
> [1] https://github.com/wmorgan/leveldb-ruby/pull/27
> [2] https://github.com/wmorgan/leveldb-ruby/issues/23
> [3] https://github.com/sup-heliotrope/heliotrope/blob/64d4b50d5649ec616a311a4cf6955137fdaeb13d/lib/heliotrope/zmbox.rb
> [4] http://offlineimap.org/howitworks.html
>
> Regards,
>
> --
> Matthieu Rakotojaona
>
> _______________________________________________
> Sup-devel mailing list
> Sup-devel@rubyforge.org
> http://rubyforge.org/mailman/listinfo/sup-devel
>

--047d7b33c98264c20904de22d87e
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Thanks for checking the source and sorry for the late response... I can only look into this on rare free weekends.


On Wed, May 22, 2013 at 6:47 AM, Matthieu Rakotojaona <matthieu.rakotojaona@gmail.com> wrote:
Hey Horacio,

I took a stab at your gmail_source branch, and made a few
fixes/improvements [0]:

- Add configuration option in sup-add
- Dump the LevelDB path in the sources.yaml
- Add a load_from_yaml method for a source to initialize its working
  values (for instance, the @db cannot be serialized, it needs to be
  reconstructed)
- Fixed the msg_att monkey-patch for imap.rb


Great, I will add these changes to my branch....

All in all, the gmail source seems to work. I tested it on my usual
gmail account, I haven't tried to download it all, but I did download a
few dozen emails without a problem. I'd like to warn users about
LevelDB though: it's sad to say, but like wmorgan's other stuff, it looks
abandoned. There are at least 2 bugs you will encounter if you try it: a
problem in configuration (fixed in [1]) and you need the `snappy` gem to
make it work if your db is more than 4MB large [2]. There are some
up-to-date forks, though.

I see LevelDB is used mostly for storing messages and mailboxes
uid{validity/last}, but if we are to use gmail (it's the only IMAP
provider that makes sense for sup), I believe we would stick to the All
Mail label, right? So, no need for storing this in db, rather in the
sources.yaml file. Also, if leveldb-ruby is unreliable (I did encounter
some issues way back about something with glibc...), and we want to use
it for caching messages, I think we can salvage heliotrope's zmbox [3]
because it's so simple to use yet far better than simple mbox.

Using zmbox, mbox, maildir or any other mail storage (mix?) means I need to keep track of three indexes to allow two-way sync between the Gmail source and the Sup index. I would need the Sup index id, the store id (e.g. zmbox file index) and the Gmail X-GM-MSGID. That complicates things a lot.

Using key/value stores like LevelDB allows me to directly store the messages and associate them directly with the Gmail X-GM-MSGID. Also LevelDB comes with high compression for text data, perfect for emails, and high performance [1]. The issues you mention seem to be in the Ruby library rather than LevelDB itself and they are fixable. If there are no bigger issues (e.g. data corruption/loss) I will stick with LevelDB.
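
To make the key/value idea above concrete, here is a minimal sketch. It is not the gmail_source code: the class name GmailMessageCache is hypothetical, a plain Hash stands in for LevelDB, and explicit zlib compression stands in for whatever compression the store applies itself. It only shows that a single key, the X-GM-MSGID, is enough to store and retrieve a raw message.

```ruby
require "zlib"

# Hypothetical stand-in for a key/value store such as LevelDB: a Hash keyed
# by the Gmail X-GM-MSGID, holding the zlib-compressed raw message text.
class GmailMessageCache
  def initialize
    @db = {}
  end

  # Store the raw message text under its X-GM-MSGID.
  def put(gm_msgid, raw_message)
    @db[gm_msgid.to_s] = Zlib::Deflate.deflate(raw_message)
  end

  # Return the raw message text for an X-GM-MSGID, or nil if it is not cached.
  def get(gm_msgid)
    blob = @db[gm_msgid.to_s]
    blob && Zlib::Inflate.inflate(blob)
  end

  # True if a message with this X-GM-MSGID has been cached.
  def include?(gm_msgid)
    @db.key?(gm_msgid.to_s)
  end
end

cache = GmailMessageCache.new
cache.put(1436672486219830272, "Subject: hello\r\n\r\nbody\r\n")
puts cache.get(1436672486219830272)
```

With this layout there is no separate store id to track: the only identifiers left are the Sup index id and the X-GM-MSGID.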

Regarding your questions about ids, if you want to access sup's messages
from the gmail source, you could use the mail's Message-ID header and
apply the same logic as in Message.sanitize_message_id. Caution,
however: I've already encountered the case where multiple messages in
GMail (i.e. multiple X-GM-MSGID) have the same Message-ID, so they would
be considered the same in sup/heliotrope... yeah, that's annoying as
hell, and I don't know how we can solve this in the case of multiple
sources.
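
The collision described above can at least be detected before indexing. The following is only an illustrative sketch: the function name message_id_collisions and the input shape (pairs of X-GM-MSGID and Message-ID header) are assumptions, not part of sup or the gmail source.

```ruby
# Hypothetical helper: given [X-GM-MSGID, Message-ID] pairs from the source,
# return the Message-IDs that are shared by more than one Gmail message.
def message_id_collisions(pairs)
  pairs.group_by { |_gm_msgid, message_id| message_id }
       .transform_values { |group| group.map(&:first) }
       .select { |_message_id, gm_msgids| gm_msgids.size > 1 }
end

pairs = [
  [101, "<a@example.org>"],
  [102, "<b@example.org>"],
  [103, "<a@example.org>"]
]
p message_id_collisions(pairs)
```

This only reports the duplicates; deciding how sup should treat them is still the open question raised above.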


Thanks, this comment put me on track and I found a way to get the emails from the index using the message id provided by the source. All I need to do is call Message.build_from_source(source, info) where info is the message id provided by the source. In my case this would be the X-GM-MSGID string.

If you want to sync-back, maybe sup can call a source-level "sync_back"
method with the current known state? Speaking of which, for general
synchronization we could reuse offlineimap's elegant sync algorithm
[4]. The idea is basic: have each source class store a snapshot of the
state. When a message is modified on the source, diff the change with
the known status and propagate to sup; when a message is modified in
sup, diff with the known status and propagate to the source.
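
The snapshot idea above can be sketched roughly as follows. This is a simplified illustration, not offlineimap's actual code and not sup's API: the names diff_against_snapshot, apply_changes! and sync! are hypothetical, each state is assumed to be a Hash from message id to a Set of labels, and message deletion is not handled.

```ruby
require "set"

# Compare one side's current state with the last agreed snapshot and
# return, per message id, the labels that were added and removed.
def diff_against_snapshot(snapshot, current)
  changes = {}
  (snapshot.keys | current.keys).each do |id|
    before = snapshot.fetch(id, Set.new)
    after = current.fetch(id, Set.new)
    added = after - before
    removed = before - after
    next if added.empty? && removed.empty?
    changes[id] = { added: added, removed: removed }
  end
  changes
end

# Apply a set of changes (as produced above) to the other side's state.
def apply_changes!(state, changes)
  changes.each do |id, change|
    labels = (state[id] ||= Set.new)
    labels.merge(change[:added])
    labels.subtract(change[:removed])
  end
  state
end

# Propagate source changes to sup and sup changes to the source, then
# record the new common state as the snapshot for the next run.
def sync!(snapshot, source, sup)
  from_source = diff_against_snapshot(snapshot, source)
  from_sup = diff_against_snapshot(snapshot, sup)
  apply_changes!(sup, from_source)
  apply_changes!(source, from_sup)
  snapshot.replace(source.transform_values(&:dup))
end

snapshot = { "m1" => Set["inbox", "unread"] }
source   = { "m1" => Set["inbox"] }                       # read on Gmail
sup      = { "m1" => Set["inbox", "unread", "starred"] }  # starred in sup
sync!(snapshot, source, sup)
p source["m1"], sup["m1"]
```

Because both diffs are taken against the same snapshot, a label one side added can never be one the other side removed, so the order in which the two sets of changes are applied does not matter in this sketch.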


--047d7b33c98264c20904de22d87e--

--===============7887868365697878334==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel

--===============7887868365697878334==--