From: Matthieu Rakotojaona
Date: Wed, 2 Jan 2013 21:47:06 +0100
To: Sup developer discussion
Subject: Re: [sup-devel] after second heliotrope import, new messages not appearing

So it appears heliotrope tells you your messages are indexed, yet they don't show up. The only place I can see `MetaIndex#add_message` returning early is the test that checks whether the message is already in the store. The problem is that the exact same check is already done before that point, and if the message really were already present, it would be marked as "seen", not "indexed".

Another possibility is that `MetaIndex#gen_new_docid`, for some unknown reason, returns a wrong value, so that every new message overwrites an old one.

I'm not very good at debugging efficiently, but I would re-run the import after putting a few `puts` calls in that `add_message` method, to see whether the existence test returns true or not. If it doesn't, print the docid and check which message has that docid. If the message with that docid is the one you just imported, then there is a problem with how we generate docids (because that docid had already been given to another message).
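The kind of tracing suggested above can be sketched in plain Ruby. This is a minimal, self-contained illustration, not heliotrope's code. The real `MetaIndex` internals are not shown in this thread, so `ToyIndex`, its `restart(stale_counter:)` method and its Hash-based storage are hypothetical stand-ins. The docid counter can be deliberately reset to mimic the suspected `gen_new_docid` bug. The `puts` lines show which branch `add_message` takes, and they warn when a freshly generated docid already belongs to another message.

```ruby
# Hypothetical toy index, NOT heliotrope's real MetaIndex: it only mimics the
# two checks discussed above and traces each decision with puts.
class ToyIndex
  attr_reader :docs

  def initialize
    @docs = {}        # docid => message id
    @by_msgid = {}    # message id => docid, used for the "already in store?" test
    @next_docid = 1   # stands in for a persisted docid counter
  end

  # Simulates starting another import run. stale_counter: true resets the
  # counter, mimicking a gen_new_docid that hands out old values again.
  def restart(stale_counter: false)
    @next_docid = 1 if stale_counter
  end

  def gen_new_docid
    docid = @next_docid
    @next_docid += 1
    docid
  end

  def add_message(msgid)
    if @by_msgid.key?(msgid)
      puts "#{msgid}: already in store as docid #{@by_msgid[msgid]}, returning early"
      return :seen
    end

    docid = gen_new_docid
    puts "#{msgid}: new docid #{docid}"
    if (old = @docs[docid])
      puts "  WARNING: docid #{docid} already belongs to #{old}, which gets overwritten"
      @by_msgid.delete(old)
    end

    @docs[docid] = msgid
    @by_msgid[msgid] = docid
    :indexed
  end
end

# Two import runs; the second one starts with a stale docid counter.
index = ToyIndex.new
results = %w[a@example b@example].map { |m| index.add_message(m) }
index.restart(stale_counter: true)
results += %w[c@example d@example].map { |m| index.add_message(m) }
puts "reported indexed: #{results.count(:indexed)}, index size: #{index.docs.size}"
# prints "reported indexed: 4, index size: 2"
```

If `restart` is called without `stale_counter: true`, the same run ends with four documents and no warnings. In the real import, lines like the WARNING above would be the sign that docids are being reused.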


On Tue, Jan 1, 2013 at 7:33 PM, Hamish D <dmishd@gmail.com> wrote:
> Your problem may have multiple origins, and I have no idea where you could
> start:
>
> - Did you use the correct directories for messages and index for _both_ the
> import and the server? (sorry if this sounds stupid)

Always good to check the basics, but yes, I do use the same directory.
I have scripts for the import and the server and both use the -d
argument to ensure they use the same directory.

> - Did the index size in heliotrope change? (as given by a GET to
> /size.json)

No, it remained the same (and it is about half the count that sup reports).

> - Did the messages file size change?

It did on one import, but not the most recent.

> - Are the messages absent in inbox only or from the whole turnsole?

All my searches stop at May.

> - As a last resort, could you send the mbox file so that I can try this at
> home, if your mails aren't too personal?

It's my work email, with a total of over 100,000 messages, so I'd
prefer not to send it over.

Any logs that might be worth looking at? While doing the import, I do
get various lines along the lines of:

    ; scanned 71, indexed 71, skipped 0 bad and 0 seen messages in 22.1s = 3.2 m/s
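A cheap first check uses the logs already at hand. The Ruby sketch below is a hypothetical helper, not part of heliotrope. `parse_progress`, `final_progress` and `total_indexed` are made-up names, and the regular expression matches only the progress-line format quoted above. The thread does not say whether those counters are running totals or per-batch figures. This sketch assumes running totals, so it keeps only the last matching line of each run's log. Summing the final "indexed" counts across runs gives a number to compare with how much the index size reported by /size.json actually grew.

```ruby
# Hypothetical helper, not part of heliotrope. Parses progress lines such as
#   ; scanned 71, indexed 71, skipped 0 bad and 0 seen messages in 22.1s = 3.2 m/s
PROGRESS_RE = /scanned (\d+), indexed (\d+), skipped (\d+) bad and (\d+) seen messages in ([\d.]+)s/

ImportProgress = Struct.new(:scanned, :indexed, :bad, :seen, :seconds)

# Returns an ImportProgress for a matching line, or nil for any other line.
def parse_progress(line)
  m = PROGRESS_RE.match(line)
  return nil unless m

  ImportProgress.new(m[1].to_i, m[2].to_i, m[3].to_i, m[4].to_i, m[5].to_f)
end

# ASSUMPTION: counters are running totals for one import run, so the last
# matching line of a run's log holds that run's final numbers.
def final_progress(log_text)
  log_text.each_line.map { |l| parse_progress(l) }.compact.last
end

# Total "indexed" over several runs (one log string per run); runs whose log
# has no progress line contribute 0.
def total_indexed(run_logs)
  run_logs.sum do |log|
    fin = final_progress(log)
    fin ? fin.indexed : 0
  end
end

line = "; scanned 71, indexed 71, skipped 0 bad and 0 seen messages in 22.1s = 3.2 m/s"
p parse_progress(line)
```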

I do use the state argument (-t) to avoid reimporting old mail. I have
also tried doing a reindex (as I import from multiple mboxes) but to
no avail.

I could just try another import of the whole lot into a clean directory, I guess.

I'm also happy to type stuff into heliotrope-console if that will help
diagnose stuff.

Thanks for your suggestions so far
Hamish


> On Thu, Dec 27, 2012 at 5:19 PM, Hamish D <dmishd@gmail.com> wrote:
>>
>> Hello
>>
>> I've just been trying to carry on from where I left off with migrating
>> my work email into heliotrope (which I last worked on in May).
>>
>> So I've rsynced the mbox files across to where I will run heliotrope
>> and tried running heliotrope-import to put all the new messages in the
>> index. It appeared to work properly, reporting that messages were
>> being added to the index and the index files themselves have had their
>> timestamps updated. But when I start turnsole, I can't find any email
>> after May.
>>
>> Any ideas on where to start looking to work out what is going on?
>>
>> Hamish
_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel



--
Matthieu RAKOTOJAONA