From: ehabkost@raisama.net (Eduardo Habkost)
Date: Thu, 06 Nov 2008 12:33:23 -0200
Subject: [sup-talk] EOFError crash
In-Reply-To: <1225907100-sup-4816@entry>
References: <1225392037-sup-9224@gillespie.rupamsunyata.org>
	<1225647558-sup-7206@gillespie.rupamsunyata.org>
	<1225907100-sup-4816@entry>
Message-ID: <1225981048-sup-9011@blackpad>

Excerpts from William Morgan's message of Wed Nov 05 15:48:34 -0200 2008:
> Reformatted excerpts from Decklin Foster's message of 2008-11-02:
> > This just happened again. Should I put it into ditz or something? (I
> > feel exceedingly lame, but I don't have time to debug it today
> > either.)
> 
> No. Sadly, this is one of the innumerable Ferret errors that crop up
> from time to time, which spurred STS.

I can easily reproduce crashes similar to this one. All I need to do is
load another label while sup is still polling for new messages.

If I deliver many new messages to a maildir source without running
sup-sync, sup spends a few seconds loading them, which leaves plenty of
time to hit L, go to a label (I don't know whether it has to be the same
label the incoming messages are getting), and trigger the crash.
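If the crash comes from the polling thread and the UI thread using the
Ferret index at the same time (an assumption; I haven't confirmed it),
one way to enforce in code what the workaround below does by hand would
be to route every index access through a single Mutex. This is only a
sketch: SerializedIndex is a hypothetical wrapper, not part of sup or
Ferret, and a plain Array stands in for the index.

```ruby
require 'thread'

# Hypothetical wrapper (not sup's or Ferret's API) around a resource that
# must never be used by two threads at once, such as a Ferret index.
class SerializedIndex
  def initialize(index)
    @index = index
    @lock = Mutex.new
  end

  # Every read or write goes through the same lock, so a polling thread
  # and a UI thread can never be inside the index simultaneously.
  # Returns whatever the block returns.
  def synchronize
    @lock.synchronize { yield @index }
  end
end

# Usage sketch: a "poller" thread adds entries while the main thread
# reads the size, standing in for loading a label.
index = SerializedIndex.new([])
poller = Thread.new do
  100.times { |i| index.synchronize { |ix| ix << "message #{i}" } }
end
10.times { index.synchronize { |ix| ix.size } }
poller.join
puts index.synchronize { |ix| ix.size }
```

The lock trades some responsiveness (the UI waits while polling holds
the index) for never touching the index from two threads at once.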

The "workaround" I am using here is being careful to never hit L when the
"polling for new messages" message is shown on the screen.

I also have a small collection of core files (6 of them so far), and all
of them are from segfaults on the following line of the ferret source:

#6  0x00421752 in is_seek (is=0xab75220, pos=31838147) at store.c:285
285             is->m->seek_i(is, pos);

At that point, is->m is corrupted (either 0 or a bogus value such as 0x10c0).

I don't have the Ruby abort message for all of them, but I remember that
one was triggered at lib/sup/index.rb, line 377 (the
'fake_header = { ... }' code). Unfortunately, Ruby doesn't produce a
Ruby stack trace on a segfault, so I don't know what else was running at
the time of the crash (especially in the other threads).

The C backtrace looks like this:

#0  0x00110416 in __kernel_vsyscall ()
#1  0x00c76660 in raise () from /lib/libc.so.6
#2  0x00c78028 in abort () from /lib/libc.so.6
#3  0x004b6f08 in rb_bug (fmt=<value optimized out>) at error.c:214
#4  0x00525dfb in sigsegv (sig=<value optimized out>) at signal.c:629
#5  <signal handler called>
#6  0x00421752 in is_seek (is=0xa67f3a0, pos=24745648) at store.c:285
#7  0x003f42ea in cmpdi_read_i (is=0xafdbfa0, b=0xacda138 "\030\"?", len=170) at compound_io.c:140
#8  0x00421605 in is_read_bytes (is=0xafdbfa0, buf=0xacda138 "\030\"?", len=170) at store.c:267
#9  0x00432c93 in lazy_df_get_data (self=0xafe7100, i=<value optimized out>) at index.c:1207
#10 0x0042b1c8 in frt_lazy_df_load (self=3063423180, rkey=13439246, lazy_df=0xafe7100) at r_index.c:1949
#11 0x004ba02b in call_cfunc (func=<value optimized out>, recv=<value optimized out>, len=<value optimized out>, argc=<value optimized out>,
    argv=<value optimized out>) at eval.c:5721
#12 0x004c4e66 in rb_call0 (klass=<value optimized out>, recv=<value optimized out>, id=<value optimized out>, oid=<value optimized out>, argc=<value optimized out>,
    argv=<value optimized out>, body=<value optimized out>, flags=<value optimized out>) at eval.c:5861
#13 0x004c50ba in rb_call (klass=<value optimized out>, recv=<value optimized out>, mid=<value optimized out>, argc=<value optimized out>,
    argv=<value optimized out>, scope=<value optimized out>, self=<value optimized out>) at eval.c:6117
#14 0x004c5e9c in vafuncall (recv=<value optimized out>, mid=<value optimized out>, n=<value optimized out>, ar=<value optimized out>) at eval.c:6194
#15 0x004c6014 in rb_funcall (recv=Could not find the frame base for "rb_funcall".
) at eval.c:6211
#16 0x004dcb3e in rb_hash_aref (hash=<value optimized out>, key=<value optimized out>) at hash.c:429
#17 0x004ba02b in call_cfunc (func=<value optimized out>, recv=<value optimized out>, len=<value optimized out>, argc=<value optimized out>,
    argv=<value optimized out>) at eval.c:5721
#18 0x004c4e66 in rb_call0 (klass=<value optimized out>, recv=<value optimized out>, id=<value optimized out>, oid=<value optimized out>, argc=<value optimized out>,
    argv=<value optimized out>, body=<value optimized out>, flags=<value optimized out>) at eval.c:5861
#19 0x004c50ba in rb_call (klass=<value optimized out>, recv=<value optimized out>, mid=<value optimized out>, argc=<value optimized out>,
    argv=<value optimized out>, scope=<value optimized out>, self=<value optimized out>) at eval.c:6117
#20 0x004bf821 in rb_eval (self=<value optimized out>, n=<value optimized out>) at eval.c:3490
#21 0x004bf73a in rb_eval (self=<value optimized out>, n=<value optimized out>) at eval.c:3484
#22 0x004bf73a in rb_eval (self=<value optimized out>, n=<value optimized out>) at eval.c:3484
[lots of rb_eval calls]
-- 
Eduardo