From mboxrd@z Thu Jan 1 00:00:00 1970
From: ehabkost@raisama.net (Eduardo Habkost)
Date: Thu, 06 Nov 2008 12:33:23 -0200
Subject: [sup-talk] EOFError crash
In-Reply-To: <1225907100-sup-4816@entry>
References: <1225392037-sup-9224@gillespie.rupamsunyata.org>
	<1225647558-sup-7206@gillespie.rupamsunyata.org>
	<1225907100-sup-4816@entry>
Message-ID: <1225981048-sup-9011@blackpad>

Excerpts from William Morgan's message of Wed Nov 05 15:48:34 -0200 2008:
> Reformatted excerpts from Decklin Foster's message of 2008-11-02:
> > This just happened again. Should I put it into ditz or something? (I
> > feel exceedingly lame, but I don't have time to debug it today
> > either.)
>
> No. Sadly, this is one of the innumerable Ferret errors that crop up
> from time to time, which spurred STS.

I've been easily reproducing crashes similar to this one. The only
thing I need to do to reproduce it is load another label while sup is
still polling for new messages.

If I deliver a lot of new messages to a maildir source and don't run
sup-sync, sup will spend a few seconds loading the new messages, and
there is plenty of time to hit L, go to a label (I don't know if it
needs to be the same label that the new messages being loaded are
getting), and see the crash.

The "workaround" I am using here is being careful never to hit L while
the "polling for new messages" message is shown on the screen.

I also have a small collection of core files (6 of them, by now). All
of them are from segfaults on the following line in the ferret source:

#6  0x00421752 in is_seek (is=0xab75220, pos=31838147) at store.c:285
285         is->m->seek_i(is, pos);

where is->m is corrupted (either 0 or a bogus value such as 0x10c0).

I don't have the ruby abort message for all of them, but I remember one
of them was triggered in lib/sup/index.rb, line 377 (at the
'fake_header = { ... }' stuff).
Unfortunately ruby doesn't produce a ruby stack trace on segfault, so I
don't know what else was running at the time of the crash (especially
on the other threads).

The C backtrace looks like this:

#0  0x00110416 in __kernel_vsyscall ()
#1  0x00c76660 in raise () from /lib/libc.so.6
#2  0x00c78028 in abort () from /lib/libc.so.6
#3  0x004b6f08 in rb_bug (fmt=) at error.c:214
#4  0x00525dfb in sigsegv (sig=) at signal.c:629
#5
#6  0x00421752 in is_seek (is=0xa67f3a0, pos=24745648) at store.c:285
#7  0x003f42ea in cmpdi_read_i (is=0xafdbfa0, b=0xacda138 "\030\"?", len=170) at compound_io.c:140
#8  0x00421605 in is_read_bytes (is=0xafdbfa0, buf=0xacda138 "\030\"?", len=170) at store.c:267
#9  0x00432c93 in lazy_df_get_data (self=0xafe7100, i=) at index.c:1207
#10 0x0042b1c8 in frt_lazy_df_load (self=3063423180, rkey=13439246, lazy_df=0xafe7100) at r_index.c:1949
#11 0x004ba02b in call_cfunc (func=, recv=, len=, argc=, argv=) at eval.c:5721
#12 0x004c4e66 in rb_call0 (klass=, recv=, id=, oid=, argc=, argv=, body=, flags=) at eval.c:5861
#13 0x004c50ba in rb_call (klass=, recv=, mid=, argc=, argv=, scope=, self=) at eval.c:6117
#14 0x004c5e9c in vafuncall (recv=, mid=, n=, ar=) at eval.c:6194
#15 0x004c6014 in rb_funcall (recv=Could not find the frame base for "rb_funcall".
) at eval.c:6211
#16 0x004dcb3e in rb_hash_aref (hash=, key=) at hash.c:429
#17 0x004ba02b in call_cfunc (func=, recv=, len=, argc=, argv=) at eval.c:5721
#18 0x004c4e66 in rb_call0 (klass=, recv=, id=, oid=, argc=, argv=, body=, flags=) at eval.c:5861
#19 0x004c50ba in rb_call (klass=, recv=, mid=, argc=, argv=, scope=, self=) at eval.c:6117
#20 0x004bf821 in rb_eval (self=, n=) at eval.c:3490
#21 0x004bf73a in rb_eval (self=, n=) at eval.c:3484
#22 0x004bf73a in rb_eval (self=, n=) at eval.c:3484
[lots of rb_eval calls]

-- 
Eduardo