From: andrew@pimlott.net (Andrew Pimlott)
Subject: [sup-talk] sup-sync and xapian memory usage
Date: Mon, 7 Sep 2009 10:04:50 -0700
Message-ID: <20090907170450.GO14010@pimlott.net>
Last time I tried to use sup[1], I posted about sup-sync crashing with
various symptoms of memory exhaustion[2]. I've tried again (using git
mainline), with similar results, but now I have a bit more to say about
it.
I am running on a fairly low-memory virtual machine. I think some of
the variability in what I was seeing before had to do with what other
things were running. Sorry about not being aware of this before. In
the following, I have pretty well controlled for other system memory
use. All of these tests are done on the same mbox with all indices
cleared.
Some of the failures I see are out of memory when running gpg
("Errno::ENOMEM Exception: Cannot allocate memory - /usr/bin/gpg" ...).
I stopped there in the debugger and found that, in this state, the Ruby
backtick operator fails the same way on any command, not just gpg. Using
strace, I saw a failure in clone:
20528 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7cb3908) = -1 ENOMEM (Cannot allocate memory)
top reports 2.5M of physical and 30M of swap free, so I don't really
know why clone fails, but I guess there's not much you can do about
that.
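
For readers who want to tell this failure apart from an ordinary command error, here is a minimal sketch of wrapping the backtick operator. `safe_backtick` is a hypothetical helper, not part of sup, and it only reports the condition; it does nothing to avoid it.

```ruby
# Hypothetical helper (not part of sup): run a command with backticks and
# report process-creation failures instead of letting them propagate.
def safe_backtick(cmd)
  output = `#{cmd}`
  [:ok, output]
rescue Errno::ENOMEM => e
  # Raised when the kernel refuses to create the child process,
  # as in the clone() failure shown above.
  [:enomem, e.message]
rescue Errno::ENOENT => e
  # Raised when the command itself cannot be found.
  [:missing, e.message]
end

status, result = safe_backtick("echo hello")
puts "#{status}: #{result}"
```

Returning a status symbol rather than raising lets the caller decide whether to skip the message, retry later, or abort the sync.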
Other failures were the result of sup blowing up on messages with large
attachments. sup's memory use is many times the attachment size.
Testing on an mbox with a single message carrying four attachments of
~5M each (encoded size; total file size ~21M), sup goes up to ~150M. Accounting
for baseline, that's about 6 times the file size. I hope that can be
improved. FWIW, mutt never seems to try to load a whole attachment into
memory.
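
To illustrate what "not loading a whole attachment into memory" could look like, here is a sketch that decodes base64 from one IO to another in bounded chunks. It is not how sup or mutt handle attachments; `stream_decode_base64` and its `chunk_size` parameter are made up for this example.

```ruby
require "stringio"

# Hypothetical helper: decode base64 from `input` to `output` while holding
# only about one chunk in memory at a time.
def stream_decode_base64(input, output, chunk_size = 64 * 1024)
  pending = String.new
  while (chunk = input.read(chunk_size))
    # Drop line breaks and whitespace so only base64 characters remain.
    pending << chunk.delete("\r\n \t")
    # Base64 decodes in groups of 4 characters; keep any remainder for later.
    usable = pending.length - (pending.length % 4)
    next if usable.zero?
    output.write(pending.slice!(0, usable).unpack("m").first)
  end
  output.write(pending.unpack("m").first) unless pending.empty?
  output
end

# Usage: round-trip some data through 100-byte chunks.
data = "attachment bytes " * 1000
encoded = [data].pack("m")
decoded = stream_decode_base64(StringIO.new(encoded), StringIO.new(String.new), 100)
puts decoded.string == data
```

With this shape, peak memory depends on `chunk_size` rather than on the attachment size, provided `output` is a file or another sink that does not buffer everything itself.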
Using the ferret index, the memory (virtual as reported by Linux)
behaviour of sup-sync is pretty good. It starts out 25M and levels out
around 31M. It spikes from time to time, presumably because of large
messages, but it comes right back. After processing the last message,
it uses another 10M to finish up.
Using the xapian index, things are different. It starts at 32M and
steadily climbs to 77M after ~3500 messages, or around 1M every 100
messages. It does seem to climb faster at first and then more slowly.
Either xapian is keeping a cache (but some searches suggest it doesn't),
it's leaking memory, or it's allocating memory in a way that the
allocator can't reclaim the VM space. Any ideas?
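
As a quick check of the growth rate, the figures quoted above (32M at the start, 77M after ~3500 messages) can be run through a small calculation. `growth_per_100` is a name made up for this sketch.

```ruby
# Hypothetical helper: average memory growth in megabytes per 100 messages.
def growth_per_100(start_mb, end_mb, messages)
  (end_mb - start_mb).to_f / messages * 100
end

# Figures from the xapian run described above.
puts growth_per_100(32, 77, 3500).round(2)  # prints 1.29
```

That is roughly 1.3M per 100 messages averaged over the whole run, in line with the "around 1M" estimate given that the climb slows down later on.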
BTW, is there really no way to ask for ruby's heap size with (unpatched)
ruby 1.8?
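
This does not answer the heap-size question, but on Linux the process-level numbers used in this message can at least be sampled from inside the script. The sketch below reads the `VmSize` and `VmRSS` fields from `/proc/self/status`. The helper names are hypothetical, and the figures describe the whole process, not Ruby's own heap.

```ruby
# Hypothetical helper: pull VmSize and VmRSS (in kB) out of the text of
# a Linux /proc/<pid>/status file.
def parse_vm_status(text)
  fields = {}
  text.each_line do |line|
    if line =~ /\A(VmSize|VmRSS):\s+(\d+)\s+kB/
      fields[$1] = $2.to_i
    end
  end
  fields
end

# Linux-only: read the status file for the current process (or another pid).
def current_vm_status(pid = "self")
  parse_vm_status(File.read("/proc/#{pid}/status"))
end

if File.exist?("/proc/self/status")
  vm = current_vm_status
  puts "VmSize=#{vm['VmSize']} kB VmRSS=#{vm['VmRSS']} kB"
end
```

Calling `current_vm_status` every 100 messages or so during a sync would give a growth curve comparable to the one described above without an external tool.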
Andrew
[1] This is about the fourth time. I seem to be easily discouraged.
[2] http://rubyforge.org/pipermail/sup-talk/2009-May/002171.html