Archive of RubyForge sup-talk mailing list
From: andrew@pimlott.net (Andrew Pimlott)
Subject: [sup-talk] sup-sync and xapian memory usage
Date: Mon, 7 Sep 2009 12:26:11 -0700
Message-ID: <20090907192611.GQ14010@pimlott.net>
In-Reply-To: <1252348258-sup-415@zyrg.net>

On Mon, Sep 07, 2009 at 02:33:06PM -0400, Rich Lane wrote:
> Xapian keeps writes buffered in memory. Try setting the environment
> variable XAPIAN_FLUSH_THRESHOLD to a smaller value (the default is 10000
> documents) and see if that helps.

Thanks--it was hard for me to find that kind of information.  I first
tried setting XAPIAN_FLUSH_THRESHOLD to 1, and sup-sync ran slowly and
just kept getting slower:

## read 139m (about 7%) @ 9.2m/s. 0:00:15 elapsed, about 0:03:21 remaining
...
## read 1238m (about 35%) @ 3.1m/s. 0:06:36 elapsed, about 0:12:08 remaining

I stopped at this point because it was taking too long.  The memory use
seemed stable, but that could have been because it was making such slow
progress.  I guess xapian gets a lot slower writing as the db grows?
That's a bit discouraging.  Using ferret, sup-sync only dropped from
28.1m/s to 27.3m/s during its run.  For reference, when I didn't set
XAPIAN_FLUSH_THRESHOLD, I was getting 35-36m/s until it ran out of
memory.

I then set XAPIAN_FLUSH_THRESHOLD to 100 and got more reasonable
results.  It started at 25.6m/s and slowed to 17.8m/s.  It stabilized at
around 41M virtual memory used and finished successfully.  I also note
that the memory use didn't jump during the finish-up phase ("Deleting
missing messages") as it had with ferret.

Finally, I set XAPIAN_FLUSH_THRESHOLD to 1000.  It started at 34.6m/s
and dropped to 29.8m/s, stabilized at around 51M virtual memory, and
finished successfully.  In this case, it stays faster than ferret, but
it still bugs me that xapian slows down while ferret doesn't.
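
To put the slowdowns side by side, here is a small sketch that turns the
start and end rates quoted in this message into a relative drop.  The
function name and the dictionary layout are illustrative only; the
numbers are the ones reported above.  Note that the threshold=1 run was
stopped at about 35%, so its figure is not a full-run comparison.

```python
def slowdown_percent(start_rate, end_rate):
    """Relative drop in throughput, as a percentage of the starting rate."""
    if start_rate <= 0:
        raise ValueError("start_rate must be positive")
    return (start_rate - end_rate) / start_rate * 100.0


# (start, end) rates in m/s, copied from the runs described in this message.
runs = {
    "xapian, threshold=1 (stopped at ~35%)": (9.2, 3.1),
    "xapian, threshold=100": (25.6, 17.8),
    "xapian, threshold=1000": (34.6, 29.8),
    "ferret": (28.1, 27.3),
}

for name, (start, end) in runs.items():
    print(f"{name}: {start} -> {end} m/s, "
          f"{slowdown_percent(start, end):.1f}% slower")
```

On these numbers the drop shrinks as the threshold grows, but even at
1000 it stays well above ferret's.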

So I conclude... I don't know what I conclude.  Letting xapian use a lot
of memory sure helps its performance.  And a big sup-sync should only
have to be done rarely.  So maybe just document that those on low-memory
systems should consider lowering XAPIAN_FLUSH_THRESHOLD (e.g. to 100 or
1000) during sup-sync.
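
For anyone who wants to script this, a minimal sketch of running
sup-sync with the variable set for that one process only.  It assumes
sup-sync is on PATH; the helper names are made up for illustration and
are not part of sup or xapian.

```python
import os
import subprocess


def env_with_flush_threshold(threshold, base_env=None):
    """Return a copy of the environment with XAPIAN_FLUSH_THRESHOLD set."""
    if threshold < 1:
        raise ValueError("threshold must be a positive number of documents")
    env = dict(os.environ if base_env is None else base_env)
    # Environment values must be strings.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)
    return env


def run_sup_sync(threshold, extra_args=()):
    """Run sup-sync with the given flush threshold (hypothetical wrapper)."""
    # Assumption: sup-sync is installed and on PATH.
    return subprocess.run(
        ["sup-sync", *extra_args],
        env=env_with_flush_threshold(threshold),
        check=True,
    )
```

Calling run_sup_sync(1000) would correspond to the last run described
above; the calling shell's own environment is left untouched.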

Thanks again for your help!

Andrew


