Archive of RubyForge sup-talk mailing list
From: rlane@club.cc.cmu.edu (Rich Lane)
Subject: [sup-talk] [PATCH] index log
Date: Sun, 16 Aug 2009 23:38:43 -0700
Message-ID: <1250491123-19240-1-git-send-email-rlane@club.cc.cmu.edu>

Add a YAML log file that records changes to the index, and modify sup-dump to
read message labels from this log rather than from the index itself. The log
does not depend on the index format or version, so users can switch between
incompatible Sup versions without running sup-dump first.

This should also make automated backups easier, since the log is a plain
append-only text file.
---
 bin/sup-dump            |   19 +++++++++++++------
 lib/sup/ferret_index.rb |    7 +++++++
 lib/sup/index.rb        |   22 ++++++++++++++++++++++
 lib/sup/xapian_index.rb |    7 ++++++-
 lib/sup/yaml_log.rb     |   25 +++++++++++++++++++++++++
 5 files changed, 73 insertions(+), 7 deletions(-)
 create mode 100644 lib/sup/yaml_log.rb

diff --git a/bin/sup-dump b/bin/sup-dump
index ba36b21..531a30a 100755
--- a/bin/sup-dump
+++ b/bin/sup-dump
@@ -2,7 +2,8 @@
 
 require 'rubygems'
 require 'trollop'
-require "sup"
+require 'sup' # Redwood::VERSION, Redwood::BASE_DIR
+require "sup/yaml_log"
 
 $opts = Trollop::options do
   version "sup-dump (sup #{Redwood::VERSION})"
@@ -21,10 +22,16 @@ No options.
 EOS
 end
 
-index = Redwood::Index.new
-Redwood::SourceManager.new
-index.load
+labels = {}
 
-index.each_message :load_spam => true, :load_deleted => true, :load_killed => true do |m|
-  puts "#{m.id} (#{m.labels * ' '})"
+Redwood::log "processing index log"
+index_log = YamlLogReader.new File.join(Redwood::BASE_DIR, 'index_log.yaml')
+index_log.each do |h| 
+  case h['type']
+  when 'add_message', 'update_message_state'
+    labels[h['id']] = h['labels']
+  end
 end
+
+Redwood::log "dumping labels"
+labels.each { |msgid,labels| puts "#{msgid} (#{labels * ' '})" }
diff --git a/lib/sup/ferret_index.rb b/lib/sup/ferret_index.rb
index 98ea9b5..2cb9759 100644
--- a/lib/sup/ferret_index.rb
+++ b/lib/sup/ferret_index.rb
@@ -57,6 +57,7 @@ EOS
 
   def sync_message m, opts={}
     entry = @index[m.id]
+    existed = !entry.nil?
 
     raise "no source info for message #{m.id}" unless m.source && m.source_info
 
@@ -131,6 +132,12 @@ EOS
     }
 
     @index_mutex.synchronize do
+      if existed
+        @log.update_message_state m.id, m.labels
+      else
+        @log.add_message m.id, m.labels
+      end
+
       @index.delete m.id
       @index.add_document d
     end
diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index 54ec843..7360cf5 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -1,6 +1,7 @@
 ## Index interface, subclassed by Ferret indexer.
 
 require 'fileutils'
+require 'sup/yaml_log'
 
 begin
   require 'chronic'
@@ -65,6 +66,7 @@ class BaseIndex
 
   def load
     SourceManager.load_sources
+    @log = IndexLogWriter.new File.join(@dir, 'index_log.yaml')
     load_index
   end
 
@@ -176,6 +178,26 @@ class BaseIndex
   def parse_query s
     unimplemented
   end
+
+  private
+
+  class IndexLogWriter < YamlLogWriter
+    def update_message_state id, labels
+      write_entry 'update_message_state', 'id' => id, 'labels' => labels.map { |x| x.to_s }
+    end
+
+    def add_message id, labels
+      write_entry 'add_message', 'id' => id, 'labels' => labels.map { |x| x.to_s }
+    end
+
+    def remove_message id
+      write_entry 'remove_message', 'id' => id
+    end
+
+    def write_entry type, hash
+      self << hash.merge('type' => type, 'time' => Time.now)
+    end
+  end
 end
 
 index_name = ENV['SUP_INDEX'] || $config[:index] || DEFAULT_INDEX
diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
index 18b5050..c4dbc5f 100644
--- a/lib/sup/xapian_index.rb
+++ b/lib/sup/xapian_index.rb
@@ -61,7 +61,10 @@ class XapianIndex < BaseIndex
   end
 
   def delete id
-    synchronize { @xapian.delete_document mkterm(:msgid, id) }
+    synchronize do
+      @log.remove_message id
+      @xapian.delete_document mkterm(:msgid, id)
+    end
   end
 
   def build_message id
@@ -510,10 +513,12 @@ class XapianIndex < BaseIndex
         Redwood::log "warning: docid underflow, dropping #{m.id.inspect}"
         return
       end
+      @log.add_message m.id, m.labels
     else
       doc.clear_terms
       doc.clear_values
       docid = doc.docid
+      @log.update_message_state m.id, m.labels
     end
 
     @term_generator.document = doc
diff --git a/lib/sup/yaml_log.rb b/lib/sup/yaml_log.rb
new file mode 100644
index 0000000..325cca9
--- /dev/null
+++ b/lib/sup/yaml_log.rb
@@ -0,0 +1,25 @@
+class YamlLogReader
+  include Enumerable
+
+  def initialize filename
+    @io = File.open(filename, 'r+')
+  end
+
+  def each &b
+    @io.rewind
+    YAML.each_document @io, &b
+  end
+end
+
+class YamlLogWriter
+  def initialize filename
+    @io = File.open(filename, 'a')
+  end
+
+  def <<(o)
+    YAML.dump o, @io
+
+    ## This only flushes to the OS. We may want to fsync occasionally too.
+    @io.flush
+  end
+end
-- 
1.6.4
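
For readers who want to see the replay logic in isolation: the sketch below mimics what the patched sup-dump does with index_log.yaml, folding a stream of YAML documents into one label list per message id, where the last entry for an id wins. It is a minimal sketch, not the patch's code. It writes to a StringIO instead of the real log, parses with YAML.load_stream (assuming a Psych-based Ruby) rather than YAML.each_document, and, unlike the patch's sup-dump (which only handles add_message and update_message_state), it also honors remove_message entries by dropping the message. The helper names append_entry and replay_labels are hypothetical.

```ruby
require 'yaml'
require 'stringio'

# Hypothetical helper: append one entry as its own YAML document, the way
# YamlLogWriter#<< does (the patch's IndexLogWriter also adds a 'time'
# field, omitted here).
def append_entry io, type, fields
  YAML.dump fields.merge('type' => type), io
end

# Hypothetical helper: replay a multi-document YAML log and return the latest
# label list for each message id. Unlike the patch's sup-dump,
# 'remove_message' entries drop the id from the result.
def replay_labels io
  io.rewind
  labels = {}
  YAML.load_stream(io.read) do |h|
    case h['type']
    when 'add_message', 'update_message_state'
      labels[h['id']] = h['labels']   # last entry for an id wins
    when 'remove_message'
      labels.delete h['id']
    end
  end
  labels
end

log = StringIO.new
append_entry log, 'add_message', 'id' => 'a@example.com', 'labels' => %w(inbox unread)
append_entry log, 'add_message', 'id' => 'b@example.com', 'labels' => %w(inbox)
append_entry log, 'update_message_state', 'id' => 'a@example.com', 'labels' => %w(starred)
append_entry log, 'remove_message', 'id' => 'b@example.com'

# Same output format as sup-dump: "<msgid> (<labels>)"
replay_labels(log).each { |msgid, ls| puts "#{msgid} (#{ls * ' '})" }
# prints: a@example.com (starred)
```

Because every entry carries the full label list rather than a delta, replay only needs to keep the most recent entry per id.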

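The add-versus-update rule that the patch adds to sync_message in both index backends can be shown with a toy in-memory index. This is a hypothetical sketch: ToyIndex stands in for the Ferret and Xapian classes, and a StringIO stands in for index_log.yaml. It logs add_message when the id was not indexed yet, update_message_state when it was, and remove_message on delete (as the patched XapianIndex#delete does), converting labels (symbols in Sup) to strings as IndexLogWriter does. It omits the 'time' field for brevity.

```ruby
require 'yaml'
require 'stringio'

# Hypothetical stand-in for FerretIndex/XapianIndex, showing only the
# logging rule: the entry type depends on whether the id was already indexed.
class ToyIndex
  def initialize log_io
    @docs = {}       # message id => labels
    @log = log_io    # stands in for the YamlLogWriter's file
  end

  def sync_message id, labels
    type = @docs.key?(id) ? 'update_message_state' : 'add_message'
    write_entry type, 'id' => id, 'labels' => labels.map { |x| x.to_s }
    @docs[id] = labels
  end

  def delete id
    write_entry 'remove_message', 'id' => id
    @docs.delete id
  end

  private

  # One YAML document per entry, mirroring IndexLogWriter#write_entry
  # without the 'time' field.
  def write_entry type, hash
    YAML.dump hash.merge('type' => type), @log
  end
end

log = StringIO.new
index = ToyIndex.new log
index.sync_message 'a@example.com', [:inbox, :unread]
index.sync_message 'a@example.com', [:inbox]
index.delete 'a@example.com'

types = YAML.load_stream(log.string).map { |h| h['type'] }
p types
# prints: ["add_message", "update_message_state", "remove_message"]
```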

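YamlLogReader in the patch opens the log with mode 'r+', which requires write permission and raises Errno::ENOENT when index_log.yaml does not exist, for example when sup-dump runs against a Sup directory that no patched version has written to yet. A read-only variant is sketched below as one possible way to handle this, not as what the patch does: it opens with 'r', treats a missing file as an empty log, and parses with YAML.load_stream (assuming a Psych-based Ruby). The class name ReadOnlyYamlLogReader is hypothetical.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical read-only variant of YamlLogReader. Opening with 'r' avoids
# needing write permission, and a missing log yields no entries instead of
# raising Errno::ENOENT.
class ReadOnlyYamlLogReader
  include Enumerable

  def initialize filename
    @filename = filename
  end

  def each
    return enum_for(:each) unless block_given?
    return unless File.exist? @filename
    File.open(@filename, 'r') do |io|
      YAML.load_stream(io.read) { |doc| yield doc }
    end
  end
end

missing_count = nil
entries = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'index_log.yaml')
  missing_count = ReadOnlyYamlLogReader.new(path).count   # no file yet
  File.open(path, 'a') do |f|
    YAML.dump({ 'type' => 'add_message', 'id' => 'a@example.com', 'labels' => ['inbox'] }, f)
    YAML.dump({ 'type' => 'remove_message', 'id' => 'a@example.com' }, f)
  end
  entries = ReadOnlyYamlLogReader.new(path).to_a
end
```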

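The comment in YamlLogWriter#<< notes that flush only hands data to the OS and that an occasional fsync may be wanted. One way to do that is to fsync every N entries and once more on close, which bounds how many recent entries an OS crash or power loss could lose. This is a hypothetical sketch; the class name SyncingYamlLogWriter, the fsync_every parameter, and its default of 16 are illustrative choices, not part of the patch.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical variant of YamlLogWriter: flush after every entry as the patch
# does, and additionally fsync every +fsync_every+ entries (16 is an
# arbitrary illustrative default).
class SyncingYamlLogWriter
  attr_reader :fsyncs

  def initialize filename, fsync_every = 16
    @io = File.open(filename, 'a')
    @fsync_every = fsync_every
    @pending = 0   # entries flushed to the OS but not yet fsynced
    @fsyncs = 0
  end

  def <<(o)
    YAML.dump o, @io
    @io.flush                      # hand the data to the OS
    @pending += 1
    sync! if @pending >= @fsync_every
    self
  end

  # Ask the OS to write buffered data for this file to stable storage.
  def sync!
    @io.fsync
    @pending = 0
    @fsyncs += 1
  end

  def close
    sync! if @pending > 0
    @io.close
  end
end

writer = nil
entries = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'index_log.yaml')
  writer = SyncingYamlLogWriter.new path, 2
  3.times { |i| writer << { 'type' => 'add_message', 'id' => "m#{i}", 'labels' => [] } }
  writer.close                     # fsyncs the third, still-pending entry
  entries = YAML.load_stream(File.read(path))
end
```

Batching fsync calls trades a small window of possible loss for far fewer disk syncs than calling fsync on every entry.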

Thread overview: 5+ messages
2009-08-17  6:38 Rich Lane [this message]
2009-08-22 13:46 ` William Morgan
2009-08-24 12:20   ` Nicolas Pouillard
2009-08-31  4:16     ` Rich Lane
2009-08-31 11:42       ` Ben Walton
