Archive of RubyForge sup-talk mailing list
* [sup-talk] [PATCH 0/18] Xapian-based index
@ 2009-06-20 20:49 Rich Lane
  2009-06-20 20:50 ` [sup-talk] [PATCH 01/18] remove load_entry_for_id call in sup-recover-sources Rich Lane
  2009-06-24 16:30 ` [sup-talk] [PATCH 0/18] Xapian-based index William Morgan
  0 siblings, 2 replies; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:49 UTC (permalink / raw)


This patch series refactors the Index class to remove Ferret-isms and support
multiple index implementations. The included XapianIndex is a bit faster at
indexing messages and significantly faster when searching because it
precomputes thread membership. It also works on Ruby 1.9.1.

You can enable the new index with the environment variable SUP_INDEX=xapian.

It's missing a couple of features, notably threading by subject. I'm sure there
are many more bugs left, so I'd appreciate any testing or review you all can
provide.

These patches depend on the two I posted on June 16: 'cleanup interface' and 'consistent naming'.
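
Roughly, backend selection could look like the following. This is just a
sketch, not code from these patches: FerretIndex and XapianIndex stand in
for the real classes, and the actual wiring in lib/sup.rb may differ.

  module Redwood
    class FerretIndex; end   # placeholder for the Ferret-backed index
    class XapianIndex; end   # placeholder for the Xapian-backed index

    ## pick the index implementation from the environment; Ferret stays the default
    def self.index_class
      case ENV["SUP_INDEX"]
      when "xapian" then XapianIndex
      else FerretIndex
      end
    end
  end

  puts Redwood.index_class   # SUP_INDEX=xapian => Redwood::XapianIndex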



* [sup-talk] [PATCH 01/18] remove load_entry_for_id call in sup-recover-sources
  2009-06-20 20:49 [sup-talk] [PATCH 0/18] Xapian-based index Rich Lane
@ 2009-06-20 20:50 ` Rich Lane
  2009-06-20 20:50   ` [sup-talk] [PATCH 02/18] remove load_entry_for_id call in DraftManager.discard Rich Lane
  2009-06-24 16:30 ` [sup-talk] [PATCH 0/18] Xapian-based index William Morgan
  1 sibling, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 bin/sup-recover-sources |   12 +++++-------
 lib/sup/index.rb        |    6 ++++++
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/bin/sup-recover-sources b/bin/sup-recover-sources
index d3b1424..6e3810c 100755
--- a/bin/sup-recover-sources
+++ b/bin/sup-recover-sources
@@ -69,15 +69,14 @@ ARGV.each do |fn|
       Redwood::MBox::Loader.new(fn, nil, !$opts[:unusual], $opts[:archive])
     end
 
-  source_ids = {}
+  source_ids = Hash.new 0
   count = 0
   source.each do |offset, labels|
     m = Redwood::Message.new :source => source, :source_info => offset
-    docid, entry = index.load_entry_for_id m.id
-    next unless entry
-    #puts "# #{source} #{offset} #{entry[:source_id]}"
-
-    source_ids[entry[:source_id]] = (source_ids[entry[:source_id]] || 0) + 1
+    m.load_from_source!
+    source_id = index.source_for_id m.id
+    next unless source_id
+    source_ids[source_id] += 1
     count += 1
     break if count == $opts[:scan_num]
   end
@@ -86,7 +85,6 @@ ARGV.each do |fn|
     id = source_ids.keys.first.to_i
     puts "assigned #{source} to #{source_ids.keys.first}"
     source.id = id
-    source.seek_to! source.total
     index.add_source source
   else
     puts ">> unable to determine #{source}: #{source_ids.inspect}"
diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index d15e7bb..b5d0e5d 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -494,6 +494,12 @@ EOS
     @index_mutex.synchronize { @index.optimize }
   end
 
+  def source_for_id id
+    entry = @index[id]
+    return unless entry
+    entry[:source_id].to_i
+  end
+
   class ParseError < StandardError; end
 
   ## parse a query string from the user. returns a query object
-- 
1.6.0.4




* [sup-talk] [PATCH 02/18] remove load_entry_for_id call in DraftManager.discard
  2009-06-20 20:50 ` [sup-talk] [PATCH 01/18] remove load_entry_for_id call in sup-recover-sources Rich Lane
@ 2009-06-20 20:50   ` Rich Lane
  2009-06-20 20:50     ` [sup-talk] [PATCH 03/18] remove ferret entry from poll/sync interface Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup/draft.rb |    9 ++-------
 1 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/lib/sup/draft.rb b/lib/sup/draft.rb
index 9127739..1233945 100644
--- a/lib/sup/draft.rb
+++ b/lib/sup/draft.rb
@@ -31,14 +31,9 @@ class DraftManager
   end
 
   def discard m
-    docid, entry = Index.load_entry_for_id m.id
-    unless entry
-      Redwood::log "can't find entry for draft: #{m.id.inspect}. You probably already discarded it."
-      return
-    end
-    raise ArgumentError, "not a draft: source id #{entry[:source_id].inspect}, should be #{DraftManager.source_id.inspect} for #{m.id.inspect} / docno #{docid}" unless entry[:source_id].to_i == DraftManager.source_id
+    raise ArgumentError, "not a draft: source id #{m.source.id.inspect}, should be #{DraftManager.source_id.inspect} for #{m.id.inspect}" unless m.source.id.to_i == DraftManager.source_id
     Index.delete m.id
-    File.delete @source.fn_for_offset(entry[:source_info])
+    File.delete @source.fn_for_offset(m.source_info)
     UpdateManager.relay self, :single_message_deleted, m
   end
 end
-- 
1.6.0.4




* [sup-talk] [PATCH 03/18] remove ferret entry from poll/sync interface
  2009-06-20 20:50   ` [sup-talk] [PATCH 02/18] remove load_entry_for_id call in DraftManager.discard Rich Lane
@ 2009-06-20 20:50     ` Rich Lane
  2009-06-20 20:50       ` [sup-talk] [PATCH 04/18] index: remove unused method load_entry_for_id Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


This leads to an extra index lookup in the sup-sync update path, but I think
it's worth it for the sake of API simplicity.
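
Roughly, the new block interface looks like this (a toy, self-contained
sketch, not actual sup code): add_messages_from now yields the previously
indexed message (or nil) and the freshly loaded one, rather than a raw
ferret entry hash, at the cost of the extra index lookup mentioned above.

  Msg = Struct.new(:id, :labels)

  ## stand-in for add_messages_from: look up the old copy of each message
  ## (the extra index lookup), then yield (m_old, m_new, offset)
  def add_messages_demo index_by_id, new_msgs
    new_msgs.each do |offset, m_new|
      m_old = index_by_id[m_new.id]
      m_ret = yield(m_old, m_new, offset) or next
      puts "sync #{m_ret.id} #{m_ret.labels.inspect}"
    end
  end

  index_by_id = { "a@example" => Msg.new("a@example", [:inbox, :starred]) }
  new_msgs    = [[0, Msg.new("a@example", [:unread, :inbox])],
                 [1, Msg.new("b@example", [:unread, :inbox])]]

  add_messages_demo(index_by_id, new_msgs) do |m_old, m, offset|
    ## always preserve the labels already in the index
    m.labels = ((m.labels - [:unread, :inbox]) + m_old.labels).uniq if m_old
    m
  end
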
---
 bin/sup-sync       |    8 ++++----
 bin/sup-sync-back  |    6 +++---
 lib/sup/index.rb   |   18 ++++--------------
 lib/sup/message.rb |    6 ++++++
 lib/sup/poll.rb    |   33 ++++++++++++++-------------------
 lib/sup/sent.rb    |    2 +-
 6 files changed, 32 insertions(+), 41 deletions(-)

diff --git a/bin/sup-sync b/bin/sup-sync
index a759cbe..18a3cab 100755
--- a/bin/sup-sync
+++ b/bin/sup-sync
@@ -137,7 +137,7 @@ begin
     num_added = num_updated = num_scanned = num_restored = 0
     last_info_time = start_time = Time.now
 
-    Redwood::PollManager.add_messages_from source, :force_overwrite => true do |m, offset, entry|
+    Redwood::PollManager.add_messages_from source, :force_overwrite => true do |m_old, m, offset|
       num_scanned += 1
       seen[m.id] = true
 
@@ -153,10 +153,10 @@ begin
       ## skip if we're operating only on changed messages, the message
       ## is in the index, and it's unchanged from what the source is
       ## reporting.
-      next if target == :changed && entry && entry[:source_id].to_i == source.id && entry[:source_info].to_i == offset
+      next if target == :changed && m_old && m_old.source.id == source.id && m_old.source_info == offset
 
       ## get the state currently in the index
-      index_state = entry[:label].symbolistize if entry
+      index_state = m_old.labels.dup if m_old
 
       ## skip if we're operating on restored messages, and this one
       ## ain't.
@@ -196,7 +196,7 @@ begin
         puts "Adding message #{source}##{offset} from #{m.from} with state {#{m.labels * ', '}}" if opts[:verbose]
         num_added += 1
       else
-        puts "Updating message #{source}##{offset}, source #{entry[:source_id]} => #{source.id}, offset #{entry[:source_info]} => #{offset}, state {#{index_state * ', '}} => {#{m.labels * ', '}}" if opts[:verbose]
+        puts "Updating message #{source}##{offset}, source #{m_old.source.id} => #{source.id}, offset #{m_old.source_info} => #{offset}, state {#{index_state * ', '}} => {#{m.labels * ', '}}" if opts[:verbose]
         num_updated += 1
       end
 
diff --git a/bin/sup-sync-back b/bin/sup-sync-back
index 4f1387e..1c746d2 100755
--- a/bin/sup-sync-back
+++ b/bin/sup-sync-back
@@ -105,11 +105,11 @@ EOS
     num_dropped = num_moved = num_scanned = 0
     
     out_fp = Tempfile.new "sup-sync-back-#{source.id}"
-    Redwood::PollManager.add_messages_from source do |m, offset, entry|
+    Redwood::PollManager.add_messages_from source do |m_old, m, offset|
       num_scanned += 1
 
-      if entry
-        labels = entry[:label].symbolistize.to_boolean_h
+      if m_old
+        labels = m_old.labels
 
         if labels.member? :deleted
           if opts[:drop_deleted]
diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index b5d0e5d..89795da 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -174,16 +174,10 @@ EOS
   ## Syncs the message to the index, replacing any previous version.  adding
   ## either way. Index state will be determined by the message's #labels
   ## accessor.
-  ##
-  ## if need_load is false, docid and entry are assumed to be set to the
-  ## result of load_entry_for_id (which can be nil).
-  def sync_message m, need_load=true, docid=nil, entry=nil, opts={}
-    docid, entry = load_entry_for_id m.id if need_load
+  def sync_message m, opts={}
+    entry = @index[m.id]
 
     raise "no source info for message #{m.id}" unless m.source && m.source_info
-    @index_mutex.synchronize do
-      raise "trying to delete non-corresponding entry #{docid} with index message-id #{@index[docid][:message_id].inspect} and parameter message id #{m.id.inspect}" if docid && @index[docid][:message_id] != m.id
-    end
 
     source_id = if m.source.is_a? Integer
       m.source
@@ -256,13 +250,9 @@ EOS
     }
 
     @index_mutex.synchronize do
-      @index.delete docid if docid
+      @index.delete m.id
       @index.add_document d
     end
-
-    ## this hasn't been triggered in a long time.
-    ## docid, entry = load_entry_for_id m.id
-    ## raise "just added message #{m.id.inspect} but couldn't find it in a search" unless docid
   end
 
   def save_index fn=File.join(@dir, "ferret")
@@ -391,7 +381,7 @@ EOS
   ## builds a message object from a ferret result
   def build_message docid
     @index_mutex.synchronize do
-      doc = @index[docid]
+      doc = @index[docid] or return
 
       source = @source_mutex.synchronize { @sources[doc[:source_id].to_i] }
       raise "invalid source #{doc[:source_id]}" unless source
diff --git a/lib/sup/message.rb b/lib/sup/message.rb
index 8525fdf..b667cb3 100644
--- a/lib/sup/message.rb
+++ b/lib/sup/message.rb
@@ -288,6 +288,12 @@ EOS
        "Subject: #{@subj}"]
   end
 
+  def self.build_from_source source, source_info
+    m = Message.new :source => source, :source_info => source_info
+    m.load_from_source!
+    m
+  end
+
 private
 
   ## here's where we handle decoding mime attachments. unfortunately
diff --git a/lib/sup/poll.rb b/lib/sup/poll.rb
index 74f7d1c..bbad5f2 100644
--- a/lib/sup/poll.rb
+++ b/lib/sup/poll.rb
@@ -95,11 +95,11 @@ EOS
 
         num = 0
         numi = 0
-        add_messages_from source do |m, offset, entry|
+        add_messages_from source do |m_old, m, offset|
           ## always preserve the labels on disk.
-          m.labels = ((m.labels - [:unread, :inbox]) + entry[:label].symbolistize).uniq if entry
+          m.labels = ((m.labels - [:unread, :inbox]) + m_old.labels).uniq if m_old
           yield "Found message at #{offset} with labels {#{m.labels * ', '}}"
-          unless entry
+          unless m_old
             num += 1
             from_and_subj << [m.from && m.from.longname, m.subj]
             if m.has_label?(:inbox) && ([:spam, :deleted, :killed] & m.labels).empty?
@@ -138,29 +138,24 @@ EOS
     begin
       return if source.done? || source.has_errors?
 
-      source.each do |offset, labels|
+      source.each do |offset, default_labels|
         if source.has_errors?
           Redwood::log "error loading messages from #{source}: #{source.error.message}"
           return
         end
 
-        labels << :sent if source.uri.eql?(SentManager.source_uri)
-        labels.each { |l| LabelManager << l }
-        labels = labels + (source.archived? ? [] : [:inbox])
+        m_new = Message.build_from_source source, offset
+        m_old = Index.build_message m_new.id
 
-        m = Message.new :source => source, :source_info => offset, :labels => labels
-        m.load_from_source!
+        m_new.labels = default_labels + (source.archived? ? [] : [:inbox])
+        m_new.labels << :sent if source.uri.eql?(SentManager.source_uri)
+        m_new.labels.delete :unread if m_new.source_marked_read?
+        m_new.labels.each { |l| LabelManager << l }
 
-        if m.source_marked_read?
-          m.remove_label :unread
-          labels.delete :unread
-        end
-
-        docid, entry = Index.load_entry_for_id m.id
-        HookManager.run "before-add-message", :message => m
-        m = yield(m, offset, entry) or next if block_given?
-        times = Index.sync_message m, false, docid, entry, opts
-        UpdateManager.relay self, :added, m unless entry
+        HookManager.run "before-add-message", :message => m_new
+        m_ret = yield(m_old, m_new, offset) or next if block_given?
+        Index.sync_message m_ret, opts
+        UpdateManager.relay self, :added, m_ret unless m_old
       end
     rescue SourceError => e
       Redwood::log "problem getting messages from #{source}: #{e.message}"
diff --git a/lib/sup/sent.rb b/lib/sup/sent.rb
index e6ae856..b750d71 100644
--- a/lib/sup/sent.rb
+++ b/lib/sup/sent.rb
@@ -30,7 +30,7 @@ class SentManager
   def write_sent_message date, from_email, &block
     @source.store_message date, from_email, &block
 
-    PollManager.add_messages_from(@source) do |m, o, e|
+    PollManager.add_messages_from(@source) do |m_old, m, offset|
       m.remove_label :unread
       m
     end
-- 
1.6.0.4




* [sup-talk] [PATCH 04/18] index: remove unused method load_entry_for_id
  2009-06-20 20:50     ` [sup-talk] [PATCH 03/18] remove ferret entry from poll/sync interface Rich Lane
@ 2009-06-20 20:50       ` Rich Lane
  2009-06-20 20:50         ` [sup-talk] [PATCH 05/18] switch DraftManager to use Message.build_from_source Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup/index.rb |   11 -----------
 1 files changed, 0 insertions(+), 11 deletions(-)

diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index 89795da..64afbdd 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -411,17 +411,6 @@ EOS
 
   def delete id; @index_mutex.synchronize { @index.delete id } end
 
-  def load_entry_for_id mid
-    @index_mutex.synchronize do
-      results = @index.search Ferret::Search::TermQuery.new(:message_id, mid)
-      return if results.total_hits == 0
-      docid = results.hits[0].doc
-      entry = @index[docid]
-      entry_dup = entry.fields.inject({}) { |h, f| h[f] = entry[f]; h }
-      [docid, entry_dup]
-    end
-  end
-
   def load_contacts emails, h={}
     q = Ferret::Search::BooleanQuery.new true
     emails.each do |e|
-- 
1.6.0.4




* [sup-talk] [PATCH 05/18] switch DraftManager to use Message.build_from_source
  2009-06-20 20:50       ` [sup-talk] [PATCH 04/18] index: remove unused method load_entry_for_id Rich Lane
@ 2009-06-20 20:50         ` Rich Lane
  2009-06-20 20:50           ` [sup-talk] [PATCH 06/18] index: move has_any_from_source_with_label? to sup-sync-back Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup/draft.rb |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/lib/sup/draft.rb b/lib/sup/draft.rb
index 1233945..dd4574d 100644
--- a/lib/sup/draft.rb
+++ b/lib/sup/draft.rb
@@ -21,7 +21,8 @@ class DraftManager
 
     my_message = nil
     @source.each do |thisoffset, theselabels|
-      m = Message.new :source => @source, :source_info => thisoffset, :labels => theselabels
+      m = Message.build_from_source @source, thisoffset
+      m.labels = theselabels
       Index.sync_message m
       UpdateManager.relay self, :added, m
       my_message = m if thisoffset == offset
-- 
1.6.0.4




* [sup-talk] [PATCH 06/18] index: move has_any_from_source_with_label? to sup-sync-back
  2009-06-20 20:50         ` [sup-talk] [PATCH 05/18] switch DraftManager to use Message.build_from_source Rich Lane
@ 2009-06-20 20:50           ` Rich Lane
  2009-06-20 20:50             ` [sup-talk] [PATCH 07/18] move source-related methods to SourceManager Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 bin/sup-sync-back |    7 ++++++-
 lib/sup/index.rb  |    7 -------
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/bin/sup-sync-back b/bin/sup-sync-back
index 1c746d2..05b9e8c 100755
--- a/bin/sup-sync-back
+++ b/bin/sup-sync-back
@@ -4,6 +4,7 @@ require 'rubygems'
 require 'uri'
 require 'tempfile'
 require 'trollop'
+require 'enumerator'
 require "sup"
 
 ## save a message 'm' to an open file pointer 'fp'
@@ -14,6 +15,10 @@ def die msg
   $stderr.puts "Error: #{msg}"
   exit(-1)
 end
+def has_any_from_source_with_label? index, source, label
+  query = { :source_id => source.id, :label => label, :limit => 1 }
+  not Enumerable::Enumerator.new(index, :each_docid, query).map.empty?
+end
 
 opts = Trollop::options do
   version "sup-sync-back (sup #{Redwood::VERSION})"
@@ -96,7 +101,7 @@ EOS
   sources.each do |source|
     $stderr.puts "Scanning #{source}..."
 
-    unless ((opts[:drop_deleted] || opts[:move_deleted]) && index.has_any_from_source_with_label?(source, :deleted)) || ((opts[:drop_spam] || opts[:move_spam]) && index.has_any_from_source_with_label?(source, :spam))
+    unless ((opts[:drop_deleted] || opts[:move_deleted]) && has_any_from_source_with_label?(index, source, :deleted)) || ((opts[:drop_spam] || opts[:move_spam]) && has_any_from_source_with_label?(index, source, :spam))
       $stderr.puts "Nothing to do from this source; skipping"
       next
     end
diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index 64afbdd..b9f4b36 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -450,13 +450,6 @@ EOS
     end
   end
 
-  def has_any_from_source_with_label? source, label
-    q = Ferret::Search::BooleanQuery.new
-    q.add_query Ferret::Search::TermQuery.new("source_id", source.id.to_s), :must
-    q.add_query Ferret::Search::TermQuery.new("label", label.to_s), :must
-    @index_mutex.synchronize { @index.search(q, :limit => 1).total_hits > 0 }
-  end
-
   def each_docid query={}
     ferret_query = build_ferret_query query
     results = @index_mutex.synchronize { @index.search ferret_query, :limit => (query[:limit] || :all) }
-- 
1.6.0.4




* [sup-talk] [PATCH 07/18] move source-related methods to SourceManager
  2009-06-20 20:50           ` [sup-talk] [PATCH 06/18] index: move has_any_from_source_with_label? to sup-sync-back Rich Lane
@ 2009-06-20 20:50             ` Rich Lane
  2009-06-20 20:50               ` [sup-talk] [PATCH 08/18] index: remove unused method fresh_thread_id Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 bin/sup                 |   10 ++++----
 bin/sup-add             |   10 ++++----
 bin/sup-config          |   14 +++++-----
 bin/sup-recover-sources |    7 +++--
 bin/sup-sync            |    6 ++--
 bin/sup-sync-back       |    4 +-
 bin/sup-tweak-labels    |    4 +-
 lib/sup.rb              |    5 ++-
 lib/sup/index.rb        |   52 ++----------------------------------------
 lib/sup/poll.rb         |    2 +-
 lib/sup/source.rb       |   57 +++++++++++++++++++++++++++++++++++++++++++++++
 11 files changed, 92 insertions(+), 79 deletions(-)

diff --git a/bin/sup b/bin/sup
index 302ad7c..1febefd 100755
--- a/bin/sup
+++ b/bin/sup
@@ -160,17 +160,17 @@ begin
   Redwood::start
   Index.load
 
-  if(s = Index.source_for DraftManager.source_name)
+  if(s = Redwood::SourceManager.source_for DraftManager.source_name)
     DraftManager.source = s
   else
     Redwood::log "no draft source, auto-adding..."
-    Index.add_source DraftManager.new_source
+    Redwood::SourceManager.add_source DraftManager.new_source
   end
 
-  if(s = Index.source_for SentManager.source_uri)
+  if(s = Redwood::SourceManager.source_for SentManager.source_uri)
     SentManager.source = s
   else
-    Index.add_source SentManager.default_source
+    Redwood::SourceManager.add_source SentManager.default_source
   end
 
   HookManager.run "startup"
@@ -190,7 +190,7 @@ begin
 
   bm.draw_screen
 
-  Index.usual_sources.each do |s|
+  Redwood::SourceManager.usual_sources.each do |s|
     next unless s.respond_to? :connect
     reporting_thread("call #connect on #{s}") do
       begin
diff --git a/bin/sup-add b/bin/sup-add
index 50bbb29..c491ca7 100755
--- a/bin/sup-add
+++ b/bin/sup-add
@@ -82,12 +82,12 @@ index = Redwood::Index.new
 index.lock_or_die
 
 begin
-  index.load_sources
+  Redwood::SourceManager.load_sources
 
   ARGV.each do |uri|
     labels = $opts[:labels] ? $opts[:labels].split(/\s*,\s*/).uniq : []
 
-    if !$opts[:force_new] && index.source_for(uri) 
+    if !$opts[:force_new] && Redwood::SourceManager.source_for(uri) 
       say "Already know about #{uri}; skipping."
       next
     end
@@ -99,10 +99,10 @@ begin
       when "mbox+ssh"
         say "For SSH connections, if you will use public key authentication, you may leave the username and password blank."
         say ""
-        username, password = get_login_info uri, index.sources
+        username, password = get_login_info uri, Redwood::SourceManager.sources
         Redwood::MBox::SSHLoader.new uri, username, password, nil, !$opts[:unusual], $opts[:archive], nil, labels
       when "imap", "imaps"
-        username, password = get_login_info uri, index.sources
+        username, password = get_login_info uri, Redwood::SourceManager.sources
         Redwood::IMAP.new uri, username, password, nil, !$opts[:unusual], $opts[:archive], nil, labels
       when "maildir"
         Redwood::Maildir.new uri, nil, !$opts[:unusual], $opts[:archive], nil, labels
@@ -114,7 +114,7 @@ begin
         Trollop::die "Unknown source type #{parsed_uri.scheme.inspect}"      
       end
     say "Adding #{source}..."
-    index.add_source source
+    Redwood::SourceManager.add_source source
   end
 ensure
   index.save
diff --git a/bin/sup-config b/bin/sup-config
index 398197f..9fcbee6 100755
--- a/bin/sup-config
+++ b/bin/sup-config
@@ -152,7 +152,7 @@ end
 $terminal.wrap_at = :auto
 Redwood::start
 index = Redwood::Index.new
-index.load_sources
+Redwood::SourceManager.load_sources
 
 say <<EOS
 Howdy neighbor! This here's sup-config, ready to help you jack in to
@@ -191,12 +191,12 @@ $config[:editor] = editor
 done = false
 until done
   say "\nNow, we'll tell Sup where to find all your email."
-  index.load_sources
+  Redwood::SourceManager.load_sources
   say "Current sources:"
-  if index.sources.empty?
+  if Redwood::SourceManager.sources.empty?
     say "  No sources!"
   else
-    index.sources.each { |s| puts "* #{s}" }
+    Redwood::SourceManager.sources.each { |s| puts "* #{s}" }
   end
 
   say "\n"
@@ -210,8 +210,8 @@ end
 say "\nSup needs to know where to store your sent messages."
 say "Only sources capable of storing mail will be listed.\n\n"
 
-index.load_sources
-if index.sources.empty?
+Redwood::SourceManager.load_sources
+if Redwood::SourceManager.sources.empty?
   say "\nUsing the default sup://sent, since you haven't configured other sources yet."
   $config[:sent_source] = 'sup://sent'
 else
@@ -222,7 +222,7 @@ else
   choose do |menu|
     menu.prompt = "Store my sent mail in? "
 
-    valid_sents = index.sources.each do |s|
+    valid_sents = Redwood::SourceManager.sources.each do |s|
       have_sup_sent = true if s.to_s.eql?('sup://sent')
 
       menu.choice(s.to_s) { $config[:sent_source] = s.to_s } if s.respond_to? :store_message
diff --git a/bin/sup-recover-sources b/bin/sup-recover-sources
index 6e3810c..db75b11 100755
--- a/bin/sup-recover-sources
+++ b/bin/sup-recover-sources
@@ -48,13 +48,14 @@ EOS
 end.parse(ARGV)
 
 require "sup"
+Redwood::start
 puts "loading index..."
 index = Redwood::Index.new
 index.load
 puts "loaded index of #{index.size} messages"
 
 ARGV.each do |fn|
-  next if index.source_for fn
+  next if Redwood::SourceManager.source_for fn
 
   ## TODO: merge this code with the same snippet in import
   source = 
@@ -74,7 +75,7 @@ ARGV.each do |fn|
   source.each do |offset, labels|
     m = Redwood::Message.new :source => source, :source_info => offset
     m.load_from_source!
-    source_id = index.source_for_id m.id
+    source_id = Redwood::SourceManager.source_for_id m.id
     next unless source_id
     source_ids[source_id] += 1
     count += 1
@@ -85,7 +86,7 @@ ARGV.each do |fn|
     id = source_ids.keys.first.to_i
     puts "assigned #{source} to #{source_ids.keys.first}"
     source.id = id
-    index.add_source source
+    Redwood::SourceManager.add_source source
   else
     puts ">> unable to determine #{source}: #{source_ids.inspect}"
   end
diff --git a/bin/sup-sync b/bin/sup-sync
index 18a3cab..270524a 100755
--- a/bin/sup-sync
+++ b/bin/sup-sync
@@ -116,11 +116,11 @@ begin
   index.load
 
   sources = ARGV.map do |uri|
-    index.source_for uri or Trollop::die "Unknown source: #{uri}. Did you add it with sup-add first?"
+    Redwood::SourceManager.source_for uri or Trollop::die "Unknown source: #{uri}. Did you add it with sup-add first?"
   end
   
-  sources = index.usual_sources if sources.empty?
-  sources = index.sources if opts[:all_sources]
+  sources = Redwood::SourceManager.usual_sources if sources.empty?
+  sources = Redwood::SourceManager.sources if opts[:all_sources]
 
   unless target == :new
     if opts[:start_at]
diff --git a/bin/sup-sync-back b/bin/sup-sync-back
index 05b9e8c..679e03a 100755
--- a/bin/sup-sync-back
+++ b/bin/sup-sync-back
@@ -80,13 +80,13 @@ begin
   index.load
 
   sources = ARGV.map do |uri|
-    s = index.source_for(uri) or die "unknown source: #{uri}. Did you add it with sup-add first?"
+    s = Redwood::SourceManager.source_for(uri) or die "unknown source: #{uri}. Did you add it with sup-add first?"
     s.is_a?(Redwood::MBox::Loader) or die "#{uri} is not an mbox source."
     s
   end
 
   if sources.empty?
-    sources = index.usual_sources.select { |s| s.is_a? Redwood::MBox::Loader } 
+    sources = Redwood::SourceManager.usual_sources.select { |s| s.is_a? Redwood::MBox::Loader } 
   end
 
   unless sources.all? { |s| s.file_path.nil? } || File.executable?(dotlockfile) || opts[:dont_use_dotlockfile]
diff --git a/bin/sup-tweak-labels b/bin/sup-tweak-labels
index 6f603e2..95a3b03 100755
--- a/bin/sup-tweak-labels
+++ b/bin/sup-tweak-labels
@@ -66,10 +66,10 @@ begin
 
   source_ids = 
     if opts[:all_sources]
-      index.sources
+      Redwood::SourceManager.sources
     else
       ARGV.map do |uri|
-        index.source_for uri or Trollop::die "Unknown source: #{uri}. Did you add it with sup-add first?"
+        Redwood::SourceManager.source_for uri or Trollop::die "Unknown source: #{uri}. Did you add it with sup-add first?"
       end
   end.map { |s| s.id }
   Trollop::die "nothing to do: no sources" if source_ids.empty?
diff --git a/lib/sup.rb b/lib/sup.rb
index 8373820..5689c2b 100644
--- a/lib/sup.rb
+++ b/lib/sup.rb
@@ -115,6 +115,7 @@ module Redwood
     Redwood::SuicideManager.new Redwood::SUICIDE_FN
     Redwood::CryptoManager.new
     Redwood::UndoManager.new
+    Redwood::SourceManager.new
   end
 
   def finish
@@ -130,7 +131,7 @@ module Redwood
   def report_broken_sources opts={}
     return unless BufferManager.instantiated?
 
-    broken_sources = Index.sources.select { |s| s.error.is_a? FatalSourceError }
+    broken_sources = SourceManager.sources.select { |s| s.error.is_a? FatalSourceError }
     unless broken_sources.empty?
       BufferManager.spawn_unless_exists("Broken source notification for #{broken_sources.join(',')}", opts) do
         TextMode.new(<<EOM)
@@ -147,7 +148,7 @@ EOM
       end
     end
 
-    desynced_sources = Index.sources.select { |s| s.error.is_a? OutOfSyncSourceError }
+    desynced_sources = SourceManager.sources.select { |s| s.error.is_a? OutOfSyncSourceError }
     unless desynced_sources.empty?
       BufferManager.spawn_unless_exists("Out-of-sync source notification for #{broken_sources.join(',')}", opts) do
         TextMode.new(<<EOM)
diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index b9f4b36..7d6258d 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -26,11 +26,7 @@ class Index
 
   def initialize dir=BASE_DIR
     @index_mutex = Monitor.new
-
     @dir = dir
-    @sources = {}
-    @sources_dirty = false
-    @source_mutex = Monitor.new
 
     wsa = Ferret::Analysis::WhiteSpaceAnalyzer.new false
     sa = Ferret::Analysis::StandardAnalyzer.new [], true
@@ -112,36 +108,17 @@ EOS
   end
 
   def load
-    load_sources
+    SourceManager.load_sources
     load_index
   end
 
   def save
     Redwood::log "saving index and sources..."
     FileUtils.mkdir_p @dir unless File.exists? @dir
-    save_sources
+    SourceManager.save_sources
     save_index
   end
 
-  def add_source source
-    @source_mutex.synchronize do
-      raise "duplicate source!" if @sources.include? source
-      @sources_dirty = true
-      max = @sources.max_of { |id, s| s.is_a?(DraftLoader) || s.is_a?(SentLoader) ? 0 : id }
-      source.id ||= (max || 0) + 1
-      ##source.id += 1 while @sources.member? source.id
-      @sources[source.id] = source
-    end
-  end
-
-  def sources
-    ## favour the inbox by listing non-archived sources first
-    @source_mutex.synchronize { @sources.values }.sort_by { |s| s.id }.partition { |s| !s.archived? }.flatten
-  end
-
-  def source_for uri; sources.find { |s| s.is_source_for? uri }; end
-  def usual_sources; sources.find_all { |s| s.usual? }; end
-
   def load_index dir=File.join(@dir, "ferret")
     if File.exists? dir
       Redwood::log "loading index..."
@@ -383,7 +360,7 @@ EOS
     @index_mutex.synchronize do
       doc = @index[docid] or return
 
-      source = @source_mutex.synchronize { @sources[doc[:source_id].to_i] }
+      source = SourceManager[doc[:source_id].to_i]
       raise "invalid source #{doc[:source_id]}" unless source
 
       #puts "building message #{doc[:message_id]} (#{source}##{doc[:source_info]})"
@@ -442,14 +419,6 @@ EOS
     contacts.keys.compact
   end
 
-  def load_sources fn=Redwood::SOURCE_FN
-    source_array = (Redwood::load_yaml_obj(fn) || []).map { |o| Recoverable.new o }
-    @source_mutex.synchronize do
-      @sources = Hash[*(source_array).map { |s| [s.id, s] }.flatten]
-      @sources_dirty = false
-    end
-  end
-
   def each_docid query={}
     ferret_query = build_ferret_query query
     results = @index_mutex.synchronize { @index.search ferret_query, :limit => (query[:limit] || :all) }
@@ -604,21 +573,6 @@ private
     q.add_query Ferret::Search::TermQuery.new("source_id", query[:source_id]), :must if query[:source_id]
     q
   end
-
-  def save_sources fn=Redwood::SOURCE_FN
-    @source_mutex.synchronize do
-      if @sources_dirty || @sources.any? { |id, s| s.dirty? }
-        bakfn = fn + ".bak"
-        if File.exists? fn
-          File.chmod 0600, fn
-          FileUtils.mv fn, bakfn, :force => true unless File.exists?(bakfn) && File.size(fn) == 0
-        end
-        Redwood::save_yaml_obj sources.sort_by { |s| s.id.to_i }, fn, true
-        File.chmod 0600, fn
-      end
-      @sources_dirty = false
-    end
-  end
 end
 
 end
diff --git a/lib/sup/poll.rb b/lib/sup/poll.rb
index bbad5f2..c83290c 100644
--- a/lib/sup/poll.rb
+++ b/lib/sup/poll.rb
@@ -83,7 +83,7 @@ EOS
     from_and_subj_inbox = []
 
     @mutex.synchronize do
-      Index.usual_sources.each do |source|
+      SourceManager.usual_sources.each do |source|
 #        yield "source #{source} is done? #{source.done?} (cur_offset #{source.cur_offset} >= #{source.end_offset})"
         begin
           yield "Loading from #{source}... " unless source.done? || (source.respond_to?(:has_errors?) && source.has_errors?)
diff --git a/lib/sup/source.rb b/lib/sup/source.rb
index fb98dbc..1bb7797 100644
--- a/lib/sup/source.rb
+++ b/lib/sup/source.rb
@@ -155,4 +155,61 @@ protected
   end
 end
 
+class SourceManager
+  include Singleton
+
+  def initialize
+    @sources = {}
+    @sources_dirty = false
+    @source_mutex = Monitor.new
+    self.class.i_am_the_instance self
+  end
+
+  def [](id)
+    @source_mutex.synchronize { @sources[id] }
+  end
+
+  def add_source source
+    @source_mutex.synchronize do
+      raise "duplicate source!" if @sources.include? source
+      @sources_dirty = true
+      max = @sources.max_of { |id, s| s.is_a?(DraftLoader) || s.is_a?(SentLoader) ? 0 : id }
+      source.id ||= (max || 0) + 1
+      ##source.id += 1 while @sources.member? source.id
+      @sources[source.id] = source
+    end
+  end
+
+  def sources
+    ## favour the inbox by listing non-archived sources first
+    @source_mutex.synchronize { @sources.values }.sort_by { |s| s.id }.partition { |s| !s.archived? }.flatten
+  end
+
+  def source_for uri; sources.find { |s| s.is_source_for? uri }; end
+  def usual_sources; sources.find_all { |s| s.usual? }; end
+
+  def load_sources fn=Redwood::SOURCE_FN
+    source_array = (Redwood::load_yaml_obj(fn) || []).map { |o| Recoverable.new o }
+    @source_mutex.synchronize do
+      @sources = Hash[*(source_array).map { |s| [s.id, s] }.flatten]
+      @sources_dirty = false
+    end
+  end
+
+  def save_sources fn=Redwood::SOURCE_FN
+    @source_mutex.synchronize do
+      if @sources_dirty || @sources.any? { |id, s| s.dirty? }
+        bakfn = fn + ".bak"
+        if File.exists? fn
+          File.chmod 0600, fn
+          FileUtils.mv fn, bakfn, :force => true unless File.exists?(bakfn) && File.size(fn) == 0
+        end
+        Redwood::save_yaml_obj sources.sort_by { |s| s.id.to_i }, fn, true
+        File.chmod 0600, fn
+      end
+      @sources_dirty = false
+    end
+  end
+end
+
 end
-- 
1.6.0.4




* [sup-talk] [PATCH 08/18] index: remove unused method fresh_thread_id
  2009-06-20 20:50             ` [sup-talk] [PATCH 07/18] move source-related methods to SourceManager Rich Lane
@ 2009-06-20 20:50               ` Rich Lane
  2009-06-20 20:50                 ` [sup-talk] [PATCH 09/18] index: revert overeager opts->query rename in each_message_in_thread_for Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup/index.rb |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index 7d6258d..e3f9e69 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -382,7 +382,6 @@ EOS
     end
   end
 
-  def fresh_thread_id; @next_thread_id += 1; end
   def wrap_subj subj; "__START_SUBJECT__ #{subj} __END_SUBJECT__"; end
   def unwrap_subj subj; subj =~ /__START_SUBJECT__ (.*?) __END_SUBJECT__/ && $1; end
 
-- 
1.6.0.4




* [sup-talk] [PATCH 09/18] index: revert overeager opts->query rename in each_message_in_thread_for
  2009-06-20 20:50               ` [sup-talk] [PATCH 08/18] index: remove unused method fresh_thread_id Rich Lane
@ 2009-06-20 20:50                 ` Rich Lane
  2009-06-20 20:50                   ` [sup-talk] [PATCH 10/18] index: make wrap_subj methods private Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup/index.rb |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index e3f9e69..080a4ec 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -280,7 +280,7 @@ EOS
   ## is found.
   SAME_SUBJECT_DATE_LIMIT = 7
   MAX_CLAUSES = 1000
-  def each_message_in_thread_for m, query={}
+  def each_message_in_thread_for m, opts={}
     #Redwood::log "Building thread for #{m.id}: #{m.subj}"
     messages = {}
     searched = {}
@@ -310,7 +310,7 @@ EOS
       pending = (pending + p1 + p2).uniq
     end
 
-    until pending.empty? || (query[:limit] && messages.size >= query[:limit])
+    until pending.empty? || (opts[:limit] && messages.size >= opts[:limit])
       q = Ferret::Search::BooleanQuery.new true
       # this disappeared in newer ferrets... wtf.
       # q.max_clause_count = 2048
@@ -329,8 +329,8 @@ EOS
       killed = false
       @index_mutex.synchronize do
         @index.search_each(q, :limit => :all) do |docid, score|
-          break if query[:limit] && messages.size >= query[:limit]
-          if @index[docid][:label].split(/\s+/).include?("killed") && query[:skip_killed]
+          break if opts[:limit] && messages.size >= opts[:limit]
+          if @index[docid][:label].split(/\s+/).include?("killed") && opts[:skip_killed]
             killed = true
             break
           end
-- 
1.6.0.4




* [sup-talk] [PATCH 10/18] index: make wrap_subj methods private
  2009-06-20 20:50                 ` [sup-talk] [PATCH 09/18] index: revert overeager opts->query rename in each_message_in_thread_for Rich Lane
@ 2009-06-20 20:50                   ` Rich Lane
  2009-06-20 20:50                     ` [sup-talk] [PATCH 11/18] index: move Ferret-specific code to ferret_index.rb Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup/index.rb |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index 080a4ec..5ddd6ee 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -382,9 +382,6 @@ EOS
     end
   end
 
-  def wrap_subj subj; "__START_SUBJECT__ #{subj} __END_SUBJECT__"; end
-  def unwrap_subj subj; subj =~ /__START_SUBJECT__ (.*?) __END_SUBJECT__/ && $1; end
-
   def delete id; @index_mutex.synchronize { @index.delete id } end
 
   def load_contacts emails, h={}
@@ -572,6 +569,9 @@ private
     q.add_query Ferret::Search::TermQuery.new("source_id", query[:source_id]), :must if query[:source_id]
     q
   end
+
+  def wrap_subj subj; "__START_SUBJECT__ #{subj} __END_SUBJECT__"; end
+  def unwrap_subj subj; subj =~ /__START_SUBJECT__ (.*?) __END_SUBJECT__/ && $1; end
 end
 
 end
-- 
1.6.0.4




* [sup-talk] [PATCH 11/18] index: move Ferret-specific code to ferret_index.rb
  2009-06-20 20:50                   ` [sup-talk] [PATCH 10/18] index: make wrap_subj methods private Rich Lane
@ 2009-06-20 20:50                     ` Rich Lane
  2009-06-20 20:50                       ` [sup-talk] [PATCH 12/18] remove last external uses of ferret docid Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup/ferret_index.rb |  463 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/sup/index.rb        |  453 +++++-----------------------------------------
 2 files changed, 509 insertions(+), 407 deletions(-)
 create mode 100644 lib/sup/ferret_index.rb

diff --git a/lib/sup/ferret_index.rb b/lib/sup/ferret_index.rb
new file mode 100644
index 0000000..53c19e0
--- /dev/null
+++ b/lib/sup/ferret_index.rb
@@ -0,0 +1,463 @@
+require 'ferret'
+
+module Redwood
+
+class FerretIndex < BaseIndex
+
+  def initialize dir=BASE_DIR
+    super
+
+    @index_mutex = Monitor.new
+    wsa = Ferret::Analysis::WhiteSpaceAnalyzer.new false
+    sa = Ferret::Analysis::StandardAnalyzer.new [], true
+    @analyzer = Ferret::Analysis::PerFieldAnalyzer.new wsa
+    @analyzer[:body] = sa
+    @analyzer[:subject] = sa
+    @qparser ||= Ferret::QueryParser.new :default_field => :body, :analyzer => @analyzer, :or_default => false
+  end
+
+  def load_index dir=File.join(@dir, "ferret")
+    if File.exists? dir
+      Redwood::log "loading index..."
+      @index_mutex.synchronize do
+        @index = Ferret::Index::Index.new(:path => dir, :analyzer => @analyzer, :id_field => 'message_id')
+        Redwood::log "loaded index of #{@index.size} messages"
+      end
+    else
+      Redwood::log "creating index..."
+      @index_mutex.synchronize do
+        field_infos = Ferret::Index::FieldInfos.new :store => :yes
+        field_infos.add_field :message_id, :index => :untokenized
+        field_infos.add_field :source_id
+        field_infos.add_field :source_info
+        field_infos.add_field :date, :index => :untokenized
+        field_infos.add_field :body
+        field_infos.add_field :label
+        field_infos.add_field :attachments
+        field_infos.add_field :subject
+        field_infos.add_field :from
+        field_infos.add_field :to
+        field_infos.add_field :refs
+        field_infos.add_field :snippet, :index => :no, :term_vector => :no
+        field_infos.create_index dir
+        @index = Ferret::Index::Index.new(:path => dir, :analyzer => @analyzer, :id_field => 'message_id')
+      end
+    end
+  end
+
+  def sync_message m, opts={}
+    entry = @index[m.id]
+
+    raise "no source info for message #{m.id}" unless m.source && m.source_info
+
+    source_id = if m.source.is_a? Integer
+      m.source
+    else
+      m.source.id or raise "unregistered source #{m.source} (id #{m.source.id.inspect})"
+    end
+
+    snippet = if m.snippet_contains_encrypted_content? && $config[:discard_snippets_from_encrypted_messages]
+      ""
+    else
+      m.snippet
+    end
+
+    ## write the new document to the index. if the entry already exists in the
+    ## index, reuse it (which avoids having to reload the entry from the source,
+    ## which can be quite expensive for e.g. large threads of IMAP actions.)
+    ##
+    ## exception: if the index entry belongs to an earlier version of the
+    ## message, use everything from the new message instead, but union the
+    ## flags. this allows messages sent to mailing lists to have their header
+    ## updated and to have flags set properly.
+    ##
+    ## minor hack: messages in sources with lower ids have priority over
+    ## messages in sources with higher ids. so messages in the inbox will
+    ## override everyone, and messages in the sent box will be overridden
+    ## by everyone else.
+    ##
+    ## written in this manner to support previous versions of the index which
+    ## did not keep around the entry body. upgrading is thus seamless.
+    entry ||= {}
+    labels = m.labels.uniq # override because this is the new state, unless...
+
+    ## if we are a later version of a message, ignore what's in the index,
+    ## but merge in the labels.
+    if entry[:source_id] && entry[:source_info] && entry[:label] &&
+      ((entry[:source_id].to_i > source_id) || (entry[:source_info].to_i < m.source_info))
+      labels = (entry[:label].symbolistize + m.labels).uniq
+      #Redwood::log "found updated version of message #{m.id}: #{m.subj}"
+      #Redwood::log "previous version was at #{entry[:source_id].inspect}:#{entry[:source_info].inspect}, this version at #{source_id.inspect}:#{m.source_info.inspect}"
+      #Redwood::log "merged labels are #{labels.inspect} (index #{entry[:label].inspect}, message #{m.labels.inspect})"
+      entry = {}
+    end
+
+    ## if force_overwite is true, ignore what's in the index. this is used
+    ## primarily by sup-sync to force index updates.
+    entry = {} if opts[:force_overwrite]
+
+    d = {
+      :message_id => m.id,
+      :source_id => source_id,
+      :source_info => m.source_info,
+      :date => (entry[:date] || m.date.to_indexable_s),
+      :body => (entry[:body] || m.indexable_content),
+      :snippet => snippet, # always override
+      :label => labels.uniq.join(" "),
+      :attachments => (entry[:attachments] || m.attachments.uniq.join(" ")),
+
+      ## always override :from and :to.
+      ## older versions of Sup would often store the wrong thing in the index
+      ## (because they were canonicalizing email addresses, resulting in the
+      ## wrong name associated with each.) the correct address is read from
+      ## the original header when these messages are opened in thread-view-mode,
+      ## so this allows people to forcibly update the address in the index by
+      ## marking those threads for saving.
+      :from => (m.from ? m.from.indexable_content : ""),
+      :to => (m.to + m.cc + m.bcc).map { |x| x.indexable_content }.join(" "),
+
+      :subject => (entry[:subject] || wrap_subj(Message.normalize_subj(m.subj))),
+      :refs => (entry[:refs] || (m.refs + m.replytos).uniq.join(" ")),
+    }
+
+    @index_mutex.synchronize do
+      @index.delete m.id
+      @index.add_document d
+    end
+  end
+
+  def save_index fn=File.join(@dir, "ferret")
+    # don't have to do anything, apparently
+  end
+
+  def contains_id? id
+    @index_mutex.synchronize { @index.search(Ferret::Search::TermQuery.new(:message_id, id)).total_hits > 0 }
+  end
+
+  def size
+    @index_mutex.synchronize { @index.size }
+  end
+
+  EACH_BY_DATE_NUM = 100
+  def each_id_by_date query={}
+    return if empty? # otherwise ferret barfs ###TODO: remove this once my ferret patch is accepted
+    ferret_query = build_ferret_query query
+    offset = 0
+    while true
+      limit = (query[:limit])? [EACH_BY_DATE_NUM, query[:limit] - offset].min : EACH_BY_DATE_NUM
+      results = @index_mutex.synchronize { @index.search ferret_query, :sort => "date DESC", :limit => limit, :offset => offset }
+      Redwood::log "got #{results.total_hits} results for query (offset #{offset}) #{ferret_query.inspect}"
+      results.hits.each do |hit|
+        yield @index_mutex.synchronize { @index[hit.doc][:message_id] }, lambda { build_message hit.doc }
+      end
+      break if query[:limit] and offset >= query[:limit] - limit
+      break if offset >= results.total_hits - limit
+      offset += limit
+    end
+  end
+
+  def num_results_for query={}
+    return 0 if empty? # otherwise ferret barfs ###TODO: remove this once my ferret patch is accepted
+    ferret_query = build_ferret_query query
+    @index_mutex.synchronize { @index.search(ferret_query, :limit => 1).total_hits }
+  end
+
+  SAME_SUBJECT_DATE_LIMIT = 7
+  MAX_CLAUSES = 1000
+  def each_message_in_thread_for m, opts={}
+    #Redwood::log "Building thread for #{m.id}: #{m.subj}"
+    messages = {}
+    searched = {}
+    num_queries = 0
+
+    pending = [m.id]
+    if $config[:thread_by_subject] # do subject queries
+      date_min = m.date - (SAME_SUBJECT_DATE_LIMIT * 12 * 3600)
+      date_max = m.date + (SAME_SUBJECT_DATE_LIMIT * 12 * 3600)
+
+      q = Ferret::Search::BooleanQuery.new true
+      sq = Ferret::Search::PhraseQuery.new(:subject)
+      wrap_subj(Message.normalize_subj(m.subj)).split.each do |t|
+        sq.add_term t
+      end
+      q.add_query sq, :must
+      q.add_query Ferret::Search::RangeQuery.new(:date, :>= => date_min.to_indexable_s, :<= => date_max.to_indexable_s), :must
+
+      q = build_ferret_query :qobj => q
+
+      p1 = @index_mutex.synchronize { @index.search(q).hits.map { |hit| @index[hit.doc][:message_id] } }
+      Redwood::log "found #{p1.size} results for subject query #{q}"
+
+      p2 = @index_mutex.synchronize { @index.search(q.to_s, :limit => :all).hits.map { |hit| @index[hit.doc][:message_id] } }
+      Redwood::log "found #{p2.size} results in string form"
+
+      pending = (pending + p1 + p2).uniq
+    end
+
+    until pending.empty? || (opts[:limit] && messages.size >= opts[:limit])
+      q = Ferret::Search::BooleanQuery.new true
+      # this disappeared in newer ferrets... wtf.
+      # q.max_clause_count = 2048
+
+      lim = [MAX_CLAUSES / 2, pending.length].min
+      pending[0 ... lim].each do |id|
+        searched[id] = true
+        q.add_query Ferret::Search::TermQuery.new(:message_id, id), :should
+        q.add_query Ferret::Search::TermQuery.new(:refs, id), :should
+      end
+      pending = pending[lim .. -1]
+
+      q = build_ferret_query :qobj => q
+
+      num_queries += 1
+      killed = false
+      @index_mutex.synchronize do
+        @index.search_each(q, :limit => :all) do |docid, score|
+          break if opts[:limit] && messages.size >= opts[:limit]
+          if @index[docid][:label].split(/\s+/).include?("killed") && opts[:skip_killed]
+            killed = true
+            break
+          end
+          mid = @index[docid][:message_id]
+          unless messages.member?(mid)
+            #Redwood::log "got #{mid} as a child of #{id}"
+            messages[mid] ||= lambda { build_message docid }
+            refs = @index[docid][:refs].split
+            pending += refs.select { |id| !searched[id] }
+          end
+        end
+      end
+    end
+
+    if killed
+      Redwood::log "thread for #{m.id} is killed, ignoring"
+      false
+    else
+      Redwood::log "ran #{num_queries} queries to build thread of #{messages.size} messages for #{m.id}: #{m.subj}" if num_queries > 0
+      messages.each { |mid, builder| yield mid, builder }
+      true
+    end
+  end
+
+  ## builds a message object from a ferret result
+  def build_message docid
+    @index_mutex.synchronize do
+      doc = @index[docid] or return
+
+      source = SourceManager[doc[:source_id].to_i]
+      raise "invalid source #{doc[:source_id]}" unless source
+
+      #puts "building message #{doc[:message_id]} (#{source}##{doc[:source_info]})"
+
+      fake_header = {
+        "date" => Time.at(doc[:date].to_i),
+        "subject" => unwrap_subj(doc[:subject]),
+        "from" => doc[:from],
+        "to" => doc[:to].split.join(", "), # reformat
+        "message-id" => doc[:message_id],
+        "references" => doc[:refs].split.map { |x| "<#{x}>" }.join(" "),
+      }
+
+      m = Message.new :source => source, :source_info => doc[:source_info].to_i,
+                  :labels => doc[:label].symbolistize,
+                  :snippet => doc[:snippet]
+      m.parse_header fake_header
+      m
+    end
+  end
+
+  def delete id
+    @index_mutex.synchronize { @index.delete id }
+  end
+
+  def load_contacts emails, h={}
+    q = Ferret::Search::BooleanQuery.new true
+    emails.each do |e|
+      qq = Ferret::Search::BooleanQuery.new true
+      qq.add_query Ferret::Search::TermQuery.new(:from, e), :should
+      qq.add_query Ferret::Search::TermQuery.new(:to, e), :should
+      q.add_query qq
+    end
+    q.add_query Ferret::Search::TermQuery.new(:label, "spam"), :must_not
+    
+    Redwood::log "contact search: #{q}"
+    contacts = {}
+    num = h[:num] || 20
+    @index_mutex.synchronize do
+      @index.search_each q, :sort => "date DESC", :limit => :all do |docid, score|
+        break if contacts.size >= num
+        #Redwood::log "got message #{docid} to: #{@index[docid][:to].inspect} and from: #{@index[docid][:from].inspect}"
+        f = @index[docid][:from]
+        t = @index[docid][:to]
+
+        if AccountManager.is_account_email? f
+          t.split(" ").each { |e| contacts[Person.from_address(e)] = true }
+        else
+          contacts[Person.from_address(f)] = true
+        end
+      end
+    end
+
+    contacts.keys.compact
+  end
+
+  def each_docid query={}
+    ferret_query = build_ferret_query query
+    results = @index_mutex.synchronize { @index.search ferret_query, :limit => (query[:limit] || :all) }
+    results.hits.map { |hit| yield hit.doc }
+  end
+    
+  def each_message query={}
+    each_docid query do |docid|
+      yield build_message(docid)
+    end
+  end
+
+  def optimize
+    @index_mutex.synchronize { @index.optimize }
+  end
+
+  def source_for_id id
+    entry = @index[id]
+    return unless entry
+    entry[:source_id].to_i
+  end
+
+  class ParseError < StandardError; end
+
+  ## parse a query string from the user. returns a query object
+  ## that can be passed to any index method with a 'query' 
+  ## argument, as well as build_ferret_query.
+  ##
+  ## raises a ParseError if something went wrong.
+  def parse_query s
+    query = {}
+
+    subs = s.gsub(/\b(to|from):(\S+)\b/) do
+      field, name = $1, $2
+      if(p = ContactManager.contact_for(name))
+        [field, p.email]
+      elsif name == "me"
+        [field, "(" + AccountManager.user_emails.join("||") + ")"]
+      else
+        [field, name]
+      end.join(":")
+    end
+
+    ## if we see a label:deleted or a label:spam term anywhere in the query
+    ## string, we set the extra load_spam or load_deleted options to true.
+    ## bizarre? well, because the query allows arbitrary parenthesized boolean
+    ## expressions, without fully parsing the query, we can't tell whether
+    ## the user is explicitly directing us to search spam messages or not.
+    ## e.g. if the string is -(-(-(-(-label:spam)))), does the user want to
+    ## search spam messages or not?
+    ##
+    ## so, we rely on the fact that turning these extra options ON turns OFF
+    ## the adding of "-label:deleted" or "-label:spam" terms at the very
+    ## final stage of query processing. if the user wants to search spam
+    ## messages, not adding that is the right thing; if he doesn't want to
+    ## search spam messages, then not adding it won't have any effect.
+    query[:load_spam] = true if subs =~ /\blabel:spam\b/
+    query[:load_deleted] = true if subs =~ /\blabel:deleted\b/
+
+    ## gmail style "is" operator
+    subs = subs.gsub(/\b(is|has):(\S+)\b/) do
+      field, label = $1, $2
+      case label
+      when "read"
+        "-label:unread"
+      when "spam"
+        query[:load_spam] = true
+        "label:spam"
+      when "deleted"
+        query[:load_deleted] = true
+        "label:deleted"
+      else
+        "label:#{$2}"
+      end
+    end
+
+    ## gmail style attachments "filename" and "filetype" searches
+    subs = subs.gsub(/\b(filename|filetype):(\((.+?)\)\B|(\S+)\b)/) do
+      field, name = $1, ($3 || $4)
+      case field
+      when "filename"
+        Redwood::log "filename - translated #{field}:#{name} to attachments:(#{name.downcase})"
+        "attachments:(#{name.downcase})"
+      when "filetype"
+        Redwood::log "filetype - translated #{field}:#{name} to attachments:(*.#{name.downcase})"
+        "attachments:(*.#{name.downcase})"
+      end
+    end
+
+    if $have_chronic
+      subs = subs.gsub(/\b(before|on|in|during|after):(\((.+?)\)\B|(\S+)\b)/) do
+        field, datestr = $1, ($3 || $4)
+        realdate = Chronic.parse datestr, :guess => false, :context => :past
+        if realdate
+          case field
+          when "after"
+            Redwood::log "chronic: translated #{field}:#{datestr} to #{realdate.end}"
+            "date:(>= #{sprintf "%012d", realdate.end.to_i})"
+          when "before"
+            Redwood::log "chronic: translated #{field}:#{datestr} to #{realdate.begin}"
+            "date:(<= #{sprintf "%012d", realdate.begin.to_i})"
+          else
+            Redwood::log "chronic: translated #{field}:#{datestr} to #{realdate}"
+            "date:(<= #{sprintf "%012d", realdate.end.to_i}) date:(>= #{sprintf "%012d", realdate.begin.to_i})"
+          end
+        else
+          raise ParseError, "can't understand date #{datestr.inspect}"
+        end
+      end
+    end
+
+    ## limit:42 restrict the search to 42 results
+    subs = subs.gsub(/\blimit:(\S+)\b/) do
+      lim = $1
+      if lim =~ /^\d+$/
+        query[:limit] = lim.to_i
+        ''
+      else
+        raise ParseError, "non-numeric limit #{lim.inspect}"
+      end
+    end
+    
+    begin
+      query[:qobj] = @qparser.parse(subs)
+      query[:text] = s
+      query
+    rescue Ferret::QueryParser::QueryParseException => e
+      raise ParseError, e.message
+    end
+  end
+
+private
+
+  def build_ferret_query query
+    q = Ferret::Search::BooleanQuery.new
+    q.add_query query[:qobj], :must if query[:qobj]
+    labels = ([query[:label]] + (query[:labels] || [])).compact
+    labels.each { |t| q.add_query Ferret::Search::TermQuery.new("label", t.to_s), :must }
+    if query[:participants]
+      q2 = Ferret::Search::BooleanQuery.new
+      query[:participants].each do |p|
+        q2.add_query Ferret::Search::TermQuery.new("from", p.email), :should
+        q2.add_query Ferret::Search::TermQuery.new("to", p.email), :should
+      end
+      q.add_query q2, :must
+    end
+        
+    q.add_query Ferret::Search::TermQuery.new("label", "spam"), :must_not unless query[:load_spam] || labels.include?(:spam)
+    q.add_query Ferret::Search::TermQuery.new("label", "deleted"), :must_not unless query[:load_deleted] || labels.include?(:deleted)
+    q.add_query Ferret::Search::TermQuery.new("label", "killed"), :must_not if query[:skip_killed]
+
+    q.add_query Ferret::Search::TermQuery.new("source_id", query[:source_id]), :must if query[:source_id]
+    q
+  end
+
+  def wrap_subj subj; "__START_SUBJECT__ #{subj} __END_SUBJECT__"; end
+  def unwrap_subj subj; subj =~ /__START_SUBJECT__ (.*?) __END_SUBJECT__/ && $1; end
+end
+
+end
diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index 5ddd6ee..be0e870 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -1,7 +1,6 @@
-## the index structure for redwood. interacts with ferret.
+## Index interface, subclassed by Ferret indexer.
 
 require 'fileutils'
-require 'ferret'
 
 begin
   require 'chronic'
@@ -13,7 +12,7 @@ end
 
 module Redwood
 
-class Index
+class BaseIndex
   class LockError < StandardError
     def initialize h
       @h = h
@@ -25,17 +24,8 @@ class Index
   include Singleton
 
   def initialize dir=BASE_DIR
-    @index_mutex = Monitor.new
     @dir = dir
-
-    wsa = Ferret::Analysis::WhiteSpaceAnalyzer.new false
-    sa = Ferret::Analysis::StandardAnalyzer.new [], true
-    @analyzer = Ferret::Analysis::PerFieldAnalyzer.new wsa
-    @analyzer[:body] = sa
-    @analyzer[:subject] = sa
-    @qparser ||= Ferret::QueryParser.new :default_field => :body, :analyzer => @analyzer, :or_default => false
     @lock = Lockfile.new lockfile, :retries => 0, :max_age => nil
-
     self.class.i_am_the_instance self
   end
 
@@ -119,155 +109,44 @@ EOS
     save_index
   end
 
-  def load_index dir=File.join(@dir, "ferret")
-    if File.exists? dir
-      Redwood::log "loading index..."
-      @index_mutex.synchronize do
-        @index = Ferret::Index::Index.new(:path => dir, :analyzer => @analyzer, :id_field => 'message_id')
-        Redwood::log "loaded index of #{@index.size} messages"
-      end
-    else
-      Redwood::log "creating index..."
-      @index_mutex.synchronize do
-        field_infos = Ferret::Index::FieldInfos.new :store => :yes
-        field_infos.add_field :message_id, :index => :untokenized
-        field_infos.add_field :source_id
-        field_infos.add_field :source_info
-        field_infos.add_field :date, :index => :untokenized
-        field_infos.add_field :body
-        field_infos.add_field :label
-        field_infos.add_field :attachments
-        field_infos.add_field :subject
-        field_infos.add_field :from
-        field_infos.add_field :to
-        field_infos.add_field :refs
-        field_infos.add_field :snippet, :index => :no, :term_vector => :no
-        field_infos.create_index dir
-        @index = Ferret::Index::Index.new(:path => dir, :analyzer => @analyzer, :id_field => 'message_id')
-      end
-    end
+  def load_index
+    unimplemented
   end
 
   ## Syncs the message to the index, replacing any previous version.  adding
   ## either way. Index state will be determined by the message's #labels
   ## accessor.
   def sync_message m, opts={}
-    entry = @index[m.id]
-
-    raise "no source info for message #{m.id}" unless m.source && m.source_info
-
-    source_id = if m.source.is_a? Integer
-      m.source
-    else
-      m.source.id or raise "unregistered source #{m.source} (id #{m.source.id.inspect})"
-    end
-
-    snippet = if m.snippet_contains_encrypted_content? && $config[:discard_snippets_from_encrypted_messages]
-      ""
-    else
-      m.snippet
-    end
-
-    ## write the new document to the index. if the entry already exists in the
-    ## index, reuse it (which avoids having to reload the entry from the source,
-    ## which can be quite expensive for e.g. large threads of IMAP actions.)
-    ##
-    ## exception: if the index entry belongs to an earlier version of the
-    ## message, use everything from the new message instead, but union the
-    ## flags. this allows messages sent to mailing lists to have their header
-    ## updated and to have flags set properly.
-    ##
-    ## minor hack: messages in sources with lower ids have priority over
-    ## messages in sources with higher ids. so messages in the inbox will
-    ## override everyone, and messages in the sent box will be overridden
-    ## by everyone else.
-    ##
-    ## written in this manner to support previous versions of the index which
-    ## did not keep around the entry body. upgrading is thus seamless.
-    entry ||= {}
-    labels = m.labels.uniq # override because this is the new state, unless...
-
-    ## if we are a later version of a message, ignore what's in the index,
-    ## but merge in the labels.
-    if entry[:source_id] && entry[:source_info] && entry[:label] &&
-      ((entry[:source_id].to_i > source_id) || (entry[:source_info].to_i < m.source_info))
-      labels = (entry[:label].symbolistize + m.labels).uniq
-      #Redwood::log "found updated version of message #{m.id}: #{m.subj}"
-      #Redwood::log "previous version was at #{entry[:source_id].inspect}:#{entry[:source_info].inspect}, this version at #{source_id.inspect}:#{m.source_info.inspect}"
-      #Redwood::log "merged labels are #{labels.inspect} (index #{entry[:label].inspect}, message #{m.labels.inspect})"
-      entry = {}
-    end
-
-    ## if force_overwite is true, ignore what's in the index. this is used
-    ## primarily by sup-sync to force index updates.
-    entry = {} if opts[:force_overwrite]
-
-    d = {
-      :message_id => m.id,
-      :source_id => source_id,
-      :source_info => m.source_info,
-      :date => (entry[:date] || m.date.to_indexable_s),
-      :body => (entry[:body] || m.indexable_content),
-      :snippet => snippet, # always override
-      :label => labels.uniq.join(" "),
-      :attachments => (entry[:attachments] || m.attachments.uniq.join(" ")),
-
-      ## always override :from and :to.
-      ## older versions of Sup would often store the wrong thing in the index
-      ## (because they were canonicalizing email addresses, resulting in the
-      ## wrong name associated with each.) the correct address is read from
-      ## the original header when these messages are opened in thread-view-mode,
-      ## so this allows people to forcibly update the address in the index by
-      ## marking those threads for saving.
-      :from => (m.from ? m.from.indexable_content : ""),
-      :to => (m.to + m.cc + m.bcc).map { |x| x.indexable_content }.join(" "),
-
-      :subject => (entry[:subject] || wrap_subj(Message.normalize_subj(m.subj))),
-      :refs => (entry[:refs] || (m.refs + m.replytos).uniq.join(" ")),
-    }
-
-    @index_mutex.synchronize do
-      @index.delete m.id
-      @index.add_document d
-    end
+    unimplemented
   end
 
-  def save_index fn=File.join(@dir, "ferret")
-    # don't have to do anything, apparently
+  def save_index fn
+    unimplemented
   end
 
   def contains_id? id
-    @index_mutex.synchronize { @index.search(Ferret::Search::TermQuery.new(:message_id, id)).total_hits > 0 }
+    unimplemented
   end
+
   def contains? m; contains_id? m.id end
-  def size; @index_mutex.synchronize { @index.size } end
+
+  def size
+    unimplemented
+  end
+
   def empty?; size == 0 end
 
-  ## you should probably not call this on a block that doesn't break
+  ## Yields a message-id and message-building lambda for each
+  ## message that matches the given query, in descending date order.
+  ## You should probably not call this on a block that doesn't break
   ## rather quickly because the results can be very large.
-  EACH_BY_DATE_NUM = 100
   def each_id_by_date query={}
-    return if empty? # otherwise ferret barfs ###TODO: remove this once my ferret patch is accepted
-    ferret_query = build_ferret_query query
-    offset = 0
-    while true
-      limit = (query[:limit])? [EACH_BY_DATE_NUM, query[:limit] - offset].min : EACH_BY_DATE_NUM
-      results = @index_mutex.synchronize { @index.search ferret_query, :sort => "date DESC", :limit => limit, :offset => offset }
-      Redwood::log "got #{results.total_hits} results for query (offset #{offset}) #{ferret_query.inspect}"
-      results.hits.each do |hit|
-        yield @index_mutex.synchronize { @index[hit.doc][:message_id] }, lambda { build_message hit.doc }
-      end
-      break if query[:limit] and offset >= query[:limit] - limit
-      break if offset >= results.total_hits - limit
-      offset += limit
-    end
+    unimplemented
   end
 
+  ## Return the number of matches for query in the index
   def num_results_for query={}
-    return 0 if empty? # otherwise ferret barfs ###TODO: remove this once my ferret patch is accepted
-
-    ferret_query = build_ferret_query query
-    @index_mutex.synchronize { @index.search(ferret_query, :limit => 1).total_hits }
+    unimplemented
   end
 
   ## yield all messages in the thread containing 'm' by repeatedly
@@ -278,300 +157,60 @@ EOS
   ## only two options, :limit and :skip_killed. if :skip_killed is
   ## true, stops loading any thread if a message with a :killed flag
   ## is found.
-  SAME_SUBJECT_DATE_LIMIT = 7
-  MAX_CLAUSES = 1000
   def each_message_in_thread_for m, opts={}
-    #Redwood::log "Building thread for #{m.id}: #{m.subj}"
-    messages = {}
-    searched = {}
-    num_queries = 0
-
-    pending = [m.id]
-    if $config[:thread_by_subject] # do subject queries
-      date_min = m.date - (SAME_SUBJECT_DATE_LIMIT * 12 * 3600)
-      date_max = m.date + (SAME_SUBJECT_DATE_LIMIT * 12 * 3600)
-
-      q = Ferret::Search::BooleanQuery.new true
-      sq = Ferret::Search::PhraseQuery.new(:subject)
-      wrap_subj(Message.normalize_subj(m.subj)).split.each do |t|
-        sq.add_term t
-      end
-      q.add_query sq, :must
-      q.add_query Ferret::Search::RangeQuery.new(:date, :>= => date_min.to_indexable_s, :<= => date_max.to_indexable_s), :must
-
-      q = build_ferret_query :qobj => q
-
-      p1 = @index_mutex.synchronize { @index.search(q).hits.map { |hit| @index[hit.doc][:message_id] } }
-      Redwood::log "found #{p1.size} results for subject query #{q}"
-
-      p2 = @index_mutex.synchronize { @index.search(q.to_s, :limit => :all).hits.map { |hit| @index[hit.doc][:message_id] } }
-      Redwood::log "found #{p2.size} results in string form"
-
-      pending = (pending + p1 + p2).uniq
-    end
-
-    until pending.empty? || (opts[:limit] && messages.size >= opts[:limit])
-      q = Ferret::Search::BooleanQuery.new true
-      # this disappeared in newer ferrets... wtf.
-      # q.max_clause_count = 2048
-
-      lim = [MAX_CLAUSES / 2, pending.length].min
-      pending[0 ... lim].each do |id|
-        searched[id] = true
-        q.add_query Ferret::Search::TermQuery.new(:message_id, id), :should
-        q.add_query Ferret::Search::TermQuery.new(:refs, id), :should
-      end
-      pending = pending[lim .. -1]
-
-      q = build_ferret_query :qobj => q
-
-      num_queries += 1
-      killed = false
-      @index_mutex.synchronize do
-        @index.search_each(q, :limit => :all) do |docid, score|
-          break if opts[:limit] && messages.size >= opts[:limit]
-          if @index[docid][:label].split(/\s+/).include?("killed") && opts[:skip_killed]
-            killed = true
-            break
-          end
-          mid = @index[docid][:message_id]
-          unless messages.member?(mid)
-            #Redwood::log "got #{mid} as a child of #{id}"
-            messages[mid] ||= lambda { build_message docid }
-            refs = @index[docid][:refs].split
-            pending += refs.select { |id| !searched[id] }
-          end
-        end
-      end
-    end
-
-    if killed
-      Redwood::log "thread for #{m.id} is killed, ignoring"
-      false
-    else
-      Redwood::log "ran #{num_queries} queries to build thread of #{messages.size} messages for #{m.id}: #{m.subj}" if num_queries > 0
-      messages.each { |mid, builder| yield mid, builder }
-      true
-    end
+    unimplemented
   end
 
-  ## builds a message object from a ferret result
-  def build_message docid
-    @index_mutex.synchronize do
-      doc = @index[docid] or return
-
-      source = SourceManager[doc[:source_id].to_i]
-      raise "invalid source #{doc[:source_id]}" unless source
-
-      #puts "building message #{doc[:message_id]} (#{source}##{doc[:source_info]})"
-
-      fake_header = {
-        "date" => Time.at(doc[:date].to_i),
-        "subject" => unwrap_subj(doc[:subject]),
-        "from" => doc[:from],
-        "to" => doc[:to].split.join(", "), # reformat
-        "message-id" => doc[:message_id],
-        "references" => doc[:refs].split.map { |x| "<#{x}>" }.join(" "),
-      }
-
-      m = Message.new :source => source, :source_info => doc[:source_info].to_i,
-                  :labels => doc[:label].symbolistize,
-                  :snippet => doc[:snippet]
-      m.parse_header fake_header
-      m
-    end
+  ## Load message with the given message-id from the index
+  def build_message id
+    unimplemented
   end
 
-  def delete id; @index_mutex.synchronize { @index.delete id } end
-
-  def load_contacts emails, h={}
-    q = Ferret::Search::BooleanQuery.new true
-    emails.each do |e|
-      qq = Ferret::Search::BooleanQuery.new true
-      qq.add_query Ferret::Search::TermQuery.new(:from, e), :should
-      qq.add_query Ferret::Search::TermQuery.new(:to, e), :should
-      q.add_query qq
-    end
-    q.add_query Ferret::Search::TermQuery.new(:label, "spam"), :must_not
-    
-    Redwood::log "contact search: #{q}"
-    contacts = {}
-    num = h[:num] || 20
-    @index_mutex.synchronize do
-      @index.search_each q, :sort => "date DESC", :limit => :all do |docid, score|
-        break if contacts.size >= num
-        #Redwood::log "got message #{docid} to: #{@index[docid][:to].inspect} and from: #{@index[docid][:from].inspect}"
-        f = @index[docid][:from]
-        t = @index[docid][:to]
-
-        if AccountManager.is_account_email? f
-          t.split(" ").each { |e| contacts[Person.from_address(e)] = true }
-        else
-          contacts[Person.from_address(f)] = true
-        end
-      end
-    end
+  ## Delete message with the given message-id from the index
+  def delete id
+    unimplemented
+  end
 
-    contacts.keys.compact
+  ## Given an array of email addresses, return an array of Person objects that
+  ## have sent mail to or received mail from any of the given addresses.
+  def load_contacts email_addresses, h={}
+    unimplemented
   end
 
+  ## Yield each docid matching query
   def each_docid query={}
-    ferret_query = build_ferret_query query
-    results = @index_mutex.synchronize { @index.search ferret_query, :limit => (query[:limit] || :all) }
-    results.hits.map { |hit| yield hit.doc }
+    unimplemented
   end
     
+  ## Yield each messages matching query
   def each_message query={}
-    each_docid query do |docid|
-      yield build_message(docid)
-    end
+    unimplemented
   end
 
+  ## Implementation-specific optimization step
   def optimize
-    @index_mutex.synchronize { @index.optimize }
+    unimplemented
   end
 
+  ## Return the id of the source that the message with the given message-id
+  ## was synced from
   def source_for_id id
-    entry = @index[id]
-    return unless entry
-    entry[:source_id].to_i
+    unimplemented
   end
 
   class ParseError < StandardError; end
 
   ## parse a query string from the user. returns a query object
   ## that can be passed to any index method with a 'query' 
-  ## argument, as well as build_ferret_query.
+  ## argument.
   ##
   ## raises a ParseError if something went wrong.
   def parse_query s
-    query = {}
-
-    subs = s.gsub(/\b(to|from):(\S+)\b/) do
-      field, name = $1, $2
-      if(p = ContactManager.contact_for(name))
-        [field, p.email]
-      elsif name == "me"
-        [field, "(" + AccountManager.user_emails.join("||") + ")"]
-      else
-        [field, name]
-      end.join(":")
-    end
-
-    ## if we see a label:deleted or a label:spam term anywhere in the query
-    ## string, we set the extra load_spam or load_deleted options to true.
-    ## bizarre? well, because the query allows arbitrary parenthesized boolean
-    ## expressions, without fully parsing the query, we can't tell whether
-    ## the user is explicitly directing us to search spam messages or not.
-    ## e.g. if the string is -(-(-(-(-label:spam)))), does the user want to
-    ## search spam messages or not?
-    ##
-    ## so, we rely on the fact that turning these extra options ON turns OFF
-    ## the adding of "-label:deleted" or "-label:spam" terms at the very
-    ## final stage of query processing. if the user wants to search spam
-    ## messages, not adding that is the right thing; if he doesn't want to
-    ## search spam messages, then not adding it won't have any effect.
-    query[:load_spam] = true if subs =~ /\blabel:spam\b/
-    query[:load_deleted] = true if subs =~ /\blabel:deleted\b/
-
-    ## gmail style "is" operator
-    subs = subs.gsub(/\b(is|has):(\S+)\b/) do
-      field, label = $1, $2
-      case label
-      when "read"
-        "-label:unread"
-      when "spam"
-        query[:load_spam] = true
-        "label:spam"
-      when "deleted"
-        query[:load_deleted] = true
-        "label:deleted"
-      else
-        "label:#{$2}"
-      end
-    end
-
-    ## gmail style attachments "filename" and "filetype" searches
-    subs = subs.gsub(/\b(filename|filetype):(\((.+?)\)\B|(\S+)\b)/) do
-      field, name = $1, ($3 || $4)
-      case field
-      when "filename"
-        Redwood::log "filename - translated #{field}:#{name} to attachments:(#{name.downcase})"
-        "attachments:(#{name.downcase})"
-      when "filetype"
-        Redwood::log "filetype - translated #{field}:#{name} to attachments:(*.#{name.downcase})"
-        "attachments:(*.#{name.downcase})"
-      end
-    end
-
-    if $have_chronic
-      subs = subs.gsub(/\b(before|on|in|during|after):(\((.+?)\)\B|(\S+)\b)/) do
-        field, datestr = $1, ($3 || $4)
-        realdate = Chronic.parse datestr, :guess => false, :context => :past
-        if realdate
-          case field
-          when "after"
-            Redwood::log "chronic: translated #{field}:#{datestr} to #{realdate.end}"
-            "date:(>= #{sprintf "%012d", realdate.end.to_i})"
-          when "before"
-            Redwood::log "chronic: translated #{field}:#{datestr} to #{realdate.begin}"
-            "date:(<= #{sprintf "%012d", realdate.begin.to_i})"
-          else
-            Redwood::log "chronic: translated #{field}:#{datestr} to #{realdate}"
-            "date:(<= #{sprintf "%012d", realdate.end.to_i}) date:(>= #{sprintf "%012d", realdate.begin.to_i})"
-          end
-        else
-          raise ParseError, "can't understand date #{datestr.inspect}"
-        end
-      end
-    end
-
-    ## limit:42 restrict the search to 42 results
-    subs = subs.gsub(/\blimit:(\S+)\b/) do
-      lim = $1
-      if lim =~ /^\d+$/
-        query[:limit] = lim.to_i
-        ''
-      else
-        raise ParseError, "non-numeric limit #{lim.inspect}"
-      end
-    end
-    
-    begin
-      query[:qobj] = @qparser.parse(subs)
-      query[:text] = s
-      query
-    rescue Ferret::QueryParser::QueryParseException => e
-      raise ParseError, e.message
-    end
-  end
-
-private
-
-  def build_ferret_query query
-    q = Ferret::Search::BooleanQuery.new
-    q.add_query query[:qobj], :must if query[:qobj]
-    labels = ([query[:label]] + (query[:labels] || [])).compact
-    labels.each { |t| q.add_query Ferret::Search::TermQuery.new("label", t.to_s), :must }
-    if query[:participants]
-      q2 = Ferret::Search::BooleanQuery.new
-      query[:participants].each do |p|
-        q2.add_query Ferret::Search::TermQuery.new("from", p.email), :should
-        q2.add_query Ferret::Search::TermQuery.new("to", p.email), :should
-      end
-      q.add_query q2, :must
-    end
-        
-    q.add_query Ferret::Search::TermQuery.new("label", "spam"), :must_not unless query[:load_spam] || labels.include?(:spam)
-    q.add_query Ferret::Search::TermQuery.new("label", "deleted"), :must_not unless query[:load_deleted] || labels.include?(:deleted)
-    q.add_query Ferret::Search::TermQuery.new("label", "killed"), :must_not if query[:skip_killed]
-
-    q.add_query Ferret::Search::TermQuery.new("source_id", query[:source_id]), :must if query[:source_id]
-    q
+    unimplemented
   end
-
-  def wrap_subj subj; "__START_SUBJECT__ #{subj} __END_SUBJECT__"; end
-  def unwrap_subj subj; subj =~ /__START_SUBJECT__ (.*?) __END_SUBJECT__/ && $1; end
 end
 
 end
+
+require 'lib/sup/ferret_index'
+Redwood::Index = Redwood::FerretIndex
-- 
1.6.0.4



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 12/18] remove last external uses of ferret docid
  2009-06-20 20:50                     ` [sup-talk] [PATCH 11/18] index: move Ferret-specific code to ferret_index.rb Rich Lane
@ 2009-06-20 20:50                       ` Rich Lane
  2009-06-20 20:50                         ` [sup-talk] [PATCH 13/18] add Message.indexable_{body, chunks, subject} Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 bin/sup-sync-back       |    2 +-
 bin/sup-tweak-labels    |    6 +++---
 lib/sup/ferret_index.rb |   10 ++--------
 lib/sup/index.rb        |   12 +++++++-----
 4 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/bin/sup-sync-back b/bin/sup-sync-back
index 679e03a..8aa2039 100755
--- a/bin/sup-sync-back
+++ b/bin/sup-sync-back
@@ -17,7 +17,7 @@ def die msg
 end
 def has_any_from_source_with_label? index, source, label
   query = { :source_id => source.id, :label => label, :limit => 1 }
-  not Enumerable::Enumerator.new(index, :each_docid, query).map.empty?
+  not Enumerable::Enumerator.new(index, :each_id, query).map.empty?
 end
 
 opts = Trollop::options do
diff --git a/bin/sup-tweak-labels b/bin/sup-tweak-labels
index 95a3b03..a8115ea 100755
--- a/bin/sup-tweak-labels
+++ b/bin/sup-tweak-labels
@@ -83,14 +83,14 @@ begin
   query += ' ' + opts[:query] if opts[:query]
 
   parsed_query = index.parse_query query
-  docs = Enumerable::Enumerator.new(index, :each_docid, parsed_query).map
-  num_total = docs.size
+  ids = Enumerable::Enumerator.new(index, :each_id, parsed_query).map
+  num_total = ids.size
 
   $stderr.puts "Found #{num_total} documents across #{source_ids.length} sources. Scanning..."
 
   num_changed = num_scanned = 0
   last_info_time = start_time = Time.now
-  docs.each do |id|
+  ids.each do |id|
     num_scanned += 1
 
     m = index.build_message id
diff --git a/lib/sup/ferret_index.rb b/lib/sup/ferret_index.rb
index 53c19e0..a2c30ab 100644
--- a/lib/sup/ferret_index.rb
+++ b/lib/sup/ferret_index.rb
@@ -301,16 +301,10 @@ class FerretIndex < BaseIndex
     contacts.keys.compact
   end
 
-  def each_docid query={}
+  def each_id query={}
     ferret_query = build_ferret_query query
     results = @index_mutex.synchronize { @index.search ferret_query, :limit => (query[:limit] || :all) }
-    results.hits.map { |hit| yield hit.doc }
-  end
-    
-  def each_message query={}
-    each_docid query do |docid|
-      yield build_message(docid)
-    end
+    results.hits.map { |hit| yield @index[hit.doc][:message_id] }
   end
 
   def optimize
diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index be0e870..45382f1 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -177,14 +177,16 @@ EOS
     unimplemented
   end
 
-  ## Yield each docid matching query
-  def each_docid query={}
+  ## Yield each message-id matching query
+  def each_id query={}
     unimplemented
   end
     
-  ## Yield each messages matching query
-  def each_message query={}
-    unimplemented
+  ## Yield each message matching query
+  def each_message query={}, &b
+    each_id query do |id|
+      yield build_message(id)
+    end
   end
 
   ## Implementation-specific optimization step
-- 
1.6.0.4
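
The public interface now deals in message-ids only; docids stay private to
the Ferret implementation. A rough caller-side sketch of the new pattern
(not part of the patches; it assumes "index" is an initialized Redwood
index, and mirrors the Enumerable::Enumerator usage in the scripts above):

    query = index.parse_query "label:inbox limit:10"
    ids = Enumerable::Enumerator.new(index, :each_id, query).map
    ids.each do |id|
      m = index.build_message id    # look up by message-id, not docid
      puts "#{m.id}: #{m.subj}"
    end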



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 13/18] add Message.indexable_{body, chunks, subject}
  2009-06-20 20:50                       ` [sup-talk] [PATCH 12/18] remove last external uses of ferret docid Rich Lane
@ 2009-06-20 20:50                         ` Rich Lane
  2009-06-20 20:50                           ` [sup-talk] [PATCH 14/18] index: choose index implementation with config entry or environment variable Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup/message.rb |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/lib/sup/message.rb b/lib/sup/message.rb
index b667cb3..2999986 100644
--- a/lib/sup/message.rb
+++ b/lib/sup/message.rb
@@ -270,11 +270,23 @@ EOS
       to.map { |p| p.indexable_content },
       cc.map { |p| p.indexable_content },
       bcc.map { |p| p.indexable_content },
-      chunks.select { |c| c.is_a? Chunk::Text }.map { |c| c.lines },
-      Message.normalize_subj(subj),
+      indexable_chunks.map { |c| c.lines },
+      indexable_subject,
     ].flatten.compact.join " "
   end
 
+  def indexable_body
+    indexable_chunks.map { |c| c.lines }.flatten.compact.join " "
+  end
+
+  def indexable_chunks
+    chunks.select { |c| c.is_a? Chunk::Text }
+  end
+
+  def indexable_subject
+    Message.normalize_subj(subj)
+  end
+
   def quotable_body_lines
     chunks.find_all { |c| c.quotable? }.map { |c| c.lines }.flatten
   end
-- 
1.6.0.4



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 14/18] index: choose index implementation with config entry or environment variable
  2009-06-20 20:50                         ` [sup-talk] [PATCH 13/18] add Message.indexable_{body, chunks, subject} Rich Lane
@ 2009-06-20 20:50                           ` Rich Lane
  2009-06-20 20:50                             ` [sup-talk] [PATCH 15/18] index: add xapian implementation Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup.rb       |    2 ++
 lib/sup/index.rb |   10 ++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/sup.rb b/lib/sup.rb
index 5689c2b..54de73f 100644
--- a/lib/sup.rb
+++ b/lib/sup.rb
@@ -54,6 +54,8 @@ module Redwood
   YAML_DOMAIN = "masanjin.net"
   YAML_DATE = "2006-10-01"
 
+  DEFAULT_INDEX = 'ferret'
+
   ## record exceptions thrown in threads nicely
   @exceptions = []
   @exception_mutex = Mutex.new
diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index 45382f1..df428f7 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -212,7 +212,13 @@ EOS
   end
 end
 
+index_name = ENV['SUP_INDEX'] || $config[:index] || DEFAULT_INDEX
+begin
+  require "sup/#{index_name}_index"
+rescue LoadError
+  fail "invalid index name #{index_name.inspect}"
 end
+Index = Redwood.const_get "#{index_name.capitalize}Index"
+Redwood::log "using index #{Index.name}"
 
-require 'lib/sup/ferret_index'
-Redwood::Index = Redwood::FerretIndex
+end
-- 
1.6.0.4



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 15/18] index: add xapian implementation
  2009-06-20 20:50                           ` [sup-talk] [PATCH 14/18] index: choose index implementation with config entry or environment variable Rich Lane
@ 2009-06-20 20:50                             ` Rich Lane
  2009-06-20 20:50                               ` [sup-talk] [PATCH 16/18] fix String#ord monkeypatch Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup/poll.rb         |    2 +-
 lib/sup/xapian_index.rb |  483 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 484 insertions(+), 1 deletions(-)
 create mode 100644 lib/sup/xapian_index.rb

diff --git a/lib/sup/poll.rb b/lib/sup/poll.rb
index c83290c..8a9d218 100644
--- a/lib/sup/poll.rb
+++ b/lib/sup/poll.rb
@@ -147,7 +147,7 @@ EOS
         m_new = Message.build_from_source source, offset
         m_old = Index.build_message m_new.id
 
-        m_new.labels = default_labels + (source.archived? ? [] : [:inbox])
+        m_new.labels += default_labels + (source.archived? ? [] : [:inbox])
         m_new.labels << :sent if source.uri.eql?(SentManager.source_uri)
         m_new.labels.delete :unread if m_new.source_marked_read?
         m_new.labels.each { |l| LabelManager << l }
diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
new file mode 100644
index 0000000..7faa64d
--- /dev/null
+++ b/lib/sup/xapian_index.rb
@@ -0,0 +1,483 @@
+require 'xapian'
+require 'gdbm'
+require 'set'
+
+module Redwood
+
+# This index implementation uses Xapian for searching and GDBM for storage. It
+# tends to be slightly faster than Ferret for indexing and significantly faster
+# for searching due to precomputing thread membership.
+class XapianIndex < BaseIndex
+  STEM_LANGUAGE = "english"
+
+  def initialize dir=BASE_DIR
+    super
+
+    @index_mutex = Monitor.new
+
+    @entries = MarshalledGDBM.new File.join(dir, "entries.db")
+    @docids = MarshalledGDBM.new File.join(dir, "docids.db")
+    @thread_members = MarshalledGDBM.new File.join(dir, "thread_members.db")
+    @thread_ids = MarshalledGDBM.new File.join(dir, "thread_ids.db")
+    @assigned_docids = GDBM.new File.join(dir, "assigned_docids.db")
+
+    @xapian = Xapian::WritableDatabase.new(File.join(dir, "xapian"), Xapian::DB_CREATE_OR_OPEN)
+    @term_generator = Xapian::TermGenerator.new()
+    @term_generator.stemmer = Xapian::Stem.new(STEM_LANGUAGE)
+    @enquire = Xapian::Enquire.new @xapian
+    @enquire.weighting_scheme = Xapian::BoolWeight.new
+    @enquire.docid_order = Xapian::Enquire::ASCENDING
+  end
+
+  def load_index
+  end
+
+  def save_index
+  end
+
+  def optimize
+  end
+
+  def size
+    synchronize { @xapian.doccount }
+  end
+
+  def contains_id? id
+    synchronize { @entries.member? id }
+  end
+
+  def source_for_id id
+    synchronize { @entries[id][:source_id] }
+  end
+
+  def delete id
+    synchronize { @xapian.delete_document @docids[id] }
+  end
+
+  def build_message id
+    entry = synchronize { @entries[id] }
+    return unless entry
+
+    source = SourceManager[entry[:source_id]]
+    raise "invalid source #{entry[:source_id]}" unless source
+
+    mk_addrs = lambda { |l| l.map { |e,n| "#{n} <#{e}>" } * ', ' }
+    mk_refs = lambda { |l| l.map { |r| "<#{r}>" } * ' ' }
+    fake_header = {
+      'message-id' => entry[:message_id],
+      'date' => Time.at(entry[:date]),
+      'subject' => entry[:subject],
+      'from' => mk_addrs[[entry[:from]]],
+      'to' => mk_addrs[[entry[:to]]],
+      'cc' => mk_addrs[[entry[:cc]]],
+      'bcc' => mk_addrs[[entry[:bcc]]],
+      'reply-tos' => mk_refs[entry[:replytos]],
+      'references' => mk_refs[entry[:refs]],
+     }
+
+      m = Message.new :source => source, :source_info => entry[:source_info],
+                  :labels => entry[:labels],
+                  :snippet => entry[:snippet]
+      m.parse_header fake_header
+      m
+  end
+
+  def sync_message m, opts={}
+    entry = synchronize { @entries[m.id] }
+    snippet = m.snippet
+    entry ||= {}
+    labels = m.labels
+    entry = {} if opts[:force_overwrite]
+
+    d = {
+      :message_id => m.id,
+      :source_id => m.source.id,
+      :source_info => m.source_info,
+      :date => (entry[:date] || m.date),
+      :snippet => snippet,
+      :labels => labels.uniq,
+      :from => (entry[:from] || [m.from.email, m.from.name]),
+      :to => (entry[:to] || m.to.map { |p| [p.email, p.name] }),
+      :cc => (entry[:cc] || m.cc.map { |p| [p.email, p.name] }),
+      :bcc => (entry[:bcc] || m.bcc.map { |p| [p.email, p.name] }),
+      :subject => m.subj,
+      :refs => (entry[:refs] || m.refs),
+      :replytos => (entry[:replytos] || m.replytos),
+    }
+
+    m.labels.each { |l| LabelManager << l }
+
+    synchronize do
+      index_message m, opts
+      union_threads([m.id] + m.refs + m.replytos)
+      @entries[m.id] = d
+    end
+    true
+  end
+
+  def num_results_for query={}
+    xapian_query = build_xapian_query query
+    matchset = run_query xapian_query, 0, 0, 100
+    matchset.matches_estimated
+  end
+
+  EACH_ID_PAGE = 100
+  def each_id query={}
+    offset = 0
+    page = EACH_ID_PAGE
+
+    xapian_query = build_xapian_query query
+    while true
+      ids = run_query_ids xapian_query, offset, (offset+page)
+      ids.each { |id| yield id }
+      break if ids.size < page
+      offset += page
+    end
+  end
+
+  def each_id_by_date query={}
+    each_id(query) { |id| yield id, lambda { build_message id } }
+  end
+
+  def each_message_in_thread_for m, opts={}
+    # TODO thread by subject
+    # TODO handle killed threads
+    ids = synchronize { @thread_members[@thread_ids[m.id]] } || []
+    ids.select { |id| contains_id? id }.each { |id| yield id, lambda { build_message id } }
+    true
+  end
+
+  def load_contacts emails, opts={}
+    contacts = Set.new
+    num = opts[:num] || 20
+    each_id_by_date :participants => emails do |id,b|
+      break if contacts.size >= num
+      m = b.call
+      ([m.from]+m.to+m.cc+m.bcc).compact.each { |p| contacts << [p.name, p.email] }
+    end
+    contacts.to_a.compact.map { |n,e| Person.new n, e }[0...num]
+  end
+
+  # TODO share code with the Ferret index
+  def parse_query s
+    query = {}
+
+    subs = s.gsub(/\b(to|from):(\S+)\b/) do
+      field, name = $1, $2
+      if(p = ContactManager.contact_for(name))
+        [field, p.email]
+      elsif name == "me"
+        [field, "(" + AccountManager.user_emails.join("||") + ")"]
+      else
+        [field, name]
+      end.join(":")
+    end
+
+    ## if we see a label:deleted or a label:spam term anywhere in the query
+    ## string, we set the extra load_spam or load_deleted options to true.
+    ## bizarre? well, because the query allows arbitrary parenthesized boolean
+    ## expressions, without fully parsing the query, we can't tell whether
+    ## the user is explicitly directing us to search spam messages or not.
+    ## e.g. if the string is -(-(-(-(-label:spam)))), does the user want to
+    ## search spam messages or not?
+    ##
+    ## so, we rely on the fact that turning these extra options ON turns OFF
+    ## the adding of "-label:deleted" or "-label:spam" terms at the very
+    ## final stage of query processing. if the user wants to search spam
+    ## messages, not adding that is the right thing; if he doesn't want to
+    ## search spam messages, then not adding it won't have any effect.
+    query[:load_spam] = true if subs =~ /\blabel:spam\b/
+    query[:load_deleted] = true if subs =~ /\blabel:deleted\b/
+
+    ## gmail style "is" operator
+    subs = subs.gsub(/\b(is|has):(\S+)\b/) do
+      field, label = $1, $2
+      case label
+      when "read"
+        "-label:unread"
+      when "spam"
+        query[:load_spam] = true
+        "label:spam"
+      when "deleted"
+        query[:load_deleted] = true
+        "label:deleted"
+      else
+        "label:#{$2}"
+      end
+    end
+
+    ## gmail style attachments "filename" and "filetype" searches
+    subs = subs.gsub(/\b(filename|filetype):(\((.+?)\)\B|(\S+)\b)/) do
+      field, name = $1, ($3 || $4)
+      case field
+      when "filename"
+        Redwood::log "filename - translated #{field}:#{name} to attachment:\"#{name.downcase}\""
+        "attachment:\"#{name.downcase}\""
+      when "filetype"
+        Redwood::log "filetype - translated #{field}:#{name} to attachment_extension:#{name.downcase}"
+        "attachment_extension:#{name.downcase}"
+      end
+    end
+
+    if $have_chronic
+      lastdate = 2<<32 - 1
+      firstdate = 0
+      subs = subs.gsub(/\b(before|on|in|during|after):(\((.+?)\)\B|(\S+)\b)/) do
+        field, datestr = $1, ($3 || $4)
+        realdate = Chronic.parse datestr, :guess => false, :context => :past
+        if realdate
+          case field
+          when "after"
+            Redwood::log "chronic: translated #{field}:#{datestr} to #{realdate.end}"
+            "date:#{realdate.end.to_i}..#{lastdate}"
+          when "before"
+            Redwood::log "chronic: translated #{field}:#{datestr} to #{realdate.begin}"
+            "date:#{firstdate}..#{realdate.end.to_i}"
+          else
+            Redwood::log "chronic: translated #{field}:#{datestr} to #{realdate}"
+            "date:#{realdate.begin.to_i}..#{realdate.end.to_i}"
+          end
+        else
+          raise ParseError, "can't understand date #{datestr.inspect}"
+        end
+      end
+    end
+
+    ## limit:42 restrict the search to 42 results
+    subs = subs.gsub(/\blimit:(\S+)\b/) do
+      lim = $1
+      if lim =~ /^\d+$/
+        query[:limit] = lim.to_i
+        ''
+      else
+        raise ParseError, "non-numeric limit #{lim.inspect}"
+      end
+    end
+
+    qp = Xapian::QueryParser.new
+    qp.database = @xapian
+    qp.stemmer = Xapian::Stem.new(STEM_LANGUAGE)
+    qp.stemming_strategy = Xapian::QueryParser::STEM_SOME
+    qp.default_op = Xapian::Query::OP_AND
+    qp.add_valuerangeprocessor(Xapian::NumberValueRangeProcessor.new(DATE_VALUENO, 'date:', true))
+    NORMAL_PREFIX.each { |k,v| qp.add_prefix k, v }
+    BOOLEAN_PREFIX.each { |k,v| qp.add_boolean_prefix k, v }
+    xapian_query = qp.parse_query(subs, Xapian::QueryParser::FLAG_PHRASE|Xapian::QueryParser::FLAG_BOOLEAN|Xapian::QueryParser::FLAG_LOVEHATE|Xapian::QueryParser::FLAG_WILDCARD, PREFIX['body'])
+
+    raise ParseError if xapian_query.nil? or xapian_query.empty?
+    query[:qobj] = xapian_query
+    query[:text] = s
+    query
+  end
+
+  private
+
+  # Stemmed
+  NORMAL_PREFIX = {
+    'subject' => 'S',
+    'body' => 'B',
+    'from_name' => 'FN',
+    'to_name' => 'TN',
+    'name' => 'N',
+    'attachment' => 'A',
+  }
+
+  # Unstemmed
+  BOOLEAN_PREFIX = {
+    'type' => 'K',
+    'from_email' => 'FE',
+    'to_email' => 'TE',
+    'email' => 'E',
+    'date' => 'D',
+    'label' => 'L',
+    'source_id' => 'I',
+    'attachment_extension' => 'O',
+  }
+
+  PREFIX = NORMAL_PREFIX.merge BOOLEAN_PREFIX
+
+  DATE_VALUENO = 0
+
+  # Xapian can very efficiently sort in ascending docid order. Sup always wants
+  # to sort by descending date, so this method maps between them. In order to
+  # handle multiple messages per second, we use a logistic curve centered
+  # around MIDDLE_DATE so that the slope (docid/s) is greatest in this time
+  # period. A docid collision is not an error - the code will pick the next
+  # smallest unused one.
+  DOCID_SCALE = 2.0**32
+  TIME_SCALE = 2.0**27
+  MIDDLE_DATE = Time.gm(2011)
+  def assign_docid m
+    t = (m.date.to_i - MIDDLE_DATE.to_i).to_f
+    docid = (DOCID_SCALE - DOCID_SCALE/(Math::E**(-(t/TIME_SCALE)) + 1)).to_i
+    begin
+      while @assigned_docids.member? [docid].pack("N")
+        docid -= 1
+      end
+    rescue
+    end
+    @assigned_docids[[docid].pack("N")] = ''
+    docid
+  end
+
+  def synchronize &b
+    @index_mutex.synchronize &b
+  end
+
+  def run_query xapian_query, offset, limit, checkatleast=0
+    synchronize do
+      @enquire.query = xapian_query
+      @enquire.mset(offset, limit-offset, checkatleast)
+    end
+  end
+
+  def run_query_ids xapian_query, offset, limit
+    matchset = run_query xapian_query, offset, limit
+    matchset.matches.map { |r| r.document.data }
+  end
+
+  Q = Xapian::Query
+  def build_xapian_query opts
+    labels = ([opts[:label]] + (opts[:labels] || [])).compact
+    neglabels = [:spam, :deleted, :killed].reject { |l| (labels.include? l) || opts.member?("load_#{l}".intern) }
+    pos_terms, neg_terms = [], []
+
+    pos_terms << mkterm(:type, 'mail')
+    pos_terms.concat(labels.map { |l| mkterm(:label,l) })
+    pos_terms << opts[:qobj] if opts[:qobj]
+    pos_terms << mkterm(:source_id, opts[:source_id]) if opts[:source_id]
+
+    if opts[:participants]
+      participant_terms = opts[:participants].map { |p| mkterm(:email,:any, (Redwood::Person === p) ? p.email : p) }
+      pos_terms << Q.new(Q::OP_OR, participant_terms)
+    end
+
+    neg_terms.concat(neglabels.map { |l| mkterm(:label,l) })
+
+    pos_query = Q.new(Q::OP_AND, pos_terms)
+    neg_query = Q.new(Q::OP_OR, neg_terms)
+
+    if neg_query.empty?
+      pos_query
+    else
+      Q.new(Q::OP_AND_NOT, [pos_query, neg_query])
+    end
+  end
+
+  def index_message m, opts
+    terms = []
+    text = []
+
+    subject_text = m.indexable_subject
+    body_text = m.indexable_body
+
+    # Person names are indexed with several prefixes
+    person_termer = lambda do |d|
+      lambda do |p|
+        ["#{d}_name", "name", "body"].each do |x|
+          text << [p.name, PREFIX[x]]
+        end if p.name
+        [d, :any].each { |x| terms << mkterm(:email, x, p.email) }
+      end
+    end
+
+    person_termer[:from][m.from] if m.from
+    (m.to+m.cc+m.bcc).each(&(person_termer[:to]))
+
+    terms << mkterm(:date,m.date) if m.date
+    m.labels.each { |t| terms << mkterm(:label,t) }
+    terms << mkterm(:type, 'mail')
+    terms << mkterm(:source_id, m.source.id)
+    m.attachments.each do |a|
+      a =~ /\.(\w+)$/ or next
+      t = mkterm(:attachment_extension, $1)
+      terms << t
+    end
+
+    # Full text search content
+    text << [subject_text, PREFIX['subject']]
+    text << [subject_text, PREFIX['body']]
+    text << [body_text, PREFIX['body']]
+    m.attachments.each { |a| text << [a, PREFIX['attachment']] }
+
+    # Date value for range queries
+    date_value = Xapian.sortable_serialise(m.date.to_i)
+
+    doc = Xapian::Document.new
+    docid = @docids[m.id] || assign_docid(m)
+
+    @term_generator.document = doc
+    text.each { |text,prefix| @term_generator.index_text text, 1, prefix }
+    terms.each { |term| doc.add_term term }
+    doc.add_value DATE_VALUENO, date_value
+    doc.data = m.id
+
+    @xapian.replace_document docid, doc
+    @docids[m.id] = docid
+  end
+
+  # Construct a Xapian term
+  def mkterm type, *args
+    case type
+    when :label
+      PREFIX['label'] + args[0].to_s.downcase
+    when :type
+      PREFIX['type'] + args[0].to_s.downcase
+    when :date
+      PREFIX['date'] + args[0].getutc.strftime("%Y%m%d%H%M%S")
+    when :email
+      case args[0]
+      when :from then PREFIX['from_email']
+      when :to then PREFIX['to_email']
+      when :any then PREFIX['email']
+      else raise "Invalid email term type #{args[0]}"
+      end + args[1].to_s.downcase
+    when :source_id
+      PREFIX['source_id'] + args[0].to_s.downcase
+    when :attachment_extension
+      PREFIX['attachment_extension'] + args[0].to_s.downcase
+    else
+      raise "Invalid term type #{type}"
+    end
+  end
+
+  # Join all the given message-ids into a single thread
+  def union_threads ids
+    seen_threads = Set.new
+    related = Set.new
+
+    # Get all the ids that will be in the new thread
+    ids.each do |id|
+      related << id
+      thread_id = @thread_ids[id]
+      if thread_id && !seen_threads.member?(thread_id)
+        thread_members = @thread_members[thread_id]
+        related.merge thread_members
+        seen_threads << thread_id
+      end
+    end
+    
+    # Pick a leader and move all the others to its thread
+    a = related.to_a
+    best, *rest = a.sort_by { |x| x.hash }
+    @thread_members[best] = a
+    @thread_ids[best] = best
+    rest.each do |x|
+      @thread_members.delete x
+      @thread_ids[x] = best
+    end
+  end
+end
+
+end
+
+class MarshalledGDBM < GDBM
+  def []= k, v
+    super k, Marshal.dump(v)
+  end
+
+  def [] k
+    v = super k
+    v ? Marshal.load(v) : nil
+  end
+end
-- 
1.6.0.4
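
The assign_docid mapping above is easier to see with numbers. A standalone
sketch (constants copied from the patch; the helper name is made up)
showing that newer dates land on smaller docids, which is why Xapian's
fast ascending-docid iteration comes back in descending date order:

    DOCID_SCALE = 2.0**32
    TIME_SCALE  = 2.0**27
    MIDDLE_DATE = Time.gm(2011)

    def docid_for time
      t = (time.to_i - MIDDLE_DATE.to_i).to_f
      (DOCID_SCALE - DOCID_SCALE / (Math::E**(-(t / TIME_SCALE)) + 1)).to_i
    end

    old_id = docid_for Time.gm(2005)    # older message
    new_id = docid_for Time.gm(2009)    # newer message
    puts old_id > new_id                # => true: newer date, smaller docid

Because the curve is steepest around MIDDLE_DATE, messages from the
busiest period still get well-separated ids; a collision simply walks down
to the next unused docid, as the loop in assign_docid does.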



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 16/18] fix String#ord monkeypatch
  2009-06-20 20:50                             ` [sup-talk] [PATCH 15/18] index: add xapian implementation Rich Lane
@ 2009-06-20 20:50                               ` Rich Lane
  2009-06-20 20:50                                 ` [sup-talk] [PATCH 17/18] add limit argument to author_names_and_newness_for_thread Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup/util.rb |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/sup/util.rb b/lib/sup/util.rb
index 8f60cc4..0609908 100644
--- a/lib/sup/util.rb
+++ b/lib/sup/util.rb
@@ -282,7 +282,7 @@ class String
     gsub(/\t/, "    ").gsub(/\r/, "")
   end
 
-  if not defined? ord
+  unless method_defined? :ord
     def ord
       self[0]
     end
-- 
1.6.0.4
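
For context on why the old guard misfired (an illustrative snippet, not
part of the patch): inside the class body, defined?(ord) checks for an ord
method callable on the String class object itself, so it is nil on both
1.8 and 1.9 and the monkeypatch always ran, clobbering 1.9's builtin
String#ord. method_defined? asks about instance methods, which is what is
wanted here:

    class String
      puts defined?(ord).inspect    # nil on 1.8 and 1.9: no String.ord class method
      puts method_defined?(:ord)    # false on 1.8, true on 1.9
    end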



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 17/18] add limit argument to author_names_and_newness_for_thread
  2009-06-20 20:50                               ` [sup-talk] [PATCH 16/18] fix String#ord monkeypatch Rich Lane
@ 2009-06-20 20:50                                 ` Rich Lane
  2009-06-20 20:50                                   ` [sup-talk] [PATCH 18/18] don't use SavingHash#[] for membership test Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup/modes/thread-index-mode.rb |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/lib/sup/modes/thread-index-mode.rb b/lib/sup/modes/thread-index-mode.rb
index 0bd8110..b671119 100644
--- a/lib/sup/modes/thread-index-mode.rb
+++ b/lib/sup/modes/thread-index-mode.rb
@@ -1,3 +1,5 @@
+require 'set'
+
 module Redwood
 
 ## subclasses should implement:
@@ -757,10 +759,12 @@ protected
   
   def authors; map { |m, *o| m.from if m }.compact.uniq; end
 
-  def author_names_and_newness_for_thread t
+  def author_names_and_newness_for_thread t, limit=nil
     new = {}
-    authors = t.map do |m, *o|
+    authors = Set.new
+    t.each do |m, *o|
       next unless m
+      break if limit and authors.size >= limit
 
       name = 
         if AccountManager.is_account?(m.from)
@@ -772,12 +776,13 @@ protected
         end
 
       new[name] ||= m.has_label?(:unread)
-      name
+      authors << name
     end
 
-    authors.compact.uniq.map { |a| [a, new[a]] }
+    authors.to_a.map { |a| [a, new[a]] }
   end
 
+  AUTHOR_LIMIT = 5
   def text_for_thread_at line
     t, size_widget = @mutex.synchronize { [@threads[line], @size_widgets[line]] }
 
@@ -787,7 +792,7 @@ protected
 
     ## format the from column
     cur_width = 0
-    ann = author_names_and_newness_for_thread t
+    ann = author_names_and_newness_for_thread t, AUTHOR_LIMIT
     from = []
     ann.each_with_index do |(name, newness), i|
       break if cur_width >= from_width
-- 
1.6.0.4



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 18/18] don't use SavingHash#[] for membership test
  2009-06-20 20:50                                 ` [sup-talk] [PATCH 17/18] add limit argument to author_names_and_newness_for_thread Rich Lane
@ 2009-06-20 20:50                                   ` Rich Lane
  2009-06-22 14:46                                     ` Andrei Thorp
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-06-20 20:50 UTC (permalink / raw)


---
 lib/sup/thread.rb |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/sup/thread.rb b/lib/sup/thread.rb
index 99f21dc..d395c35 100644
--- a/lib/sup/thread.rb
+++ b/lib/sup/thread.rb
@@ -310,13 +310,15 @@ class ThreadSet
   private :prune_thread_of
 
   def remove_id mid
-    return unless(c = @messages[mid])
+    return unless @messages.member?(mid)
+    c = @messages[mid]
     remove_container c
     prune_thread_of c
   end
 
   def remove_thread_containing_id mid
-    c = @messages[mid] or return
+    return unless @messages.member?(mid)
+    c = @messages[mid]
     t = c.root.thread
     @threads.delete_if { |key, thread| t == thread }
   end
@@ -355,7 +357,7 @@ class ThreadSet
     return if threads.size < 2
 
     containers = threads.map do |t|
-      c = @messages[t.first.id]
+      c = @messages.member?(t.first.id) ? @messages[t.first.id] : nil
       raise "not in threadset: #{t.first.id}" unless c && c.message
       c
     end
-- 
1.6.0.4
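
The reason #[] is unsuitable for the membership tests above (a sketch
using a plain Hash with a default block, which is how SavingHash behaves
on access for the purposes of this point; not taken from sup's source):
looking up a missing key constructs and stores a value as a side effect,
so it can never answer "not present".

    messages = Hash.new { |h, k| h[k] = "container for #{k}" }

    messages.member? "unknown-id"    # => false, no side effect
    messages["unknown-id"]           # creates and stores a new value
    messages.member? "unknown-id"    # => true, the entry now exists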



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 18/18] don't use SavingHash#[] for membership test
  2009-06-20 20:50                                   ` [sup-talk] [PATCH 18/18] don't use SavingHash#[] for membership test Rich Lane
@ 2009-06-22 14:46                                     ` Andrei Thorp
  0 siblings, 0 replies; 44+ messages in thread
From: Andrei Thorp @ 2009-06-22 14:46 UTC (permalink / raw)


Wow, that's one heck of a set of patches... good work dude :)

-AT
-- 
Andrei Thorp, Developer: Xandros Corp. (http://www.xandros.com)

Make it idiot-proof, and someone will breed a better idiot.
	-- Oliver Elphick


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-06-20 20:49 [sup-talk] [PATCH 0/18] Xapian-based index Rich Lane
  2009-06-20 20:50 ` [sup-talk] [PATCH 01/18] remove load_entry_for_id call in sup-recover-sources Rich Lane
@ 2009-06-24 16:30 ` William Morgan
  2009-06-24 17:33   ` William Morgan
  1 sibling, 1 reply; 44+ messages in thread
From: William Morgan @ 2009-06-24 16:30 UTC (permalink / raw)


Hi Rich,

Reformatted excerpts from Rich Lane's message of 2009-06-20:
> This patch series refactors the Index class to remove Ferret-isms and
> support multiple index implementations. The included XapianIndex is a
> bit faster at indexing messages and significantly faster when
> searching because it precomputes thread membership. It also works on
> Ruby 1.9.1.

This is great. Really, really great. You've refactored a crufty
interface that's been growing untamed over the past three years, you've
gotten us away from the unmaintained scariness that is Ferret, you've
fixed the largest source of interface slowness (thread recomputation),
and you've enabled us to move to the beautiful, speedy, encoding-aware
world of Ruby 1.9. Thank you for satisfying all of my Sup-related
desires in one fell swoop. From my lofty throne, I commend thee.

Once the bugs are ironed out, I would like to make this the default
index format and eventually deprecate Ferret.

In the meantime, I've placed your patches on a branch called xapian. If
anyone wants to play with this, here's what you do:

1. install the ruby xapian library and the ruby gdbm library, if you
   don't have them. These are packaged by your distro, and are not gems.
2. git fetch
3. git checkout -b xapian origin/xapian
4. cp ~/.sup/sources.yaml /tmp # just in case
5. sup-dump > dumpfile
6. SUP_INDEX=xapian sup-sync --all --all-sources --restore dumpfile
7. SUP_INDEX=xapian bin/sup -o
8. Oooh, fast.

This should not disturb your Ferret index, so you can switch back and
forth between the two. (Message state, of course, is not shared.)
However, adding new messages to one index will prevent them from being
automatically added to the other, so I recommend running in Xapian mode
with -o and not pressing 'P'.

> It's missing a couple of features, notably threading by subject.

FWIW, I've been thinking about deprecating that particular feature for
quite some time.

> I'm sure there are many more bugs left, so I'd appreciate any testing
> or review you all can provide.

sup-sync crashes for me fairly systematically with this error:

./lib/sup/xapian_index.rb:404:in `sortable_serialise': Expected argument 0 of type double, but got Fixnum 51767811298 (TypeError)
  in SWIG method 'Xapian::sortable_serialise'
  from ./lib/sup/xapian_index.rb:404:in `index_message'
  from ./lib/sup/xapian_index.rb:111:in `sync_message'
  from /usr/lib/ruby/1.8/monitor.rb:242:in `synchronize'
  from ./lib/sup/xapian_index.rb:324:in `synchronize'
  from ./lib/sup/xapian_index.rb:110:in `sync_message'
  from ./lib/sup/util.rb:519:in `send'
  from ./lib/sup/util.rb:519:in `method_missing'
  from ./lib/sup/poll.rb:157:in `add_messages_from'
  from ./lib/sup/source.rb:100:in `each'
  from ./lib/sup/util.rb:558:in `send'
  from ./lib/sup/util.rb:558:in `__pass'
  from ./lib/sup/util.rb:545:in `method_missing'
  from ./lib/sup/poll.rb:141:in `add_messages_from'
  from ./lib/sup/util.rb:519:in `send'
  from ./lib/sup/util.rb:519:in `method_missing'
  from bin/sup-sync:140
  from bin/sup-sync:135:in `each'
  from bin/sup-sync:135

I haven't spent any time tracking it down. Other than that, so far so
good.
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-06-24 16:30 ` [sup-talk] [PATCH 0/18] Xapian-based index William Morgan
@ 2009-06-24 17:33   ` William Morgan
  2009-06-26  2:00     ` Olly Betts
  0 siblings, 1 reply; 44+ messages in thread
From: William Morgan @ 2009-06-24 17:33 UTC (permalink / raw)


Reformatted excerpts from William Morgan's message of 2009-06-24:
> sup-sync crashes for me fairly systematically with this error:
> 
> ./lib/sup/xapian_index.rb:404:in `sortable_serialise': Expected argument 0 of
> type double, but got Fixnum 51767811298 (TypeError)

This turns out to be due to dates being far in the future (e.g. on spam
messages). I'm using the attached patch, which is pretty much a hack, to
force them to be between 1969 and 2038. Better solutions welcome. (I
haven't committed this.)
-- 
William <wmorgan-sup at masanjin.net>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-bugfix-dates-need-to-be-truncated-for-xapian-to-ind.patch
Type: application/octet-stream
Size: 2484 bytes
Desc: not available
URL: <http://rubyforge.org/pipermail/sup-talk/attachments/20090624/fae22ca3/attachment.obj>
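
The attachment itself isn't recoverable from the archive, but the clamp
described would look something like this (a hypothetical sketch only; the
constant and method names are made up and this is not the contents of the
scrubbed patch):

    # hypothetical sketch, not the scrubbed attachment
    MIN_DATE = Time.at(0)           # 1970-01-01 UTC; shows as 1969 in western timezones
    MAX_DATE = Time.at(2**31 - 1)   # 2038-01-19, the 32-bit time_t limit

    def clamp_date date
      if date < MIN_DATE then MIN_DATE
      elsif date > MAX_DATE then MAX_DATE
      else date
      end
    end

    # e.g. in index_message:
    #   date_value = Xapian.sortable_serialise clamp_date(m.date).to_i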


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-06-24 17:33   ` William Morgan
@ 2009-06-26  2:00     ` Olly Betts
  2009-06-26 13:49       ` William Morgan
  0 siblings, 1 reply; 44+ messages in thread
From: Olly Betts @ 2009-06-26  2:00 UTC (permalink / raw)


William Morgan <wmorgan-sup at masanjin.net> writes:
> Reformatted excerpts from William Morgan's message of 2009-06-24:
> > sup-sync crashes for me fairly systematically with this error:
> > 
> > ./lib/sup/xapian_index.rb:404:in `sortable_serialise': Expected argument 0 of
> > type double, but got Fixnum 51767811298 (TypeError)
> 
> This turns out to be due to dates being far in the future (e.g. on spam
> messages). I'm using the attached patch, which is pretty much a hack, to
> force them to be between 1969 and 2038. Better solutions welcome. (I
> haven't committed this.)

The error you get here is actually a bug in SWIG (http://www.swig.org/).  Xapian
uses SWIG to generate the wrappers for Ruby.

The code SWIG currently uses to convert a parameter when a C/C++ function
takes a double doesn't handle a "fixnum" which is larger than MAXINT.  I've
just applied a fix to SWIG SVN:

http://swig.svn.sourceforge.net/viewvc/swig?view=rev&revision=11320

I'll make sure this fix makes it into the next Xapian release (which will be
1.0.14).

Cheers,
    Olly
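
As a small illustration of the type issue (assuming the pre-fix bindings
on a 64-bit machine, where 51767811298 is still a Fixnum, as the traceback
shows): the conversion described above only trips on a Fixnum larger than
MAXINT, so converting the value to a Float by hand already sidesteps it.

    require 'xapian'

    t = 51767811298                    # seconds-since-epoch far in the future
    # Xapian.sortable_serialise(t)     # TypeError with the unfixed wrapper
    Xapian.sortable_serialise(t.to_f)  # fine: the argument is already a double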



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-06-26  2:00     ` Olly Betts
@ 2009-06-26 13:49       ` William Morgan
  2009-07-17 23:42         ` Richard Heycock
  2009-07-28 13:47         ` Olly Betts
  0 siblings, 2 replies; 44+ messages in thread
From: William Morgan @ 2009-06-26 13:49 UTC (permalink / raw)


Reformatted excerpts from Olly Betts's message of 2009-06-25:
> I'll make sure this fix makes it into the next Xapian release (which
> will be 1.0.14).

Awesome, thanks!

Though even with SWIG fixed there will still be some tweaking necessary
in Sup because the logistic function used for generating Xapian docids
still has trouble with extreme dates.

BTW, more kudos to Rich for somehow finding a way to use a logistic
function in an email client.
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-06-26 13:49       ` William Morgan
@ 2009-07-17 23:42         ` Richard Heycock
  2009-07-23 10:23           ` Adeodato Simó
  2009-07-28 13:47         ` Olly Betts
  1 sibling, 1 reply; 44+ messages in thread
From: Richard Heycock @ 2009-07-17 23:42 UTC (permalink / raw)


Excerpts from William Morgan's message of Fri Jun 26 23:49:40 +1000 2009:
> Reformatted excerpts from Olly Betts's message of 2009-06-25:
> > I'll make sure this fix makes it into the next Xapian release (which
> > will be 1.0.14).
> 
> Awesome, thanks!
> 
> Though even with SWIG fixed there will still be some tweaking necessary
> in Sup because the logistic function used for generating Xapian docids
> still has trouble with extreme dates.
> 
> BTW, more kudos to Rich for somehow finding a way to use a logistic
> function in an email client.

I've been meaning to respond to this since the day it was posted. Rich
Lane, thank you, thank you. Ferret was one of my biggest gripes with
sup. I've used it elsewhere and it's a shocker; I eventually migrated it
all to Xapian, which has worked flawlessly since. At one stage I used to
rebuild my ferret index almost weekly (I'm running debian unstable,
which at the moment is really living up to its name), something I
haven't had to do once since migrating to Xapian.

I got it working with 1.9 once, but there are some problems I just
haven't had the time to look into yet; I will do so and post any issues
to the list.

rgh


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-17 23:42         ` Richard Heycock
@ 2009-07-23 10:23           ` Adeodato Simó
  2009-07-25  4:53             ` Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Adeodato Simó @ 2009-07-23 10:23 UTC (permalink / raw)


+ Richard Heycock (Sat, 18 Jul 2009 09:42:07 +1000):

> I've been meaning to respond to this since the day it was posted. Rich Lane:
> thank you, thank you. Ferret was one of my biggest gripes with sup. I've
> used it elsewhere and it's a shocker; I eventually migrated it all to
> Xapian, which has worked flawlessly since. At one stage I was rebuilding my
> ferret index almost weekly (I'm running Debian unstable, which at the moment
> is really living up to its name), something I haven't had to do once since
> migrating to Xapian.

Yeah, thanks Rich! However, there seems to be something wrong with the
parsing of contacts. After reindexing with Xapian, my contact list has
entries like:

  <dato                                        <dato at net.com.org.esadeodato
  <other                                       <other at foo.ua.esfoo
  dato at net.com.org.esAdeodato Simo             other2 at domain.netother2 surname2

Plus, neither '!label:inbox' nor '-label:inbox' works for me. From an
inspection of the code, it doesn't look to me as if negated labels are
being parsed.

Any hints?

-- 
- Are you sure we're good?
- Always.
        -- Rory and Lorelai



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-23 10:23           ` Adeodato Simó
@ 2009-07-25  4:53             ` Rich Lane
  2009-07-25  9:21               ` Adeodato Simó
  2009-07-27 15:46               ` William Morgan
  0 siblings, 2 replies; 44+ messages in thread
From: Rich Lane @ 2009-07-25  4:53 UTC (permalink / raw)


> Yeah, thanks Rich! However, there seems to be something wrong with the
> parsing of contacts. After reindexing with Xapian, my contact list has
> entries like:
> 
>   <dato                                        <dato at net.com.org.esadeodato
>   <other                                       <other at foo.ua.esfoo
>   dato at net.com.org.esAdeodato Simo             other2 at domain.netother2 surname2

Thanks for the bug report, I've posted a patch (fix-mk_addrs-args) to
fix this. You shouldn't need to reindex after applying the patch.

> Plus, neither '!label:inbox' nor '-label:inbox' works for me. From an
> inspection of the code, it doesn't look to me as if negated labels are
> being parsed.
> 
> Any hints?

You need to specify a non-negated term in the query.
"type:mail -label:inbox" should work.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-25  4:53             ` Rich Lane
@ 2009-07-25  9:21               ` Adeodato Simó
  2009-07-25 19:59                 ` Rich Lane
  2009-07-27 15:46               ` William Morgan
  1 sibling, 1 reply; 44+ messages in thread
From: Adeodato Simó @ 2009-07-25  9:21 UTC (permalink / raw)


+ Rich Lane (Sat, 25 Jul 2009 06:53:07 +0200):

> Thanks for the bug report, I've posted a patch (fix-mk_addrs-args) to
> fix this. You shouldn't need to reindex after applying the patch.

Great, thanks. The patch indeed fixes the issue.

> > Plus, neither '!label:inbox' nor '-label:inbox' works for me. From an
> > inspection of the code, it doesn't look to me as if negated labels are
> > being parsed.

> > Any hints?

> You need to specify a non-negated term in the query.
> "type:mail -label:inbox" should work.

Oh, I see. Yes, that works, thanks.

One extra issue I just noticed: after dumping with ferret, reloading
into Xapian, and doing a dump again (with Xapian this time), all the
messages tagged "deleted" or "spam" do not appear in the dump at all.
Any ideas?

-- 
- Are you sure we're good?
- Always.
        -- Rory and Lorelai



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-25  9:21               ` Adeodato Simó
@ 2009-07-25 19:59                 ` Rich Lane
  2009-07-25 23:28                   ` Ingmar Vanhassel
  2009-07-27 15:48                   ` William Morgan
  0 siblings, 2 replies; 44+ messages in thread
From: Rich Lane @ 2009-07-25 19:59 UTC (permalink / raw)


Excerpts from Adeodato Simó's message of Sat Jul 25 05:21:16 -0400 2009:
> One extra issue I just noticed: after dumping with ferret, reloading
> into Xapian, and doing a dump again (with Xapian this time), all the
> messages tagged "deleted" or "spam" do not appear in the dump at all.
> Any ideas?

The patch "xapian: dont exclude spam..." should fix this.

One issue I've noticed is that removing labels from messages doesn't
always immediately work. For example, label-list-mode shows a label as
having some unread messages even though all of them are actually read.
This tends to happen only after sup's been running for a while and
restarting sup fixes it.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-25 19:59                 ` Rich Lane
@ 2009-07-25 23:28                   ` Ingmar Vanhassel
  2009-07-27 15:48                   ` William Morgan
  1 sibling, 0 replies; 44+ messages in thread
From: Ingmar Vanhassel @ 2009-07-25 23:28 UTC (permalink / raw)


Excerpts from Rich Lane's message of Sat Jul 25 21:59:19 +0200 2009:
> One issue I've noticed is that removing labels from messages doesn't
> always immediately work. For example, label-list-mode shows a label as
> having some unread messages even though all of them are actually read.
> This tends to happen only after sup's been running for a while and
> restarting sup fixes it.

I was just about to report that. :)
Besides that, the Xapian index works very nicely, so I'd be happy to see it
in next once that last regression (the only one my testing turned up) is fixed!

-- 
Exherbo KDE, X.org maintainer


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-25  4:53             ` Rich Lane
  2009-07-25  9:21               ` Adeodato Simó
@ 2009-07-27 15:46               ` William Morgan
  2009-07-28 16:53                 ` Olly Betts
  1 sibling, 1 reply; 44+ messages in thread
From: William Morgan @ 2009-07-27 15:46 UTC (permalink / raw)


Reformatted excerpts from Rich Lane's message of 2009-07-24:
> > Plus, neither '!label:inbox' nor '-label:inbox' works for me. From an
> > inspection of the code, it doesn't look to me as if negated labels
> > are being parsed.
> > 
> > Any hints?
> 
> You need to specify a non-negated term in the query.  "type:mail
> -label:inbox" should work.

This is a typical restriction for inverted index-based search engines.
You need to have at least one positive term or the computation is too
expensive (it would have to iterate over every term ever seen.) It's
true of Ferret, Google, etc.
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-25 19:59                 ` Rich Lane
  2009-07-25 23:28                   ` Ingmar Vanhassel
@ 2009-07-27 15:48                   ` William Morgan
  2009-07-27 16:56                     ` Ingmar Vanhassel
  2009-07-27 17:06                     ` Rich Lane
  1 sibling, 2 replies; 44+ messages in thread
From: William Morgan @ 2009-07-27 15:48 UTC (permalink / raw)


Reformatted excerpts from Rich Lane's message of 2009-07-25:
> One issue I've noticed is that removing labels from messages doesn't
> always immediately work.

Is this true even after you sync changes to the index? What about if you
reload the label list buffer? ('@')
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-27 15:48                   ` William Morgan
@ 2009-07-27 16:56                     ` Ingmar Vanhassel
  2009-09-01  8:07                       ` Ingmar Vanhassel
  2009-07-27 17:06                     ` Rich Lane
  1 sibling, 1 reply; 44+ messages in thread
From: Ingmar Vanhassel @ 2009-07-27 16:56 UTC (permalink / raw)


Excerpts from William Morgan's message of Mon Jul 27 17:48:38 +0200 2009:
> Reformatted excerpts from Rich Lane's message of 2009-07-25:
> > One issue I've noticed is that removing labels from messages doesn't
> > always immediately work.
> 
> Is this true even after you sync changes to the index? What about if you
> reload the label list buffer? ('@')

It's true in both cases. Even after a sync, 'U' still produces read
messages (among unread), and a search for label:foo has threads without
that label. If you quit sup & restart it things work as expected for a
while.

I've also noticed that sup takes a long time to quit with the xapian
index. This delay happens after this message:
[Mon Jul 27 16:56:01 +0000 2009] unlocking /home/ingmar/.sup/lock...

-- 
Exherbo KDE, X.org maintainer


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-27 15:48                   ` William Morgan
  2009-07-27 16:56                     ` Ingmar Vanhassel
@ 2009-07-27 17:06                     ` Rich Lane
  2009-07-31 16:20                       ` Rich Lane
  1 sibling, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-07-27 17:06 UTC (permalink / raw)


Excerpts from William Morgan's message of Mon Jul 27 11:48:38 -0400 2009:
> Reformatted excerpts from Rich Lane's message of 2009-07-25:
> > One issue I've noticed is that removing labels from messages doesn't
> > always immediately work.
> 
> Is this true even after you sync changes to the index? What about if you
> reload the label list buffer? ('@')

Yes. This is looking like a Xapian bug - I've reproduced it without any
Sup code. I'm working on a fix.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-06-26 13:49       ` William Morgan
  2009-07-17 23:42         ` Richard Heycock
@ 2009-07-28 13:47         ` Olly Betts
  2009-07-28 15:07           ` William Morgan
  1 sibling, 1 reply; 44+ messages in thread
From: Olly Betts @ 2009-07-28 13:47 UTC (permalink / raw)


William Morgan <wmorgan-sup at masanjin.net> writes: 
> Reformatted excerpts from Olly Betts's message of 2009-06-25:
> > I'll make sure this fix makes it into the next Xapian release (which
> > will be 1.0.14).
> 
> Awesome, thanks!

Just to update, Xapian 1.0.14 was released last week with this fix.

I tested with a distilled micro-testcase rather than sup and these patches,
so if you still see problems please open a ticket on http://trac.xapian.org/

Cheers,
    Olly



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-28 13:47         ` Olly Betts
@ 2009-07-28 15:07           ` William Morgan
  0 siblings, 0 replies; 44+ messages in thread
From: William Morgan @ 2009-07-28 15:07 UTC (permalink / raw)


Reformatted excerpts from Olly Betts's message of 2009-07-28:
> Just to update, Xapian 1.0.14 was released last week with this fix.
> 
> I tested with a distilled micro-testcase rather than sup and these patches,
> so if you still see problems please open a ticket on http://trac.xapian.org/

Excellent. Thank you.
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-27 15:46               ` William Morgan
@ 2009-07-28 16:53                 ` Olly Betts
  2009-07-28 17:01                   ` William Morgan
  0 siblings, 1 reply; 44+ messages in thread
From: Olly Betts @ 2009-07-28 16:53 UTC (permalink / raw)


William Morgan <wmorgan-sup at masanjin.net> writes:
> Reformatted excerpts from Rich Lane's message of 2009-07-24:
> > You need to specify a non-negated term in the query.  "type:mail
> > -label:inbox" should work.
> 
> This is a typical restriction for inverted index-based search engines.
> You need to have at least one positive term or the computation is too
> expensive (it would have to iterate over every term ever seen.) It's
> true of Ferret, Google, etc.

Actually, Xapian supports this - Xapian.Query.new("") is a "magic" query
which matches all documents.

It doesn't need to iterate over every term, just all documents.  But if you
want the top ten documents without a particular filter, there's no relevance
ranking, so it can stop after it has found ten matches, which should be
pretty quick.

This isn't currently supported by the QueryParser when using "-" on terms
(the reasoning was that it was too easy to accidentally invoke when pasting
text), but 'NOT label:inbox' will work if you enable it using
QueryParser.FLAG_PURE_NOT.
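
For the Ruby bindings that's roughly the following (a sketch only; the
database path and the "L" boolean prefix for labels are assumptions about
how Sup sets things up, not something taken from these patches):

  require 'xapian'

  db = Xapian::Database.new(File.expand_path("~/.sup/xapian"))
  qp = Xapian::QueryParser.new
  qp.database = db
  qp.add_boolean_prefix("label", "L")   # assumed prefix; check xapian_index.rb for the real one

  flags = Xapian::QueryParser::FLAG_BOOLEAN | Xapian::QueryParser::FLAG_PURE_NOT
  query = qp.parse_query("NOT label:inbox", flags)

  ## equivalently, combine the match-all query with AND_NOT by hand:
  ##   Xapian::Query.new(Xapian::Query::OP_AND_NOT,
  ##                     Xapian::Query.new(""), Xapian::Query.new("Linbox"))

  enquire = Xapian::Enquire.new(db)
  enquire.query = query
  enquire.mset(0, 10).matches.each { |m| puts m.docid }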

Cheers,
    Olly




^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-28 16:53                 ` Olly Betts
@ 2009-07-28 17:01                   ` William Morgan
  0 siblings, 0 replies; 44+ messages in thread
From: William Morgan @ 2009-07-28 17:01 UTC (permalink / raw)


Reformatted excerpts from Olly Betts's message of 2009-07-28:
> Actually, Xapian supports this - Xapian.Query.new("") is a "magic"
> query which matches all documents.

Yeah, I think Rich Lane just taught me how Ferret supports this too.
-- 
William <wmorgan-sup at masanjin.net>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-27 17:06                     ` Rich Lane
@ 2009-07-31 16:20                       ` Rich Lane
  2009-08-12 13:05                         ` Ingmar Vanhassel
  0 siblings, 1 reply; 44+ messages in thread
From: Rich Lane @ 2009-07-31 16:20 UTC (permalink / raw)


Excerpts from Rich Lane's message of Mon Jul 27 13:06:34 -0400 2009:
> Excerpts from William Morgan's message of Mon Jul 27 11:48:38 -0400 2009:
> > Reformatted excerpts from Rich Lane's message of 2009-07-25:
> > > One issue I've noticed is that removing labels from messages doesn't
> > > always immediately work.
> > 
> > Is this true even after you sync changes to the index? What about if you
> > reload the label list buffer? ('@')
> 
> Yes. This is looking like a Xapian bug - I've reproduced it without any
> Sup code. I'm working on a fix.

I've fixed this, it should be released in Xapian 1.0.15. Or, grab Xapian
SVN and you can try out the Chert backend too (XAPIAN_PREFER_CHERT=1). 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-31 16:20                       ` Rich Lane
@ 2009-08-12 13:05                         ` Ingmar Vanhassel
  2009-08-12 14:32                           ` Nicolas Pouillard
  2009-08-14  5:23                           ` Rich Lane
  0 siblings, 2 replies; 44+ messages in thread
From: Ingmar Vanhassel @ 2009-08-12 13:05 UTC (permalink / raw)


Excerpts from Rich Lane's message of Fri Jul 31 18:20:41 +0200 2009:
> Excerpts from Rich Lane's message of Mon Jul 27 13:06:34 -0400 2009:
> > Excerpts from William Morgan's message of Mon Jul 27 11:48:38 -0400 2009:
> > > Reformatted excerpts from Rich Lane's message of 2009-07-25:
> > > > One issue I've noticed is that removing labels from messages doesn't
> > > > always immediately work.
> > > 
> > > Is this true even after you sync changes to the index? What about if you
> > > reload the label list buffer? ('@')
> > 
> > Yes. This is looking like a Xapian bug - I've reproduced it without any
> > Sup code. I'm working on a fix.
> 
> I've fixed this, it should be released in Xapian 1.0.15. Or, grab Xapian
> SVN and you can try out the Chert backend too (XAPIAN_PREFER_CHERT=1).

Could you point me to the SVN revision containing the fix? I'd like to
backport the fix to my Xapian 1.0.14 packages, pending 1.0.15 release.

Thanks!

-- 
Exherbo KDE, X.org maintainer


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-08-12 13:05                         ` Ingmar Vanhassel
@ 2009-08-12 14:32                           ` Nicolas Pouillard
  2009-08-14  5:23                           ` Rich Lane
  1 sibling, 0 replies; 44+ messages in thread
From: Nicolas Pouillard @ 2009-08-12 14:32 UTC (permalink / raw)


Excerpts from Ingmar Vanhassel's message of Wed Aug 12 15:05:35 +0200 2009:
> Excerpts from Rich Lane's message of Fri Jul 31 18:20:41 +0200 2009:
> > Excerpts from Rich Lane's message of Mon Jul 27 13:06:34 -0400 2009:
> > > Excerpts from William Morgan's message of Mon Jul 27 11:48:38 -0400 2009:
> > > > Reformatted excerpts from Rich Lane's message of 2009-07-25:
> > > > > One issue I've noticed is that removing labels from messages doesn't
> > > > > always immediately work.
> > > > 
> > > > Is this true even after you sync changes to the index? What about if you
> > > > reload the label list buffer? ('@')
> > > 
> > > Yes. This is looking like a Xapian bug - I've reproduced it without any
> > > Sup code. I'm working on a fix.
> > 
> > I've fixed this, it should be released in Xapian 1.0.15. Or, grab Xapian
> > SVN and you can try out the Chert backend too (XAPIAN_PREFER_CHERT=1).

BTW, has anyone successfully built Xapian under Mac OS X?

-- 
Nicolas Pouillard
http://nicolaspouillard.fr


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-08-12 13:05                         ` Ingmar Vanhassel
  2009-08-12 14:32                           ` Nicolas Pouillard
@ 2009-08-14  5:23                           ` Rich Lane
  1 sibling, 0 replies; 44+ messages in thread
From: Rich Lane @ 2009-08-14  5:23 UTC (permalink / raw)


Excerpts from Ingmar Vanhassel's message of Wed Aug 12 09:05:35 -0400 2009:
> Could you point me to the SVN revision containing the fix? I'd like to
> backport the fix to my Xapian 1.0.14 packages, pending 1.0.15 release.

Revision 13219.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-07-27 16:56                     ` Ingmar Vanhassel
@ 2009-09-01  8:07                       ` Ingmar Vanhassel
  2009-09-03 16:52                         ` Rich Lane
  0 siblings, 1 reply; 44+ messages in thread
From: Ingmar Vanhassel @ 2009-09-01  8:07 UTC (permalink / raw)


Excerpts from Ingmar Vanhassel's message of Mon Jul 27 18:56:28 +0200 2009:
> Excerpts from William Morgan's message of Mon Jul 27 17:48:38 +0200 2009:
> > Reformatted excerpts from Rich Lane's message of 2009-07-25:
> > > One issue I've noticed is that removing labels from messages doesn't
> > > always immediately work.
> > 
> > Is this true even after you sync changes to the index? What about if you
> > reload the label list buffer? ('@')
> 
> It's true in both cases. Even after a sync, 'U' still produces read
> messages (among unread), and a search for label:foo has threads without
> that label. If you quit sup & restart it things work as expected for a
> while.

I can still reproduce this in a more specific case, with Xapian 1.0.15.

Searching for is:unread (hit U) works as expected. But when I filter
that down to threads with a second label (hit |, then label:foo), it
shows threads with label:foo but loses the is:unread constraint.

The same happens when I search directly for is:unread label:foo, which gives
me unread threads, but not always with the foo label.
-- 
Exherbo KDE, X.org maintainer


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [sup-talk] [PATCH 0/18] Xapian-based index
  2009-09-01  8:07                       ` Ingmar Vanhassel
@ 2009-09-03 16:52                         ` Rich Lane
  0 siblings, 0 replies; 44+ messages in thread
From: Rich Lane @ 2009-09-03 16:52 UTC (permalink / raw)


Excerpts from Ingmar Vanhassel's message of Tue Sep 01 04:07:27 -0400 2009:
> Excerpts from Ingmar Vanhassel's message of Mon Jul 27 18:56:28 +0200 2009:
> > Excerpts from William Morgan's message of Mon Jul 27 17:48:38 +0200 2009:
> > > Reformatted excerpts from Rich Lane's message of 2009-07-25:
> > > > One issue I've noticed is that removing labels from messages doesn't
> > > > always immediately work.
> > > 
> > > Is this true even after you sync changes to the index? What about if you
> > > reload the label list buffer? ('@')
> > 
> > It's true in both cases. Even after a sync, 'U' still produces read
> > messages (among unread), and a search for label:foo has threads without
> > that label. If you quit sup & restart it things work as expected for a
> > while.
> 
> I can still reproduce this in a more specific case, with Xapian 1.0.15.
> 
> Searching for is:unread (hit U) works as expected. But when I filter
> that down to threads with a second label (hit |, then label:foo), it
> shows threads with label:foo but loses the is:unread constraint.
> 
> The same happens when I search directly for is:unread label:foo, which gives
> me unread threads, but not always with the foo label.

I've reproduced this and it looks like a query parsing problem. Multiple
terms on the same field are OR'd together instead of AND'd [1]. Adding an
explicit AND works. I'll see if Xapian::QueryParser can be convinced to
do what we want here.

[1] http://trac.xapian.org/ticket/157
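
In the meantime, spelling out the conjunction explicitly works around it:

  is:unread label:foo        (the terms end up OR'd together -- ticket 157)
  is:unread AND label:foo    (an explicit AND gives the intended intersection)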


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2009-09-03 16:52 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-20 20:49 [sup-talk] [PATCH 0/18] Xapian-based index Rich Lane
2009-06-20 20:50 ` [sup-talk] [PATCH 01/18] remove load_entry_for_id call in sup-recover-sources Rich Lane
2009-06-20 20:50   ` [sup-talk] [PATCH 02/18] remove load_entry_for_id call in DraftManager.discard Rich Lane
2009-06-20 20:50     ` [sup-talk] [PATCH 03/18] remove ferret entry from poll/sync interface Rich Lane
2009-06-20 20:50       ` [sup-talk] [PATCH 04/18] index: remove unused method load_entry_for_id Rich Lane
2009-06-20 20:50         ` [sup-talk] [PATCH 05/18] switch DraftManager to use Message.build_from_source Rich Lane
2009-06-20 20:50           ` [sup-talk] [PATCH 06/18] index: move has_any_from_source_with_label? to sup-sync-back Rich Lane
2009-06-20 20:50             ` [sup-talk] [PATCH 07/18] move source-related methods to SourceManager Rich Lane
2009-06-20 20:50               ` [sup-talk] [PATCH 08/18] index: remove unused method fresh_thread_id Rich Lane
2009-06-20 20:50                 ` [sup-talk] [PATCH 09/18] index: revert overeager opts->query rename in each_message_in_thread_for Rich Lane
2009-06-20 20:50                   ` [sup-talk] [PATCH 10/18] index: make wrap_subj methods private Rich Lane
2009-06-20 20:50                     ` [sup-talk] [PATCH 11/18] index: move Ferret-specific code to ferret_index.rb Rich Lane
2009-06-20 20:50                       ` [sup-talk] [PATCH 12/18] remove last external uses of ferret docid Rich Lane
2009-06-20 20:50                         ` [sup-talk] [PATCH 13/18] add Message.indexable_{body, chunks, subject} Rich Lane
2009-06-20 20:50                           ` [sup-talk] [PATCH 14/18] index: choose index implementation with config entry or environment variable Rich Lane
2009-06-20 20:50                             ` [sup-talk] [PATCH 15/18] index: add xapian implementation Rich Lane
2009-06-20 20:50                               ` [sup-talk] [PATCH 16/18] fix String#ord monkeypatch Rich Lane
2009-06-20 20:50                                 ` [sup-talk] [PATCH 17/18] add limit argument to author_names_and_newness_for_thread Rich Lane
2009-06-20 20:50                                   ` [sup-talk] [PATCH 18/18] dont using SavingHash#[] for membership test Rich Lane
2009-06-22 14:46                                     ` Andrei Thorp
2009-06-24 16:30 ` [sup-talk] [PATCH 0/18] Xapian-based index William Morgan
2009-06-24 17:33   ` William Morgan
2009-06-26  2:00     ` Olly Betts
2009-06-26 13:49       ` William Morgan
2009-07-17 23:42         ` Richard Heycock
2009-07-23 10:23           ` Adeodato Simó
2009-07-25  4:53             ` Rich Lane
2009-07-25  9:21               ` Adeodato Simó
2009-07-25 19:59                 ` Rich Lane
2009-07-25 23:28                   ` Ingmar Vanhassel
2009-07-27 15:48                   ` William Morgan
2009-07-27 16:56                     ` Ingmar Vanhassel
2009-09-01  8:07                       ` Ingmar Vanhassel
2009-09-03 16:52                         ` Rich Lane
2009-07-27 17:06                     ` Rich Lane
2009-07-31 16:20                       ` Rich Lane
2009-08-12 13:05                         ` Ingmar Vanhassel
2009-08-12 14:32                           ` Nicolas Pouillard
2009-08-14  5:23                           ` Rich Lane
2009-07-27 15:46               ` William Morgan
2009-07-28 16:53                 ` Olly Betts
2009-07-28 17:01                   ` William Morgan
2009-07-28 13:47         ` Olly Betts
2009-07-28 15:07           ` William Morgan
