Archive of RubyForge sup-devel mailing list
* [sup-devel] [PATCH 1/4] dont index redundant data
@ 2010-01-16 20:03 Rich Lane
  2010-01-16 20:03 ` [sup-devel] [PATCH 2/4] index email addresses as text Rich Lane
  2010-01-23 12:49 ` [sup-devel] [PATCH 1/4] dont index redundant data William Morgan
  0 siblings, 2 replies; 5+ messages in thread
From: Rich Lane @ 2010-01-16 20:03 UTC
  To: sup-devel

Use the Xapian QueryParser's ability to map one prefix in the query to multiple
prefixes in the index, so that names, email addresses, and subjects no longer
need to be stored in duplicate. This also adds the stemmed attachment filenames
to the default (non-prefixed) search. The new index stores a subset of the data
that previous versions stored, so this version can read older indexes, but
older versions cannot read indexes written by this one.
---
 lib/sup/xapian_index.rb |   19 ++++++++-----------
 1 files changed, 8 insertions(+), 11 deletions(-)
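
To illustrate the prefix mapping described in the commit message, here is a toy
model in plain Ruby. It is only a sketch: it does not call Xapian, it ignores
stemming, and expand_query, match?, NEW_DOC, and OLD_DOC are hypothetical names
made up for the illustration. NEW_PREFIX copies the stemmed-prefix entries
visible in this patch; OLD_PREFIX approximates the pre-patch lookup ('name'
mapped to 'N', and 'B' as the default prefix passed to parse_query).

```ruby
require 'set'

# Prefix table after the patch: a query field may map to several index prefixes.
NEW_PREFIX = {
  'body'       => 'B',
  'from_name'  => 'FN',
  'to_name'    => 'TN',
  'name'       => %w(FN TN),
  'attachment' => 'A',
  ''           => %w(S B FN TN A),   # bare words search all stemmed fields
}

# Approximation of the pre-patch lookup (assumption for this sketch).
OLD_PREFIX = { 'name' => 'N', '' => 'B' }

# Expand "field:word" (or a bare word) into the candidate index terms.
# Hypothetical stand-in for what the query parser does with its prefix map.
def expand_query(query, table)
  field, word = query.include?(':') ? query.split(':', 2) : ['', query]
  Array(table.fetch(field)).map { |prefix| prefix + word.downcase }
end

# A document matches if any of the candidate terms is in its term set.
def match?(doc_terms, query, table)
  expand_query(query, table).any? { |term| doc_terms.include?(term) }
end

# One message from alice to bob, subject "hello", body "world".
# New layout: each value is stored under exactly one prefix.
NEW_DOC = Set.new(%w(FNalice TNbob Shello Bworld))
# Old layout: the same terms plus the redundant N and B copies.
OLD_DOC = NEW_DOC | Set.new(%w(Nalice Nbob Balice Bbob Bhello))

p expand_query('name:Alice', NEW_PREFIX)   # ["FNalice", "TNalice"]
p match?(NEW_DOC, 'name:bob', NEW_PREFIX)  # true
p match?(OLD_DOC, 'name:bob', NEW_PREFIX)  # true:  new code reads old data
p match?(NEW_DOC, 'name:bob', OLD_PREFIX)  # false: old code misses new data
```

In this model the new table finds the message in both layouts, while the old
table finds nothing in the new layout because the 'N' copies are gone. The same
holds for a bare word such as "hello", which the old table looks up only under
'B'.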

diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
index c81dca4..c0b2f9f 100644
--- a/lib/sup/xapian_index.rb
+++ b/lib/sup/xapian_index.rb
@@ -258,9 +258,9 @@ EOS
     qp.stemming_strategy = Xapian::QueryParser::STEM_SOME
     qp.default_op = Xapian::Query::OP_AND
     qp.add_valuerangeprocessor(Xapian::NumberValueRangeProcessor.new(DATE_VALUENO, 'date:', true))
-    NORMAL_PREFIX.each { |k,v| qp.add_prefix k, v }
-    BOOLEAN_PREFIX.each { |k,v| qp.add_boolean_prefix k, v }
-    xapian_query = qp.parse_query(subs, Xapian::QueryParser::FLAG_PHRASE|Xapian::QueryParser::FLAG_BOOLEAN|Xapian::QueryParser::FLAG_LOVEHATE|Xapian::QueryParser::FLAG_WILDCARD, PREFIX['body'])
+    NORMAL_PREFIX.each { |k,vs| vs.each { |v| qp.add_prefix k, v } }
+    BOOLEAN_PREFIX.each { |k,vs| vs.each { |v| qp.add_boolean_prefix k, v } }
+    xapian_query = qp.parse_query(subs, Xapian::QueryParser::FLAG_PHRASE|Xapian::QueryParser::FLAG_BOOLEAN|Xapian::QueryParser::FLAG_LOVEHATE|Xapian::QueryParser::FLAG_WILDCARD)
 
     raise ParseError if xapian_query.nil? or xapian_query.empty?
     query[:qobj] = xapian_query
@@ -276,8 +276,9 @@ EOS
     'body' => 'B',
     'from_name' => 'FN',
     'to_name' => 'TN',
-    'name' => 'N',
+    'name' => %w(FN TN),
     'attachment' => 'A',
+    '' => %w(S B FN TN A),
   }
 
   # Unstemmed
@@ -285,7 +286,7 @@ EOS
     'type' => 'K',
     'from_email' => 'FE',
     'to_email' => 'TE',
-    'email' => 'E',
+    'email' => %w(FE TE),
     'date' => 'D',
     'label' => 'L',
     'source_id' => 'I',
@@ -457,10 +458,8 @@ EOS
     # Person names are indexed with several prefixes
     person_termer = lambda do |d|
       lambda do |p|
-        ["#{d}_name", "name", "body"].each do |x|
-          doc.index_text p.name, PREFIX[x]
-        end if p.name
-        [d, :any].each { |x| doc.add_term mkterm(:email, x, p.email) }
+        doc.index_text p.name, PREFIX["#{d}_name"] if p.name
+        doc.add_term mkterm(:email, d, p.email)
       end
     end
 
@@ -471,7 +470,6 @@ EOS
     subject_text = m.indexable_subject
     body_text = m.indexable_body
     doc.index_text subject_text, PREFIX['subject']
-    doc.index_text subject_text, PREFIX['body']
     doc.index_text body_text, PREFIX['body']
     m.attachments.each { |a| doc.index_text a, PREFIX['attachment'] }
 
@@ -554,7 +552,6 @@ EOS
       case args[0]
       when :from then PREFIX['from_email']
       when :to then PREFIX['to_email']
-      when :any then PREFIX['email']
       else raise "Invalid email term type #{args[0]}"
       end + args[1].to_s.downcase
     when :source_id
-- 
1.6.5.2

_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel



Thread overview: 5+ messages
2010-01-16 20:03 [sup-devel] [PATCH 1/4] dont index redundant data Rich Lane
2010-01-16 20:03 ` [sup-devel] [PATCH 2/4] index email addresses as text Rich Lane
2010-01-16 20:03   ` [sup-devel] [PATCH 3/4] id query prefix synonym for msgid Rich Lane
2010-01-16 20:03     ` [sup-devel] [PATCH 4/4] trivial index format upgrade Rich Lane
2010-01-23 12:49 ` [sup-devel] [PATCH 1/4] dont index redundant data William Morgan
