From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by 10.213.7.146 with SMTP id d18cs291470ebd;
        Sat, 16 Jan 2010 12:08:08 -0800 (PST)
Received: by 10.224.65.37 with SMTP id g37mr1627191qai.101.1263672488235;
        Sat, 16 Jan 2010 12:08:08 -0800 (PST)
Return-Path:
Received: from rubyforge.org (rubyforge.org [205.234.109.19])
        by mx.google.com with ESMTP id 6si8079662qwk.31.2010.01.16.12.08.07;
        Sat, 16 Jan 2010 12:08:08 -0800 (PST)
Received-SPF: pass (google.com: domain of sup-devel-bounces@rubyforge.org
        designates 205.234.109.19 as permitted sender) client-ip=205.234.109.19;
Authentication-Results: mx.google.com; spf=pass (google.com: domain of
        sup-devel-bounces@rubyforge.org designates 205.234.109.19 as permitted
        sender) smtp.mail=sup-devel-bounces@rubyforge.org
Received: from rubyforge.org (rubyforge.org [127.0.0.1])
        by rubyforge.org (Postfix) with ESMTP id AD8571779940;
        Sat, 16 Jan 2010 15:08:07 -0500 (EST)
Received: from magnesium.club.cc.cmu.edu (MAGNESIUM.CLUB.CC.cmu.edu [128.237.157.15])
        by rubyforge.org (Postfix) with ESMTP id 1412318582D0
        for ; Sat, 16 Jan 2010 15:03:50 -0500 (EST)
Received: (qmail 27797 invoked from network); 16 Jan 2010 20:03:50 -0000
Received: from pion.club.cc.cmu.edu (HELO localhost.localdomain) (128.237.157.88)
        by magnesium.club.cc.cmu.edu with SMTP; 16 Jan 2010 20:03:50 -0000
From: Rich Lane
To: sup-devel@rubyforge.org
Date: Sat, 16 Jan 2010 12:03:04 -0800
Message-Id: <1263672187-5174-1-git-send-email-rlane@club.cc.cmu.edu>
X-Mailer: git-send-email 1.6.5.2
Subject: [sup-devel] [PATCH 1/4] dont index redundant data
X-BeenThere: sup-devel@rubyforge.org
X-Mailman-Version: 2.1.12
Precedence: list
Reply-To: Sup developer discussion
List-Id: Sup developer discussion
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: sup-devel-bounces@rubyforge.org
Errors-To: sup-devel-bounces@rubyforge.org

Use the Xapian QueryParser ability
to map a prefix in the query to multiple prefixes in the index. This means we
don't need to store duplicate names, email addresses, and subjects. This also
adds the stemmed attachment filenames to the default (non-prefixed) search.

This version stores a subset of the data that previous versions stored, so it
can read indexes written by previous versions, but previous versions cannot
read indexes written by this one.
---
 lib/sup/xapian_index.rb |   19 ++++++++-----------
 1 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
index c81dca4..c0b2f9f 100644
--- a/lib/sup/xapian_index.rb
+++ b/lib/sup/xapian_index.rb
@@ -258,9 +258,9 @@ EOS
     qp.stemming_strategy = Xapian::QueryParser::STEM_SOME
     qp.default_op = Xapian::Query::OP_AND
     qp.add_valuerangeprocessor(Xapian::NumberValueRangeProcessor.new(DATE_VALUENO, 'date:', true))
-    NORMAL_PREFIX.each { |k,v| qp.add_prefix k, v }
-    BOOLEAN_PREFIX.each { |k,v| qp.add_boolean_prefix k, v }
-    xapian_query = qp.parse_query(subs, Xapian::QueryParser::FLAG_PHRASE|Xapian::QueryParser::FLAG_BOOLEAN|Xapian::QueryParser::FLAG_LOVEHATE|Xapian::QueryParser::FLAG_WILDCARD, PREFIX['body'])
+    NORMAL_PREFIX.each { |k,vs| vs.each { |v| qp.add_prefix k, v } }
+    BOOLEAN_PREFIX.each { |k,vs| vs.each { |v| qp.add_boolean_prefix k, v } }
+    xapian_query = qp.parse_query(subs, Xapian::QueryParser::FLAG_PHRASE|Xapian::QueryParser::FLAG_BOOLEAN|Xapian::QueryParser::FLAG_LOVEHATE|Xapian::QueryParser::FLAG_WILDCARD)
 
     raise ParseError if xapian_query.nil? or xapian_query.empty?
     query[:qobj] = xapian_query
@@ -276,8 +276,9 @@ EOS
     'body' => 'B',
     'from_name' => 'FN',
     'to_name' => 'TN',
-    'name' => 'N',
+    'name' => %w(FN TN),
     'attachment' => 'A',
+    '' => %w(S B FN TN A),
   }
 
   # Unstemmed
@@ -285,7 +286,7 @@ EOS
     'type' => 'K',
     'from_email' => 'FE',
     'to_email' => 'TE',
-    'email' => 'E',
+    'email' => %w(FE TE),
     'date' => 'D',
     'label' => 'L',
     'source_id' => 'I',
@@ -457,10 +458,8 @@ EOS
     # Person names are indexed with several prefixes
     person_termer = lambda do |d|
       lambda do |p|
-        ["#{d}_name", "name", "body"].each do |x|
-          doc.index_text p.name, PREFIX[x]
-        end if p.name
-        [d, :any].each { |x| doc.add_term mkterm(:email, x, p.email) }
+        doc.index_text p.name, PREFIX["#{d}_name"] if p.name
+        doc.add_term mkterm(:email, d, p.email)
       end
     end
 
@@ -471,7 +470,6 @@ EOS
     subject_text = m.indexable_subject
     body_text = m.indexable_body
     doc.index_text subject_text, PREFIX['subject']
-    doc.index_text subject_text, PREFIX['body']
     doc.index_text body_text, PREFIX['body']
     m.attachments.each { |a| doc.index_text a, PREFIX['attachment'] }
 
@@ -554,7 +552,6 @@ EOS
       case args[0]
       when :from then PREFIX['from_email']
       when :to then PREFIX['to_email']
-      when :any then PREFIX['email']
       else raise "Invalid email term type #{args[0]}"
       end + args[1].to_s.downcase
     when :source_id
-- 
1.6.5.2

_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel
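
For readers following the patch, the idea of one query prefix fanning out to
several index prefixes can be sketched in plain Ruby without Xapian. This is
only an illustration under stated assumptions: NORMAL_PREFIX_SKETCH,
index_prefixes_for and expand_term are hypothetical names that do not appear
in sup, the table is a trimmed copy of the one in the diff, and simple
"prefix + lowercased word" concatenation stands in for Xapian's real stemming
and term encoding. In the patch itself the mapping is registered by calling
qp.add_prefix once per index prefix.

```ruby
# Hypothetical, simplified copy of the stemmed prefix table from the patch.
# A value is either one index prefix or a list of index prefixes.
NORMAL_PREFIX_SKETCH = {
  'subject'    => 'S',
  'body'       => 'B',
  'from_name'  => 'FN',
  'to_name'    => 'TN',
  'name'       => %w(FN TN),        # "name:" searches both name fields
  'attachment' => 'A',
  ''           => %w(S B FN TN A),  # unprefixed query words
}

# Return the list of index prefixes a query field maps to.
# Array() turns a lone String into a one-element list (assumption of this
# sketch; the patch iterates the values directly).
def index_prefixes_for(field, table = NORMAL_PREFIX_SKETCH)
  Array(table.fetch(field))
end

# Return the prefixed terms a query word would be matched against.
# Plain concatenation is a stand-in for Xapian's actual term generation.
def expand_term(field, word, table = NORMAL_PREFIX_SKETCH)
  index_prefixes_for(field, table).map { |prefix| "#{prefix}#{word.downcase}" }
end

p expand_term('name', 'Alice')
p expand_term('', 'report')
```

Because the fan-out happens at query time, each name, address, or subject
only has to be stored once under its specific prefix, which is the saving the
commit message describes.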