From: Rich Lane <rlane@club.cc.cmu.edu>
To: sup-devel@rubyforge.org
Subject: [sup-devel] [PATCH 1/4] dont index redundant data
Date: Sat, 16 Jan 2010 12:03:04 -0800
Message-ID: <1263672187-5174-1-git-send-email-rlane@club.cc.cmu.edu>
Use the Xapian QueryParser ability to map a prefix in the query to multiple
prefixes in the index. This means we don't need to store duplicate names, email
addresses, and subjects. This also adds the stemmed attachment filenames to the
default (non-prefixed) search. The new index stores a subset of the data that
previous versions stored, so this version can read older indexes, but older
versions cannot read indexes written by this one.
---
lib/sup/xapian_index.rb | 19 ++++++++-----------
1 files changed, 8 insertions(+), 11 deletions(-)
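For readers unfamiliar with the one-to-many prefix mapping described in the
commit message, here is a minimal sketch in plain Ruby. It is not Sup's code
and does not call Xapian; QUERY_PREFIXES and expand_term are hypothetical
names, and the term format (prefix followed by the lowercased word, with no
stemming) is a simplifying assumption. It only illustrates that a query field
registered under several index prefixes expands to one candidate term per
prefix, which the query parser would then combine with OR.

```ruby
# Hypothetical model of the query-prefix table after this patch.
# Each query field maps to one or more index term prefixes.
QUERY_PREFIXES = {
  'subject'    => %w(S),
  'body'       => %w(B),
  'from_name'  => %w(FN),
  'to_name'    => %w(TN),
  'name'       => %w(FN TN),       # one query field, two index prefixes
  'attachment' => %w(A),
  ''           => %w(S B FN TN A), # bare (non-prefixed) search terms
}

# Expand a single query token ("field:word" or a bare "word") into the list
# of index terms it could match. Assumption: terms are the prefix followed by
# the lowercased word; real Xapian terms are also stemmed.
def expand_term(token)
  field, word = token.include?(':') ? token.split(':', 2) : ['', token]
  prefixes = QUERY_PREFIXES.fetch(field) do
    raise ArgumentError, "unknown query field: #{field}"
  end
  prefixes.map { |prefix| "#{prefix}#{word.downcase}" }
end
```

With this table, expand_term('name:Alice') yields terms under both FN and TN,
so the index no longer needs a separate copy of each name under a combined
prefix.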
diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
index c81dca4..c0b2f9f 100644
--- a/lib/sup/xapian_index.rb
+++ b/lib/sup/xapian_index.rb
@@ -258,9 +258,9 @@ EOS
qp.stemming_strategy = Xapian::QueryParser::STEM_SOME
qp.default_op = Xapian::Query::OP_AND
qp.add_valuerangeprocessor(Xapian::NumberValueRangeProcessor.new(DATE_VALUENO, 'date:', true))
- NORMAL_PREFIX.each { |k,v| qp.add_prefix k, v }
- BOOLEAN_PREFIX.each { |k,v| qp.add_boolean_prefix k, v }
- xapian_query = qp.parse_query(subs, Xapian::QueryParser::FLAG_PHRASE|Xapian::QueryParser::FLAG_BOOLEAN|Xapian::QueryParser::FLAG_LOVEHATE|Xapian::QueryParser::FLAG_WILDCARD, PREFIX['body'])
+ NORMAL_PREFIX.each { |k,vs| vs.each { |v| qp.add_prefix k, v } }
+ BOOLEAN_PREFIX.each { |k,vs| vs.each { |v| qp.add_boolean_prefix k, v } }
+ xapian_query = qp.parse_query(subs, Xapian::QueryParser::FLAG_PHRASE|Xapian::QueryParser::FLAG_BOOLEAN|Xapian::QueryParser::FLAG_LOVEHATE|Xapian::QueryParser::FLAG_WILDCARD)
raise ParseError if xapian_query.nil? or xapian_query.empty?
query[:qobj] = xapian_query
@@ -276,8 +276,9 @@ EOS
'body' => 'B',
'from_name' => 'FN',
'to_name' => 'TN',
- 'name' => 'N',
+ 'name' => %w(FN TN),
'attachment' => 'A',
+ '' => %w(S B FN TN A),
}
# Unstemmed
@@ -285,7 +286,7 @@ EOS
'type' => 'K',
'from_email' => 'FE',
'to_email' => 'TE',
- 'email' => 'E',
+ 'email' => %w(FE TE),
'date' => 'D',
'label' => 'L',
'source_id' => 'I',
@@ -457,10 +458,8 @@ EOS
# Person names are indexed with several prefixes
person_termer = lambda do |d|
lambda do |p|
- ["#{d}_name", "name", "body"].each do |x|
- doc.index_text p.name, PREFIX[x]
- end if p.name
- [d, :any].each { |x| doc.add_term mkterm(:email, x, p.email) }
+ doc.index_text p.name, PREFIX["#{d}_name"] if p.name
+ doc.add_term mkterm(:email, d, p.email)
end
end
@@ -471,7 +470,6 @@ EOS
subject_text = m.indexable_subject
body_text = m.indexable_body
doc.index_text subject_text, PREFIX['subject']
- doc.index_text subject_text, PREFIX['body']
doc.index_text body_text, PREFIX['body']
m.attachments.each { |a| doc.index_text a, PREFIX['attachment'] }
@@ -554,7 +552,6 @@ EOS
case args[0]
when :from then PREFIX['from_email']
when :to then PREFIX['to_email']
- when :any then PREFIX['email']
else raise "Invalid email term type #{args[0]}"
end + args[1].to_s.downcase
when :source_id
--
1.6.5.2
Thread overview: 5+ messages
2010-01-16 20:03 Rich Lane [this message]
2010-01-16 20:03 ` [sup-devel] [PATCH 2/4] index email addresses as text Rich Lane
2010-01-16 20:03 ` [sup-devel] [PATCH 3/4] id query prefix synonym for msgid Rich Lane
2010-01-16 20:03 ` [sup-devel] [PATCH 4/4] trivial index format upgrade Rich Lane
2010-01-23 12:49 ` [sup-devel] [PATCH 1/4] dont index redundant data William Morgan