* [sup-devel] [PATCH 1/4] dont index redundant data
@ 2010-01-16 20:03 Rich Lane
2010-01-16 20:03 ` [sup-devel] [PATCH 2/4] index email addresses as text Rich Lane
2010-01-23 12:49 ` [sup-devel] [PATCH 1/4] dont index redundant data William Morgan
0 siblings, 2 replies; 5+ messages in thread
From: Rich Lane @ 2010-01-16 20:03 UTC (permalink / raw)
To: sup-devel
Use the Xapian QueryParser's ability to map a single prefix in the query to
multiple prefixes in the index. This means we no longer need to store duplicate
terms for names, email addresses, and subjects. This also adds the stemmed
attachment filenames to the default (unprefixed) search. Because we now store a
subset of the data that previous versions stored, we can read their indexes,
but they can't read ours.
---
lib/sup/xapian_index.rb | 19 ++++++++-----------
1 files changed, 8 insertions(+), 11 deletions(-)
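A minimal sketch of the prefix mapping described in the commit message, in plain Ruby with no Xapian dependency. It models what happens when the patched loop registers several index prefixes for one query field: a term like name:alice is matched against every mapped prefix, and Xapian's QueryParser combines those alternatives with OR. The hash mirrors NORMAL_PREFIX from the patch; `expand_term` is a hypothetical helper, and the model ignores stemming and Xapian's real term generation.

```ruby
# Query-field => index-prefix map, copied from the patched NORMAL_PREFIX.
# A value may be a single prefix or a list of prefixes.
NORMAL_PREFIX = {
  'subject'    => 'S',
  'body'       => 'B',
  'from_name'  => 'FN',
  'to_name'    => 'TN',
  'name'       => %w(FN TN),
  'attachment' => 'A',
  ''           => %w(S B FN TN A),
}

# Hypothetical helper: expand one "field:word" query term into every index
# term it should match. The results are alternatives (OR), not a conjunction.
# Stemming and Xapian's real tokenization are deliberately ignored.
def expand_term(field, word)
  Array(NORMAL_PREFIX.fetch(field)).map { |prefix| prefix + word.downcase }
end

p expand_term('name', 'Alice')  # ["FNalice", "TNalice"]
p expand_term('', 'invoice')    # ["Sinvoice", "Binvoice", "FNinvoice", "TNinvoice", "Ainvoice"]
```

`Array()` normalizes the single-string entries here. The patch's `vs.each` instead relies on Ruby 1.8, where `String#each` iterates over lines, so `'S'.each` yields `'S'` once; that method no longer exists in Ruby 1.9 and later.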
diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
index c81dca4..c0b2f9f 100644
--- a/lib/sup/xapian_index.rb
+++ b/lib/sup/xapian_index.rb
@@ -258,9 +258,9 @@ EOS
qp.stemming_strategy = Xapian::QueryParser::STEM_SOME
qp.default_op = Xapian::Query::OP_AND
qp.add_valuerangeprocessor(Xapian::NumberValueRangeProcessor.new(DATE_VALUENO, 'date:', true))
- NORMAL_PREFIX.each { |k,v| qp.add_prefix k, v }
- BOOLEAN_PREFIX.each { |k,v| qp.add_boolean_prefix k, v }
- xapian_query = qp.parse_query(subs, Xapian::QueryParser::FLAG_PHRASE|Xapian::QueryParser::FLAG_BOOLEAN|Xapian::QueryParser::FLAG_LOVEHATE|Xapian::QueryParser::FLAG_WILDCARD, PREFIX['body'])
+ NORMAL_PREFIX.each { |k,vs| vs.each { |v| qp.add_prefix k, v } }
+ BOOLEAN_PREFIX.each { |k,vs| vs.each { |v| qp.add_boolean_prefix k, v } }
+ xapian_query = qp.parse_query(subs, Xapian::QueryParser::FLAG_PHRASE|Xapian::QueryParser::FLAG_BOOLEAN|Xapian::QueryParser::FLAG_LOVEHATE|Xapian::QueryParser::FLAG_WILDCARD)
raise ParseError if xapian_query.nil? or xapian_query.empty?
query[:qobj] = xapian_query
@@ -276,8 +276,9 @@ EOS
'body' => 'B',
'from_name' => 'FN',
'to_name' => 'TN',
- 'name' => 'N',
+ 'name' => %w(FN TN),
'attachment' => 'A',
+ '' => %w(S B FN TN A),
}
# Unstemmed
@@ -285,7 +286,7 @@ EOS
'type' => 'K',
'from_email' => 'FE',
'to_email' => 'TE',
- 'email' => 'E',
+ 'email' => %w(FE TE),
'date' => 'D',
'label' => 'L',
'source_id' => 'I',
@@ -457,10 +458,8 @@ EOS
# Person names are indexed with several prefixes
person_termer = lambda do |d|
lambda do |p|
- ["#{d}_name", "name", "body"].each do |x|
- doc.index_text p.name, PREFIX[x]
- end if p.name
- [d, :any].each { |x| doc.add_term mkterm(:email, x, p.email) }
+ doc.index_text p.name, PREFIX["#{d}_name"] if p.name
+ doc.add_term mkterm(:email, d, p.email)
end
end
@@ -471,7 +470,6 @@ EOS
subject_text = m.indexable_subject
body_text = m.indexable_body
doc.index_text subject_text, PREFIX['subject']
- doc.index_text subject_text, PREFIX['body']
doc.index_text body_text, PREFIX['body']
m.attachments.each { |a| doc.index_text a, PREFIX['attachment'] }
@@ -554,7 +552,6 @@ EOS
case args[0]
when :from then PREFIX['from_email']
when :to then PREFIX['to_email']
- when :any then PREFIX['email']
else raise "Invalid email term type #{args[0]}"
end + args[1].to_s.downcase
when :source_id
--
1.6.5.2
_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel
* [sup-devel] [PATCH 2/4] index email addresses as text
2010-01-16 20:03 [sup-devel] [PATCH 1/4] dont index redundant data Rich Lane
@ 2010-01-16 20:03 ` Rich Lane
2010-01-16 20:03 ` [sup-devel] [PATCH 3/4] id query prefix synonym for msgid Rich Lane
2010-01-23 12:49 ` [sup-devel] [PATCH 1/4] dont index redundant data William Morgan
1 sibling, 1 reply; 5+ messages in thread
From: Rich Lane @ 2010-01-16 20:03 UTC (permalink / raw)
To: sup-devel
This lets you search for an email address (or its component parts, since it's
indexed as a phrase) with no prefix.
---
lib/sup/xapian_index.rb | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
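A rough sketch of why indexing the address as text makes its component parts searchable: the address is split into positioned words under the 'E' prefix (the new 'email_text' entry), so each part matches on its own, and a phrase such as example.com matches consecutive positions. This is plain Ruby with no Xapian dependency; `tokenize_with_positions` and `phrase_match?` are hypothetical stand-ins rather than Sup or Xapian methods, and the tokenization rule (runs of lowercase letters and digits, positions counted from 0) is an assumption.

```ruby
# Hypothetical model of indexing text under a prefix: lowercase the input,
# split it into alphanumeric runs, and record each prefixed term with its
# position. Xapian's TermGenerator has its own, more detailed rules.
def tokenize_with_positions(text, prefix)
  text.downcase.scan(/[a-z0-9]+/).each_with_index.map { |word, pos| [prefix + word, pos] }
end

# True if the given terms occur at consecutive positions, which is what a
# phrase query requires.
def phrase_match?(postings, terms)
  positions = Hash.new { |h, k| h[k] = [] }
  postings.each { |term, pos| positions[term] << pos }
  positions[terms.first].any? do |start|
    terms.each_with_index.all? { |term, i| positions[term].include?(start + i) }
  end
end

postings = tokenize_with_positions('alice.smith@example.com', 'E')
p postings                                     # [["Ealice", 0], ["Esmith", 1], ["Eexample", 2], ["Ecom", 3]]
p phrase_match?(postings, %w(Eexample Ecom))   # true
p phrase_match?(postings, %w(Ealice Eexample)) # false
```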
diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
index c0b2f9f..6fa6c55 100644
--- a/lib/sup/xapian_index.rb
+++ b/lib/sup/xapian_index.rb
@@ -278,7 +278,8 @@ EOS
'to_name' => 'TN',
'name' => %w(FN TN),
'attachment' => 'A',
- '' => %w(S B FN TN A),
+ 'email_text' => 'E',
+ '' => %w(S B FN TN A E),
}
# Unstemmed
@@ -459,6 +460,7 @@ EOS
person_termer = lambda do |d|
lambda do |p|
doc.index_text p.name, PREFIX["#{d}_name"] if p.name
+ doc.index_text p.email, PREFIX['email_text']
doc.add_term mkterm(:email, d, p.email)
end
end
--
1.6.5.2
* [sup-devel] [PATCH 3/4] id query prefix synonym for msgid
2010-01-16 20:03 ` [sup-devel] [PATCH 2/4] index email addresses as text Rich Lane
@ 2010-01-16 20:03 ` Rich Lane
2010-01-16 20:03 ` [sup-devel] [PATCH 4/4] trivial index format upgrade Rich Lane
0 siblings, 1 reply; 5+ messages in thread
From: Rich Lane @ 2010-01-16 20:03 UTC (permalink / raw)
To: sup-devel
notmuch has created an "id:<msgid>" convention for referring to emails.
We already had "msgid:<msgid>", but support this syntax too.
---
lib/sup/xapian_index.rb | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
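Because both query fields map to the same 'Q' index prefix, id: and msgid: queries filter on exactly the same term. A sketch of that lookup in plain Ruby; `boolean_term` is a hypothetical helper, and it assumes the term is simply the prefix followed by the raw message id (Sup's own term construction for message ids may transform the value further).

```ruby
# Boolean (unstemmed) query prefixes relevant to this patch, copied from the
# patched BOOLEAN_PREFIX. Two query fields now share the 'Q' index prefix.
BOOLEAN_PREFIX = {
  'msgid'  => 'Q',
  'id'     => 'Q',
  'thread' => 'H',
  'ref'    => 'R',
}

# Hypothetical helper: turn "field:value" into the index term it filters on.
# split(':', 2) keeps any further colons inside the value.
def boolean_term(token)
  field, value = token.split(':', 2)
  BOOLEAN_PREFIX.fetch(field) + value
end

p boolean_term('id:1234.abcd@example.org')     # "Q1234.abcd@example.org"
p boolean_term('msgid:1234.abcd@example.org')  # "Q1234.abcd@example.org"
```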
diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
index 6fa6c55..eefd492 100644
--- a/lib/sup/xapian_index.rb
+++ b/lib/sup/xapian_index.rb
@@ -293,6 +293,7 @@ EOS
'source_id' => 'I',
'attachment_extension' => 'O',
'msgid' => 'Q',
+ 'id' => 'Q',
'thread' => 'H',
'ref' => 'R',
}
--
1.6.5.2
* [sup-devel] [PATCH 4/4] trivial index format upgrade
2010-01-16 20:03 ` [sup-devel] [PATCH 3/4] id query prefix synonym for msgid Rich Lane
@ 2010-01-16 20:03 ` Rich Lane
0 siblings, 0 replies; 5+ messages in thread
From: Rich Lane @ 2010-01-16 20:03 UTC (permalink / raw)
To: sup-devel
A v2 client can read a v1 index, but a v1 client cannot read a v2 index. Once
a v2 client modifies the index, a v1 client will be unable to read it. So make
the version check match that: open a v1 index and mark it as v2, and reject any
other version mismatch.
---
lib/sup/xapian_index.rb | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
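The new branch in the open path amounts to a small decision table: a v1 index is accepted and relabelled as v2, a v2 index is accepted unchanged, and anything else (including an index with no version metadata, treated as v0) is rejected. A sketch of that logic as a standalone Ruby method with no Xapian dependency; `checked_index_version` is a hypothetical name, and the error text is shortened from the patch's message.

```ruby
INDEX_VERSION = '2'

# Hypothetical helper extracted from the patched open logic.
# Returns the version string that should be stored in the index metadata,
# or raises if this Sup version cannot use the index.
def checked_index_version(db_version)
  db_version = '0' if db_version.empty?
  if db_version == '1'
    # v1 indexes are readable as-is; only the metadata is relabelled, so a
    # v1 client's strict equality check will refuse the index from now on.
    INDEX_VERSION
  elsif db_version != INDEX_VERSION
    raise "expected a v#{INDEX_VERSION} index, found v#{db_version}"
  else
    db_version
  end
end

p checked_index_version('1')  # "2"
p checked_index_version('2')  # "2"
begin
  checked_index_version('')
rescue RuntimeError => e
  puts e.message  # expected a v2 index, found v0
end
```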
diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
index eefd492..464cee1 100644
--- a/lib/sup/xapian_index.rb
+++ b/lib/sup/xapian_index.rb
@@ -8,7 +8,7 @@ module Redwood
# for searching due to precomputing thread membership.
class XapianIndex < BaseIndex
STEM_LANGUAGE = "english"
- INDEX_VERSION = '1'
+ INDEX_VERSION = '2'
## dates are converted to integers for xapian, and are used for document ids,
## so we must ensure they're reasonably valid. this typically only affect
@@ -35,7 +35,10 @@ EOS
@xapian = Xapian::WritableDatabase.new(path, Xapian::DB_OPEN)
db_version = @xapian.get_metadata 'version'
db_version = '0' if db_version.empty?
- if db_version != INDEX_VERSION
+ if db_version == '1'
+ info "Upgrading index format 1 to 2"
+ @xapian.set_metadata 'version', INDEX_VERSION
+ elsif db_version != INDEX_VERSION
fail "This Sup version expects a v#{INDEX_VERSION} index, but you have an existing v#{db_version} index. Please downgrade to your previous version and dump your labels before upgrading to this version (then run sup-sync --restore)."
end
else
--
1.6.5.2
* Re: [sup-devel] [PATCH 1/4] dont index redundant data
2010-01-16 20:03 [sup-devel] [PATCH 1/4] dont index redundant data Rich Lane
2010-01-16 20:03 ` [sup-devel] [PATCH 2/4] index email addresses as text Rich Lane
@ 2010-01-23 12:49 ` William Morgan
1 sibling, 0 replies; 5+ messages in thread
From: William Morgan @ 2010-01-23 12:49 UTC (permalink / raw)
To: sup-devel
Branch xapian-updates, merged into next.
--
William <wmorgan-sup@masanjin.net>