* [sup-devel] [PATCH] fix handling of multiple label: terms in search
@ 2010-09-29 14:16 Sascha Silbe
2010-09-29 15:27 ` Edward Z. Yang
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Sascha Silbe @ 2010-09-29 14:16 UTC
To: sup-devel
By default Xapian will join query terms with the same prefix with OR instead
of AND, so searching for multiple labels doesn't return the expected results.
By making use of the "exclusive" parameter to add_boolean_prefix (added in
Xapian 1.2) we can tell Xapian to use OR only for the prefixes whose terms are
guaranteed to be unique per message.
Signed-off-by: Sascha Silbe <sascha-pgp@silbe.org>
---
lib/sup/index.rb | 74 +++++++++++++++++++++++++++---------------------------
1 files changed, 37 insertions(+), 37 deletions(-)
Tested on Debian Squeeze with Ruby 1.8.7.302 and Xapian 1.2.3.
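For readers unfamiliar with the flag, a toy model of the behaviour described
in the commit message may help. It is plain Ruby, not Xapian code: the
EXCLUSIVE table, the matches? helper and the sample terms are made up for
illustration, and single-character prefixes are assumed. Filters sharing a
prefix are ORed when the prefix is exclusive and ANDed otherwise; groups with
different prefixes are always ANDed.

```ruby
require 'set'

# Toy model only: this is NOT Xapian code. It mimics how filter terms that
# share a prefix are combined, depending on the "exclusive" flag.
EXCLUSIVE = {
  'L' => false, # label: a message can carry several labels -> AND
  'K' => true,  # type: at most one value per message       -> OR
}

# doc_terms: Set of prefixed terms attached to a document, e.g. "Linbox".
# filters:   Array of prefixed terms taken from the query.
def matches?(doc_terms, filters)
  filters.group_by { |t| t[0, 1] }.all? do |prefix, terms|
    if EXCLUSIVE.fetch(prefix)
      terms.any? { |t| doc_terms.include?(t) } # OR within the prefix
    else
      terms.all? { |t| doc_terms.include?(t) } # AND within the prefix
    end
  end
end

doc = Set.new(%w(Linbox Kmail))
matches?(doc, %w(Linbox Lunread)) # false: the document has no "Lunread" term
matches?(doc, %w(Kmail Kfoo))     # true: one matching type term is enough
```

With OR applied to labels (the behaviour before this patch), the first query
would also match, which is the unexpected result the patch addresses.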
diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index 9273f18..a72bec6 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -419,8 +419,8 @@ EOS
qp.stemming_strategy = Xapian::QueryParser::STEM_SOME
qp.default_op = Xapian::Query::OP_AND
qp.add_valuerangeprocessor(Xapian::NumberValueRangeProcessor.new(DATE_VALUENO, 'date:', true))
- NORMAL_PREFIX.each { |k,vs| vs.each { |v| qp.add_prefix k, v } }
- BOOLEAN_PREFIX.each { |k,vs| vs.each { |v| qp.add_boolean_prefix k, v } }
+ NORMAL_PREFIX.each { |k,info| info[:prefix].each { |v| qp.add_prefix k, v } }
+ BOOLEAN_PREFIX.each { |k,info| info[:prefix].each { |v| qp.add_boolean_prefix k, v, info[:exclusive] } }
begin
xapian_query = qp.parse_query(subs, Xapian::QueryParser::FLAG_PHRASE|Xapian::QueryParser::FLAG_BOOLEAN|Xapian::QueryParser::FLAG_LOVEHATE|Xapian::QueryParser::FLAG_WILDCARD)
@@ -471,31 +471,31 @@ EOS
# Stemmed
NORMAL_PREFIX = {
- 'subject' => 'S',
- 'body' => 'B',
- 'from_name' => 'FN',
- 'to_name' => 'TN',
- 'name' => %w(FN TN),
- 'attachment' => 'A',
- 'email_text' => 'E',
- '' => %w(S B FN TN A E),
+ 'subject' => {:prefix => 'S', :exclusive => false},
+ 'body' => {:prefix => 'B', :exclusive => false},
+ 'from_name' => {:prefix => 'FN', :exclusive => false},
+ 'to_name' => {:prefix => 'TN', :exclusive => false},
+ 'name' => {:prefix => %w(FN TN), :exclusive => false},
+ 'attachment' => {:prefix => 'A', :exclusive => false},
+ 'email_text' => {:prefix => 'E', :exclusive => false},
+ '' => {:prefix => %w(S B FN TN A E), :exclusive => false},
}
# Unstemmed
BOOLEAN_PREFIX = {
- 'type' => 'K',
- 'from_email' => 'FE',
- 'to_email' => 'TE',
- 'email' => %w(FE TE),
- 'date' => 'D',
- 'label' => 'L',
- 'source_id' => 'I',
- 'attachment_extension' => 'O',
- 'msgid' => 'Q',
- 'id' => 'Q',
- 'thread' => 'H',
- 'ref' => 'R',
- 'location' => 'J',
+ 'type' => {:prefix => 'K', :exclusive => true},
+ 'from_email' => {:prefix => 'FE', :exclusive => false},
+ 'to_email' => {:prefix => 'TE', :exclusive => false},
+ 'email' => {:prefix => %w(FE TE), :exclusive => false},
+ 'date' => {:prefix => 'D', :exclusive => true},
+ 'label' => {:prefix => 'L', :exclusive => false},
+ 'source_id' => {:prefix => 'I', :exclusive => true},
+ 'attachment_extension' => {:prefix => 'O', :exclusive => false},
+ 'msgid' => {:prefix => 'Q', :exclusive => true},
+ 'id' => {:prefix => 'Q', :exclusive => true},
+ 'thread' => {:prefix => 'H', :exclusive => false},
+ 'ref' => {:prefix => 'R', :exclusive => false},
+ 'location' => {:prefix => 'J', :exclusive => false},
}
PREFIX = NORMAL_PREFIX.merge BOOLEAN_PREFIX
@@ -661,8 +661,8 @@ EOS
# Person names are indexed with several prefixes
person_termer = lambda do |d|
lambda do |p|
- doc.index_text p.name, PREFIX["#{d}_name"] if p.name
- doc.index_text p.email, PREFIX['email_text']
+ doc.index_text p.name, PREFIX["#{d}_name"][:prefix] if p.name
+ doc.index_text p.email, PREFIX['email_text'][:prefix]
doc.add_term mkterm(:email, d, p.email)
end
end
@@ -673,9 +673,9 @@ EOS
# Full text search content
subject_text = m.indexable_subject
body_text = m.indexable_body
- doc.index_text subject_text, PREFIX['subject']
- doc.index_text body_text, PREFIX['body']
- m.attachments.each { |a| doc.index_text a, PREFIX['attachment'] }
+ doc.index_text subject_text, PREFIX['subject'][:prefix]
+ doc.index_text body_text, PREFIX['body'][:prefix]
+ m.attachments.each { |a| doc.index_text a, PREFIX['attachment'][:prefix] }
# Miscellaneous terms
doc.add_term mkterm(:date, m.date) if m.date
@@ -753,25 +753,25 @@ EOS
def mkterm type, *args
case type
when :label
- PREFIX['label'] + args[0].to_s.downcase
+ PREFIX['label'][:prefix] + args[0].to_s.downcase
when :type
- PREFIX['type'] + args[0].to_s.downcase
+ PREFIX['type'][:prefix] + args[0].to_s.downcase
when :date
- PREFIX['date'] + args[0].getutc.strftime("%Y%m%d%H%M%S")
+ PREFIX['date'][:prefix] + args[0].getutc.strftime("%Y%m%d%H%M%S")
when :email
case args[0]
- when :from then PREFIX['from_email']
- when :to then PREFIX['to_email']
+ when :from then PREFIX['from_email'][:prefix]
+ when :to then PREFIX['to_email'][:prefix]
else raise "Invalid email term type #{args[0]}"
end + args[1].to_s.downcase
when :source_id
- PREFIX['source_id'] + args[0].to_s.downcase
+ PREFIX['source_id'][:prefix] + args[0].to_s.downcase
when :location
- PREFIX['location'] + [args[0]].pack('n') + args[1].to_s
+ PREFIX['location'][:prefix] + [args[0]].pack('n') + args[1].to_s
when :attachment_extension
- PREFIX['attachment_extension'] + args[0].to_s.downcase
+ PREFIX['attachment_extension'][:prefix] + args[0].to_s.downcase
when :msgid, :ref, :thread
- PREFIX[type.to_s] + args[0][0...(MAX_TERM_LENGTH-1)]
+ PREFIX[type.to_s][:prefix] + args[0][0...(MAX_TERM_LENGTH-1)]
else
raise "Invalid term type #{type}"
end
--
1.7.1
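A portability note on the table iteration in the patch: info[:prefix] is
sometimes a String and sometimes an Array. Calling .each on a String works on
Ruby 1.8.7 (the version tested above) because String#each exists there, but
that method was removed in Ruby 1.9. A minimal sketch of one way to handle
both shapes follows; BOOLEAN_PREFIX_SAMPLE and boolean_prefix_args are
hypothetical names used only for this example and are not part of the patch.

```ruby
# Hypothetical excerpt of the prefix table, in the shape the patch introduces.
BOOLEAN_PREFIX_SAMPLE = {
  'label' => {:prefix => 'L', :exclusive => false},
  'email' => {:prefix => %w(FE TE), :exclusive => false},
  'msgid' => {:prefix => 'Q', :exclusive => true},
}

# Flatten the table into [field, prefix, exclusive] triples. Array() wraps a
# lone String and returns an Array unchanged, so both shapes iterate alike.
def boolean_prefix_args(table)
  table.flat_map do |field, info|
    Array(info[:prefix]).map { |prefix| [field, prefix, info[:exclusive]] }
  end
end

boolean_prefix_args(BOOLEAN_PREFIX_SAMPLE).each do |field, prefix, exclusive|
  # In index.rb the equivalent call would be:
  #   qp.add_boolean_prefix field, prefix, exclusive
  puts "#{field} #{prefix} #{exclusive}"
end
```

The same Array() wrapping would apply to the NORMAL_PREFIX loop, which passes
only the field name and prefix to add_prefix.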
_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel
* Re: [sup-devel] [PATCH] fix handling of multiple label: terms in search
2010-09-29 14:16 [sup-devel] [PATCH] fix handling of multiple label: terms in search Sascha Silbe
@ 2010-09-29 15:27 ` Edward Z. Yang
2010-10-08 4:30 ` Rich Lane
2011-01-17 6:20 ` Rich Lane
2 siblings, 0 replies; 5+ messages in thread
From: Edward Z. Yang @ 2010-09-29 15:27 UTC
To: Sascha Silbe; +Cc: sup-devel
Excerpts from Sascha Silbe's message of Wed Sep 29 10:16:02 -0400 2010:
> By default Xapian will join query terms with the same prefix with OR instead
> of AND, so searching for multiple labels doesn't return the expected results.
> By making use of a parameter to add_boolean_prefix (added in Xapian 1.2) we
> can tell Xapian to use OR only for the search terms that are guaranteed to be
> unique.
This is great, I'd love to see this go into mainline.
Cheers,
Edward
* Re: [sup-devel] [PATCH] fix handling of multiple label: terms in search
2010-09-29 14:16 [sup-devel] [PATCH] fix handling of multiple label: terms in search Sascha Silbe
2010-09-29 15:27 ` Edward Z. Yang
@ 2010-10-08 4:30 ` Rich Lane
2011-01-17 6:20 ` Rich Lane
2 siblings, 0 replies; 5+ messages in thread
From: Rich Lane @ 2010-10-08 4:30 UTC
To: Sascha Silbe; +Cc: sup-devel
I like this patch, but I'm going to hold off on merging it to master
until my machines are all running Xapian 1.2. I've applied it to the
new and-labels branch.
* Re: [sup-devel] [PATCH] fix handling of multiple label: terms in search
2010-09-29 14:16 [sup-devel] [PATCH] fix handling of multiple label: terms in search Sascha Silbe
2010-09-29 15:27 ` Edward Z. Yang
2010-10-08 4:30 ` Rich Lane
@ 2011-01-17 6:20 ` Rich Lane
2011-01-18 18:11 ` Sascha Silbe
2 siblings, 1 reply; 5+ messages in thread
From: Rich Lane @ 2011-01-17 6:20 UTC
To: Sascha Silbe; +Cc: sup-devel
Excerpts from Sascha Silbe's message of Wed Sep 29 10:16:02 -0400 2010:
> By default Xapian will join query terms with the same prefix with OR instead
> of AND, so searching for multiple labels doesn't return the expected results.
> By making use of a parameter to add_boolean_prefix (added in Xapian 1.2) we
> can tell Xapian to use OR only for the search terms that are guaranteed to be
> unique.
Merged to master. This means we require Xapian 1.2.1 now. The
xapian-full gem has been updated to 1.2.3.
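A minimal sketch of how the new minimum could be checked, assuming the
installed Xapian version is available as a string. How that string is obtained
is left open; MINIMUM_XAPIAN and xapian_new_enough? are hypothetical names and
are not part of sup.

```ruby
require 'rubygems' # provides Gem::Version

# Minimum version stated above for the merged patch.
MINIMUM_XAPIAN = Gem::Version.new('1.2.1')

# installed: version string of the Xapian library in use (hypothetical input).
def xapian_new_enough?(installed)
  Gem::Version.new(installed) >= MINIMUM_XAPIAN
end

xapian_new_enough?('1.2.3')  # true
xapian_new_enough?('1.0.22') # false
```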
* Re: [sup-devel] [PATCH] fix handling of multiple label: terms in search
2011-01-17 6:20 ` Rich Lane
@ 2011-01-18 18:11 ` Sascha Silbe
0 siblings, 0 replies; 5+ messages in thread
From: Sascha Silbe @ 2011-01-18 18:11 UTC
To: Rich Lane; +Cc: sup-devel
Excerpts from Rich Lane's message of Mon Jan 17 07:20:08 +0100 2011:
> Excerpts from Sascha Silbe's message of Wed Sep 29 10:16:02 -0400 2010:
> > By default Xapian will join query terms with the same prefix with OR instead
> > of AND, so searching for multiple labels doesn't return the expected results.
> > By making use of a parameter to add_boolean_prefix (added in Xapian 1.2) we
> > can tell Xapian to use OR only for the search terms that are guaranteed to be
> > unique.
>
> Merged to master.
Thanks! I've rebased my branch on top of next and will submit another
round of patches that should be ready for inclusion.
Sascha
--
http://sascha.silbe.org/
http://www.infra-silbe.de/