* [sup-talk] [PATCH] First draft of attachment processing for more gmail style searches
@ 2008-02-25 20:50 Marcus Williams
2008-02-28 17:40 ` William Morgan
0 siblings, 1 reply; 9+ messages in thread
From: Marcus Williams @ 2008-02-25 20:50 UTC
This patch adds the search terms "filename" and "filetype". It changes
the index, so existing messages need a sup-sync --all before these
searches work properly, but it should work on all new messages without
one. You can now search for something like "from:phil* filetype:pdf" to
find all messages from a person called phil with a pdf attachment. You
can also specify a file name for the attachment with
"filename:(this is a filename with spaces.txt)", and you can use
wildcards in the filename ("filename:test*.pdf").
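
The rewrite these two terms go through can be tried outside Sup. The snippet below is only a sketch: it reuses the regular expression from the patch, but translate_attachment_terms is a hypothetical helper name and the logging calls are left out.

```ruby
# Standalone sketch of the query rewrite. The pattern is the one used in
# the patch; translate_attachment_terms is a hypothetical name, not Sup API.
ATTACHMENT_TERM = /\b(filename|filetype):(\((.+?)\)\B|(\S+)\b)/

def translate_attachment_terms(query)
  query.gsub(ATTACHMENT_TERM) do
    # $3 is the parenthesised form, $4 the bare-word form.
    field, name = $1, ($3 || $4)
    case field
    when "filename" then "attachments:(#{name.downcase})"
    when "filetype" then "attachments:(*.#{name.downcase})"
    end
  end
end

puts translate_attachment_terms("from:phil* filetype:PDF")
puts translate_attachment_terms("filename:(this is a filename with spaces.txt)")
```

Both forms end up as a query against the single attachments field, so a filetype search is just a wildcard filename search.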
---
lib/sup/index.rb | 15 +++++++++++++++
lib/sup/message.rb | 7 ++++++-
2 files changed, 21 insertions(+), 1 deletions(-)
diff --git a/lib/sup/index.rb b/lib/sup/index.rb
index f812fc7..4205f2a 100644
--- a/lib/sup/index.rb
+++ b/lib/sup/index.rb
@@ -147,6 +147,7 @@ EOS
field_infos.add_field :date, :index => :untokenized
field_infos.add_field :body
field_infos.add_field :label
+ field_infos.add_field :attachments
field_infos.add_field :subject
field_infos.add_field :from
field_infos.add_field :to
@@ -198,6 +199,7 @@ EOS
:body => (entry[:body] || m.indexable_content),
:snippet => snippet, # always override
:label => m.labels.uniq.join(" "), # always override
+ :attachments => (entry[:attachments] || m.attachments.uniq.join(" ")),
:from => (entry[:from] || (m.from ? m.from.indexable_content : "")),
:to => (entry[:to] || (m.to + m.cc + m.bcc).map { |x| x.indexable_content }.join(" ")),
:subject => (entry[:subject] || wrap_subj(m.subj)),
@@ -452,6 +454,19 @@ protected
end
end
+ ## gmail style attachments "filename" and "filetype" searches
+ subs = subs.gsub(/\b(filename|filetype):(\((.+?)\)\B|(\S+)\b)/) do
+ field, name = $1, ($3 || $4)
+ case field
+ when "filename"
+ Redwood::log "filename - translated #{field}:#{name} to attachments:(#{name.downcase})"
+ "attachments:(#{name.downcase})"
+ when "filetype"
+ Redwood::log "filetype - translated #{field}:#{name} to attachments:(*.#{name.downcase})"
+ "attachments:(*.#{name.downcase})"
+ end
+ end
+
if $have_chronic
chronic_failure = false
subs = subs.gsub(/\b(before|on|in|during|after):(\((.+?)\)\B|(\S+)\b)/) do
diff --git a/lib/sup/message.rb b/lib/sup/message.rb
index 6a2a9c4..480f52c 100644
--- a/lib/sup/message.rb
+++ b/lib/sup/message.rb
@@ -37,7 +37,7 @@ class Message
DEFAULT_SENDER = "(missing sender)"
attr_reader :id, :date, :from, :subj, :refs, :replytos, :to, :source,
- :cc, :bcc, :labels, :list_address, :recipient_email, :replyto,
+ :cc, :bcc, :labels, :attachments, :list_address, :recipient_email, :replyto,
:source_info, :list_subscribe, :list_unsubscribe
bool_reader :dirty, :source_marked_read, :snippet_contains_encrypted_content
@@ -54,6 +54,7 @@ class Message
@dirty = false
@encrypted = false
@chunks = nil
+ @attachments = []
## we need to initialize this. see comments in parse_header as to
## why.
@@ -405,6 +406,10 @@ private
## if there's a filename, we'll treat it as an attachment.
if filename
+ # add this to the attachments list if it's not a generated html
+ # attachment (should we allow images with generated names?).
+ # Lowercase the filename because searches are easier that way
+ @attachments.push filename.downcase unless filename =~ /^sup-attachment-/
[Chunk::Attachment.new(m.header.content_type, filename, m, sibling_types)]
## otherwise, it's body text
--
1.5.3.7
* [sup-talk] [PATCH] First draft of attachment processing for more gmail style searches
2008-02-25 20:50 [sup-talk] [PATCH] First draft of attachment processing for more gmail style searches Marcus Williams
@ 2008-02-28 17:40 ` William Morgan
2008-02-28 21:15 ` Marcus Williams
0 siblings, 1 reply; 9+ messages in thread
From: William Morgan @ 2008-02-28 17:40 UTC
Reformatted excerpts from Marcus Williams's message of 2008-02-25:
> This patch adds the search terms "filename" and "filetype". This
> changes the index so requires a sup-sync --all to work properly, but
> should work on all new messages without it.
Just reading the patch without having applied it yet, this looks pretty
good. To answer the question in the comments, attachments with generated
names are things that are meant to be displayed inline, not
"download-worthy attachments", so I don't think they should be included
in the list of attachments. (Sup only generates a name for them to allow
external viewing programs to view them.) So I think the current
implementation is correct on that point.
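
For reference, the filtering being discussed boils down to something like the sketch below. The sup-attachment- prefix comes from the patch; searchable_attachments is a hypothetical helper and the file names are made up.

```ruby
# Prefix taken from the patch: names Sup generates for inline parts.
GENERATED_NAME = /^sup-attachment-/

# Hypothetical helper, not a method in Sup: keep only real attachment
# names, lowercased and de-duplicated, as the patch does for the index.
def searchable_attachments(filenames)
  filenames.reject { |f| f =~ GENERATED_NAME }.map { |f| f.downcase }.uniq
end

# Made-up file names for illustration.
p searchable_attachments(["Report.PDF", "sup-attachment-12345", "report.pdf"])
```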
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] [PATCH] First draft of attachment processing for more gmail style searches
2008-02-28 17:40 ` William Morgan
@ 2008-02-28 21:15 ` Marcus Williams
2008-03-02 18:08 ` William Morgan
0 siblings, 1 reply; 9+ messages in thread
From: Marcus Williams @ 2008-02-28 21:15 UTC
On 28.2.2008, William Morgan wrote:
> Just reading the patch without having applied it yet, this looks pretty
> good. To answer the question in the comments, attachments with generated
> names are things that are meant to be displayed inline, not
> "download-worthy attachments", so I don't think they should be included
> in the list of attachments.
Guessed as much (though I had to sup-sync --all about 10 times on a large
IMAP account to figure this out!). On the plus side, this made me move to
offlineimap and maildirs, which are _much_ faster, so I gained something
in the end :)
The only thing I'm a little wary of is the join() I do of the
attachment filenames for the index (like labels). This means that
Ferret doesn't actually know the difference between two files called
file1 and file2 and a single file called "file1 file2". Not sure it
matters that much for this usage though.
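
A tiny illustration of the ambiguity, using made-up file names and plain Ruby rather than anything in Sup:

```ruby
# Two attachments versus one attachment whose name contains a space.
two_files = ["file1", "file2"]
one_file  = ["file1 file2"]

# Joined with a space, as the patch does for the index, both become the
# same string, so the original list cannot be recovered from the field.
joined_two = two_files.join(" ")
joined_one = one_file.join(" ")
puts joined_two == joined_one   # true
```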
Also, I don't repopulate the attachments attribute on the message object,
and I couldn't figure out quite how you do it for labels (through
initialize?). It might be nice to be able to query the attachments field
as a list on a message object, much like labels. This then brought me
back to the problem of how to deal with spaces in filenames. It might
be that I should use some other character for the join that's unlikely
to be in a filename. Not sure what though.
Marcus
* [sup-talk] [PATCH] First draft of attachment processing for more gmail style searches
2008-02-28 21:15 ` Marcus Williams
@ 2008-03-02 18:08 ` William Morgan
2008-03-05 10:01 ` Marcus Williams
0 siblings, 1 reply; 9+ messages in thread
From: William Morgan @ 2008-03-02 18:08 UTC
Reformatted excerpts from Marcus Williams's message of 2008-02-28:
> The only thing I'm a little wary of is the join() I do of the
> attachment filenames for the index (like labels). This means that
> Ferret doesn't actually know the difference between two files called
> file1 and file2 and a single file called "file1 file2". Not sure it
> matters that much for this usage though.
The answer here is to escape the spaces and to use a Ferret custom
analyzer for this field in the index, one that will split only on
non-escaped spaces.
Something like this (needs testing):
irb(main):055:0> a = Ferret::Analysis::RegExpAnalyzer.new /([^\s\\]|(\\\s))+/, false
=> #<Ferret::Analysis::RegExpAnalyzer:0xb79740fc>
irb(main):056:0> t = a.token_stream :potato, "one\\ two three\\ four"
=> #<Ferret::Analysis::TokenStream:0xb79705d8>
irb(main):057:0> t.next
=> token["one\ two":0:8:1]
irb(main):058:0> t.next
=> token["three\ four":9:20:1]
Then assign that analyzer to the :attachments field in index.rb circa
line 37, just like I do for :subject and :body.
You'll have to make sure to do the escaping properly both on user input
at query time, and at storage time to the index.
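
To make the escaping round trip concrete, here is a sketch in plain Ruby, without Ferret. It uses the same pattern as the analyzer above (with non-capturing groups so String#scan returns whole matches). All helper names are hypothetical, and it assumes file names contain no backslashes of their own.

```ruby
# Same token pattern as the RegExpAnalyzer example: runs of characters
# that are neither whitespace nor backslash, or a backslash-escaped space.
TOKEN = /(?:[^\s\\]|\\\s)+/

# Hypothetical helpers; assumes names contain no literal backslashes.
def escape_spaces(name)
  name.gsub(" ") { "\\ " }
end

def unescape_spaces(token)
  token.gsub("\\ ") { " " }
end

# What would be stored in the index field.
def index_field(filenames)
  filenames.map { |f| escape_spaces(f) }.join(" ")
end

# What a tokenizer splitting only on non-escaped spaces would recover.
def tokens(field)
  field.scan(TOKEN).map { |t| unescape_spaces(t) }
end

field = index_field(["file1 file2", "report.pdf"])
puts field        # file1\ file2 report.pdf
p tokens(field)   # ["file1 file2", "report.pdf"]
```

The same escape_spaces step would have to be applied to the name in a filename: query before it is handed to the index, otherwise the query and the stored tokens will not line up.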
> Also I don't repopulate the attachments attribute on the message object
> and I couldn't figure out quite how you do it for labels (through
> initialize?).
Not quite sure what you mean here, but the answer might be: index.rb
line 371 is where we build a Message object from an index entry, and
you'll need to pass in an :attachments attribute (and handle it within
Message#initialize).
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] [PATCH] First draft of attachment processing for more gmail style searches
2008-03-02 18:08 ` William Morgan
@ 2008-03-05 10:01 ` Marcus Williams
2008-03-08 22:02 ` William Morgan
0 siblings, 1 reply; 9+ messages in thread
From: Marcus Williams @ 2008-03-05 10:01 UTC
On 2.3.2008, William Morgan wrote:
> The answer here is to escape the spaces and to use a Ferret custom
> analyzer for this field in the index, one that will split only on
> non-escaped spaces.
[snip]
Ah, right. Should be easy enough (sup-sync here we come)
> Not quite sure what you mean here, but the answer might be: index.rb
> line 371 is where we build a Message object from an index entry, and
> you'll need to pass in an :attachments attribute (and handle it within
> Message#initialize).
OK, that's what I figured in the end.
Another question: how do I get sent/drafts to get the attachment
labels? They don't seem to get set up when I attach a file to a
message. Should I just be adding/deleting them in the methods that
deal with adding/deleting attachments in reply mode?
Thanks
Marcus
* [sup-talk] [PATCH] First draft of attachment processing for more gmail style searches
2008-03-05 10:01 ` Marcus Williams
@ 2008-03-08 22:02 ` William Morgan
2008-03-23 21:13 ` Marcus Williams
0 siblings, 1 reply; 9+ messages in thread
From: William Morgan @ 2008-03-08 22:02 UTC
Reformatted excerpts from Marcus Williams's message of 2008-03-05:
> Another question - how do I get sent/drafts to get the attachment
> labels? They don't seem to get set up when I attach a file to a
> message. Should I just be adding/deleting them in the methods that
> deal with adding/deleting attachments in reply mode?
Once the message is sent, SentManager.write_sent_message will call
PollManager.add_messages_from, which in turn calls the index.rb code you
tweaked to build the message object that's actually used outside of
edit-message-mode.
I've just merged the topic branch that actually makes SentManager work
this way down to master (gotta love it when the same changes fix two
different problems!), so if you rebase now, you should be good to go.
Sorry for the general delay in replying. I've been coding up a little
distributed issue tracker that I think will help me manage Sup a little
better... or at least ensure I don't forget people's suggestions.
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] [PATCH] First draft of attachment processing for more gmail style searches
2008-03-08 22:02 ` William Morgan
@ 2008-03-23 21:13 ` Marcus Williams
2008-04-02 20:51 ` William Morgan
0 siblings, 1 reply; 9+ messages in thread
From: Marcus Williams @ 2008-03-23 21:13 UTC
On 8.3.2008, William Morgan wrote:
> Sorry for the general delay in replying. I've been coding up a little
> distributed issue tracker that I think will help me manage Sup a little
> better... or at least ensure I don't forget people's suggestions.
Mmmmm - I'm on the lookout for a decent issue tracker :) Any more info?
Marcus
Thread overview: 9+ messages
2008-02-25 20:50 [sup-talk] [PATCH] First draft of attachment processing for more gmail style searches Marcus Williams
2008-02-28 17:40 ` William Morgan
2008-02-28 21:15 ` Marcus Williams
2008-03-02 18:08 ` William Morgan
2008-03-05 10:01 ` Marcus Williams
2008-03-08 22:02 ` William Morgan
2008-03-23 21:13 ` Marcus Williams
2008-04-02 20:51 ` William Morgan
2008-04-02 21:16 ` vasudeva