From mboxrd@z Thu Jan 1 00:00:00 1970
From: wmorgan-sup@masanjin.net (William Morgan)
Date: Sun, 02 Mar 2008 10:08:58 -0800
Subject: [sup-talk] [PATCH] First draft of attachment processing for more gmail style searches
In-Reply-To: <1204232994-sup-628@tomsk>
References: <1203972458-sup-5906@tomsk> <1204220051-sup-129@south> <1204232994-sup-628@tomsk>
Message-ID: <1204479552-sup-4100@south>

Reformatted excerpts from Marcus Williams's message of 2008-02-28:
> The only thing I'm a little wary of is the join() I do of the
> attachment filenames for the index (like labels). This means that
> ferret doesnt actually know the difference between two files called
> file1 and file2 and a single file called "file1 file2". Not sure it
> matters that much for this usage though.

The answer here is to escape the spaces and to use a Ferret custom
analyzer for this field in the index, one that will split only on
non-escaped spaces. Something like this (needs testing):

  irb(main):055:0> a = Ferret::Analysis::RegExpAnalyzer.new /([^\s\\]|(\\\s))+/, false
  => #
  irb(main):056:0> t = a.token_stream :potato, "one\\ two three\\ four"
  => #
  irb(main):057:0> t.next
  => token["one\ two":0:8:1]
  irb(main):058:0> t.next
  => token["three\ four":9:20:1]

Then assign that analyzer to the :attachments field in index.rb circa
line 37, just like I do for :subject and :body.

You'll have to make sure to do the escaping properly both on user input
at query time, and at storage time to the index.

> Also I dont repopulate the attachments attribute on the message object
> and I couldnt figure out quite how you do it for labels (through the
> initialise?).

Not quite sure what you mean here, but the answer might be: index.rb
line 371 is where we build a Message object from an index entry, and
you'll need to pass in an :attachments attribute (and handle it within
Message#initialize).

-- 
William
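
The escaping mentioned above, needed at both storage time and query time, might look roughly like the sketch below. It is plain Ruby and does not call Ferret; String#scan stands in for the analyzer, using the same pattern with non-capturing groups. The module and method names (AttachmentTokens, escape, unescape, join, split) are hypothetical and do not come from sup or Ferret.

```ruby
# Hypothetical helpers: these names are illustrative only and are not
# part of sup or Ferret.
module AttachmentTokens
  # Same pattern as the RegExpAnalyzer example, but with non-capturing
  # groups so that String#scan returns whole matches.
  TOKEN_RE = /(?:[^\s\\]|\\\s)+/

  # Prefix every whitespace character with a backslash.
  def self.escape(filename)
    filename.gsub(/\s/) { |ws| "\\#{ws}" }
  end

  # Reverse of escape: drop the backslash in front of whitespace.
  def self.unescape(token)
    token.gsub(/\\(\s)/) { $1 }
  end

  # Storage time: build the single string that goes into the index field.
  def self.join(filenames)
    filenames.map { |f| escape(f) }.join(" ")
  end

  # Stand-in for the analyzer: split only on non-escaped whitespace.
  def self.split(field)
    field.scan(TOKEN_RE)
  end
end

field = AttachmentTokens.join(["file1 file2", "file3"])
names = AttachmentTokens.split(field).map { |t| AttachmentTokens.unescape(t) }
puts field
puts names.inspect
```

With this scheme "file1 file2" and the pair file1, file2 produce different field strings, which is the distinction the join() was losing. One limitation of the pattern as written: a backslash that is not followed by whitespace matches neither alternative, so a filename containing a literal backslash would be split there. User input at query time would need to pass through the same escape step.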