From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kevin Riggle
To: sup-talk
In-reply-to: <1262531773-sup-5192@masanjin.net>
References: <1262460996-sup-1383@home.mrtheplague.net> <1262467343-sup-9565@masanjin.net> <1262471675-sup-1708@masanjin.net> <1262531773-sup-5192@masanjin.net>
Date: Sun, 03 Jan 2010 18:52:20 -0500
Message-Id: <1262562493-sup-7933@black-opal.mit.edu>
User-Agent: Sup/git
Subject: Re: [sup-talk] Ferret to Xapian conversion
List-Id: User & developer discussion of Sup

Excerpts from William Morgan's message of Sun Jan 03 10:18:53 -0500 2010:
> If you run this script, please report your experience, since I'd like to
> include it in the 0.10 release coming soon.
>

When I run the script, I get a number of lines complaining about being
unable to convert various encodings -- is this expected behavior? e.g.:

## read 15375m (about 19%) @ 8.1m/s. 0:31:44 elapsed, about 2:17:28 remaining
[Sun Jan 03 14:53:33 -0500 2010] WARNING: couldn't transcode text from UTF-8 (utf-8) to UTF-8) ("Summer time is popul"...) (got "\223assisting\224 with"... (Iconv::IllegalSequence))
...
[Sun Jan 03 15:31:45 -0500 2010] WARNING: couldn't transcode text from ANSI (ANSI) to UTF-8) ("Dear Librarian, \n\nBr"...) (got invalid encoding ("UTF-8", "ANSI") (Iconv::InvalidEncoding))
## read 33992m (about 41%) @ 8.1m/s. 1:10:13 elapsed, about 1:39:07 remaining
[Sun Jan 03 15:31:55 -0500 2010] WARNING: couldn't transcode text from X-UNKNOWN (ASCII) to UTF-8) ("I agree with Jacky's"...) (got "\240 \240 \240I actually "... (Iconv::IllegalSequence))

Also, the conversion appeared to terminate prematurely, so I didn't
bother running Sup against the resulting database. The last few lines
were:

## read 34123m (about 42%) @ 8.1m/s. 1:10:28 elapsed, about 1:38:49 remaining
## read 34259m (about 42%) @ 8.1m/s. 1:10:43 elapsed, about 1:38:30 remaining
## read 34388m (about 42%) @ 8.1m/s. 1:10:58 elapsed, about 1:38:12 remaining
kevinr@black-opal:~/src/sup$

- Kevin
-- 
Kevin Riggle (kevinr@free-dissociation.com)
MIT Class of 2010, Course VI-3 (Computer Science)
http://free-dissociation.com
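
For reference, the three warnings quoted above look consistent with message
bodies whose declared charset does not match their bytes: \223 and \224 are
the Windows-1252 curly quotes, \240 is a Latin-1/Windows-1252 non-breaking
space, and "ANSI" is not an encoding name the converter recognizes. The
sketch below is only an illustration of one way to tolerate such input. It
is not Sup's code, it uses Ruby's built-in String#encode instead of the
Iconv calls shown in the log, and both the helper name to_utf8 and the
choice of Windows-1252 as the fallback are assumptions.

```ruby
# Hypothetical helper (not part of Sup): convert raw message bytes to UTF-8.
# Assumption: when the declared charset is unknown or does not fit the bytes,
# treat the text as Windows-1252, a common source of mislabeled mail.
def to_utf8(bytes, declared_charset)
  begin
    enc = Encoding.find(declared_charset) # raises ArgumentError for "ANSI"
  rescue ArgumentError
    enc = Encoding::Windows_1252
  end

  text = bytes.dup.force_encoding(enc)
  # Bytes that are invalid in the declared charset: retry with the fallback.
  text = bytes.dup.force_encoding(Encoding::Windows_1252) unless text.valid_encoding?

  # Replace anything that still cannot be converted instead of raising.
  text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Inputs modeled on the snippets in the warnings above.
puts to_utf8("\223assisting\224 with".b, "utf-8")
puts to_utf8("Dear Librarian, \n\nBr".b, "ANSI")
puts to_utf8("\240 \240 \240I actually ".b, "ASCII")
```

With a fallback like this the conversion substitutes a replacement
character for bytes it cannot map rather than dropping the text, at the
cost of occasionally guessing the wrong legacy charset.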