Archive of RubyForge sup-talk mailing list
From: Kevin Riggle <kevinr@free-dissociation.com>
To: sup-talk <sup-talk@rubyforge.org>
Subject: Re: [sup-talk] Ferret to Xapian conversion
Date: Sun, 03 Jan 2010 18:52:20 -0500
Message-ID: <1262562493-sup-7933@black-opal.mit.edu>
In-Reply-To: <1262531773-sup-5192@masanjin.net>

Excerpts from William Morgan's message of Sun Jan 03 10:18:53 -0500 2010:
> If you run this script, please report your experience, since I'd like to
> include it in the 0.10 release coming soon.
> 
When I run the script, I get a number of lines complaining about being unable
to convert various encodings -- is this expected behavior?

e.g.:

## read 15375m (about 19%) @ 8.1m/s. 0:31:44 elapsed, about 2:17:28 remaining
[Sun Jan 03 14:53:33 -0500 2010] WARNING: couldn't transcode text from UTF-8 (utf-8) to UTF-8) ("Summer time is popul"...) (got "\223assisting\224 with"... (Iconv::IllegalSequence))
...
[Sun Jan 03 15:31:45 -0500 2010] WARNING: couldn't transcode text from ANSI (ANSI) to UTF-8) ("Dear Librarian, \n\nBr"...) (got invalid encoding ("UTF-8", "ANSI") (Iconv::InvalidEncoding))
## read 33992m (about 41%) @ 8.1m/s. 1:10:13 elapsed, about 1:39:07 remaining
[Sun Jan 03 15:31:55 -0500 2010] WARNING: couldn't transcode text from X-UNKNOWN (ASCII) to UTF-8) ("I agree with Jacky's"...) (got "\240 \240 \240I actually "... (Iconv::IllegalSequence))

Also, the conversion appeared to terminate prematurely, so I didn't bother 
running Sup against the resulting database.  The last few lines were:

## read 34123m (about 42%) @ 8.1m/s. 1:10:28 elapsed, about 1:38:49 remaining
## read 34259m (about 42%) @ 8.1m/s. 1:10:43 elapsed, about 1:38:30 remaining
## read 34388m (about 42%) @ 8.1m/s. 1:10:58 elapsed, about 1:38:12 remaining
kevinr@black-opal:~/src/sup$ 

- Kevin
-- 
Kevin Riggle (kevinr@free-dissociation.com) 
MIT Class of 2010, Course VI-3 (Computer Science)
http://free-dissociation.com
_______________________________________________
sup-talk mailing list
sup-talk@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-talk

