From mboxrd@z Thu Jan 1 00:00:00 1970 Received: by 10.52.188.165 with SMTP id gb5cs23697vdc; Tue, 3 May 2011 07:28:19 -0700 (PDT) Received: by 10.224.203.193 with SMTP id fj1mr7677642qab.226.1304432898299; Tue, 03 May 2011 07:28:18 -0700 (PDT) Return-Path: Received: from rubyforge.org (rubyforge.org [205.234.109.19]) by mx.google.com with ESMTP id m20si248823qck.57.2011.05.03.07.28.18; Tue, 03 May 2011 07:28:18 -0700 (PDT) Received-SPF: pass (google.com: domain of sup-devel-bounces@rubyforge.org designates 205.234.109.19 as permitted sender) client-ip=205.234.109.19; Authentication-Results: mx.google.com; spf=pass (google.com: domain of sup-devel-bounces@rubyforge.org designates 205.234.109.19 as permitted sender) smtp.mail=sup-devel-bounces@rubyforge.org; dkim=neutral (body hash did not verify) header.i=@gmail.com Received: from rubyforge.org (rubyforge.org [127.0.0.1]) by rubyforge.org (Postfix) with ESMTP id DD01915B802E; Tue, 3 May 2011 10:28:17 -0400 (EDT) Received: from mail-vw0-f50.google.com (mail-vw0-f50.google.com [209.85.212.50]) by rubyforge.org (Postfix) with ESMTP id 6ADCB185838A for ; Tue, 3 May 2011 10:24:07 -0400 (EDT) Received: by vws14 with SMTP id 14so131554vws.23 for ; Tue, 03 May 2011 07:24:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=pEXuA6bpHLBYgJDG36Ru+pLU1bp972HXw1LljsVW1gk=; b=w+oqTFEqOqawvQ4a6XpUXdr/sx9MPEnaBmuisxRTaj/y9E2Cit3x1qhdAFBT4eC3y5 stvfN2ZSMqinPJvtWltbihZGrsY6GkzZH2qTw+sTBnR7CUTMmYetnvV1qyrDElgK/dB8 1o1AjRKiysWYd0X31ttQeuXD2RVOnVveUDwmc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=YvgdZj3e6bFD5fvgMTAXBj3ycwmDLpYsfEIRA1yuLsjRxq5fijPUN+GR54JXl+nzNm eOlfIQRfHeTKgxXGnHZw70RVpmYnSko3jUYXvt9lgeul3rUynrH2zFA0YpUh8yauMj53 wWkX8mB/4oNE6NCORvJVDy7JqrqCXV7GtCI9o= MIME-Version: 1.0 
Received: by 10.52.181.98 with SMTP id dv2mr2255830vdc.33.1304432646854; Tue, 03 May 2011 07:24:06 -0700 (PDT)
Received: by 10.52.107.2 with HTTP; Tue, 3 May 2011 07:24:06 -0700 (PDT)
In-Reply-To:
References: <201104251023.19659.hsanson@gmail.com> <1303793294-sup-688@masanjin.net> <1304052708-sup-4240@masanjin.net>
Date: Tue, 3 May 2011 23:24:06 +0900
Message-ID:
From: Horacio Sanson
To: Sup developer discussion
Content-Type: multipart/mixed; boundary=bcaec548a78b8c986304a25fe447
Subject: Re: [sup-devel] Cannot query Japanese characters
X-BeenThere: sup-devel@rubyforge.org
X-Mailman-Version: 2.1.12
Precedence: list
Reply-To: Sup developer discussion
List-Id: Sup developer discussion
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: sup-devel-bounces@rubyforge.org
Errors-To: sup-devel-bounces@rubyforge.org

--bcaec548a78b8c986304a25fe447
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

I managed to stop the crash when searching for Japanese text by forcing
UTF-8 encoding on the query parameter (see patch).

But it seems that Whistlepig cannot handle Japanese. I tried the
following small test and, as you can see, I get no results:

> require 'rubygems'
=3D> true
> require 'whistlepig'
=3D> true
> include Whistlepig
=3D> Object
> index =3D Index.new "index"
=3D> #
> entry1 =3D Entry.new
=3D> #
> entry1.add_string "body", "=E7=A0=94=E7=A9=B6=E4=BC=9A"
=3D> #
> docid1 =3D index.add_entry entry1
=3D> 1
> q1 =3D Query.new "body", "=E7=A0=94=E7=A9=B6"
=3D> body:"=E7=A0=94=E7=A9=B6"
> results1 =3D index.search q1
=3D> []

I will now dig into the Whistlepig source code to see if I can fix this,
but any pointers, directions, or tips on where to start looking would be
greatly appreciated.

On Mon, May 2, 2011 at 12:46 AM, Horacio Sanson wrote:
> I also tried with ruby 1.8 and heliotrope does not crash but searching
> any Japanese word returns no matches even for search terms I know have
> matches.
>
> And by the way, the installation instructions should mention that for
> ruby 1.8 we also need to install the json gem or heliotrope won't
> start.
>
> regards,
> Horacio
>
> On Mon, May 2, 2011 at 12:35 AM, Horacio Sanson wrote:
>> Installed whistlepig 0.6 but now I get a different error that looks
>> similar to the turnsole problem. Below is the backtrace:
>>
>> http://localhost:8042/search?q=3Dprimo -> /search?q=3D%7Einbox&start=3D0=
&num=3D20
>> 127.0.0.1 - - [02/May/2011 00:31:58] "GET /favicon.ico HTTP/1.1" 404 447=
 0.0008
>> localhost - - [02/May/2011:00:31:58 JST] "GET /favicon.ico HTTP/1.1" 404=
 447
>> - -> /favicon.ico
>> search(body:"=E4=BC=9A", 0, 20) took 0.0ms
>> Encoding::CompatibilityError - incompatible character encodings: UTF-8
>> and ASCII-8BIT:
>> =C2=A0bin/heliotrope-server:154:in `block in '
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in=
 `call'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in
>> `block in compile!'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in
>> `instance_eval'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in =
`route_eval'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:708:in
>> `block (2 levels) in route!'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:758:in
>> `block in process_route'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in =
`catch'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in
>> `process_route'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:707:in
>> `block in route!'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in =
`each'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in =
`route!'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:843:in =
`dispatch!'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in
>> `block in call!'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in
>> `instance_eval'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in
>> `block in invoke'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in =
`catch'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in =
`invoke'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in =
`call!'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:629:in =
`call'
>> =C2=A0/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/head.rb:9:in `call'
>> =C2=A0/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/showexceptions.=
rb:21:in
>> `call'
>> =C2=A0/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:48:in `_call'
>> =C2=A0/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:36:in `call'
>> =C2=A0/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/showexceptions.rb:24:=
in `call'
>> =C2=A0/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/commonlogger.rb:18:in=
 `call'
>> =C2=A0/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/content_length.rb:13:=
in `call'
>> =C2=A0/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/handler/webrick.rb:52=
:in `service'
>> =C2=A0/usr/lib/ruby/1.9.1/webrick/httpserver.rb:111:in `service'
>> =C2=A0/usr/lib/ruby/1.9.1/webrick/httpserver.rb:70:in `run'
>> =C2=A0/usr/lib/ruby/1.9.1/webrick/server.rb:183:in `block in start_threa=
d'
>> 127.0.0.1 - - [02/May/2011 00:32:09] "GET /search?q=3D%E4%BC%9A
>> HTTP/1.1" 500 89861 0.0228
>> localhost - - [02/May/2011:00:32:09 JST] "GET /search?q=3D%E4%BC%9A
>> HTTP/1.1" 500 89861
>> http://localhost:8042/search?q=3D%7Einbox&start=3D0&num=3D20 -> /search?=
q=3D%E4%BC%9A
>> 127.0.0.1 - - [02/May/2011 00:32:09] "GET /favicon.ico HTTP/1.1" 404 447=
 0.0009
>> localhost - - [02/May/2011:00:32:09 JST] "GET /favicon.ico HTTP/1.1" 404=
 447
>> - -> /favicon.ico
>>
>> regards,
>> Horacio
>> >> On Fri, Apr 29, 2011 at 1:52 PM, William Morgan >> wrote: >>> Reformatted excerpts from William Morgan's message of 2011-04-26: >>>> Thanks for the bug report on this one too. It's great to have someone >>>> testing this stuff with non-ASCII code. This is a known bug in >>>> Whistlepig and I should be releasing a fix soon. >>> >>> This is fixed in Whistlepig 0.6. Heliotrope should now be fine with >>> utf-8 input. I'm still working on this issue in turnsole. >>> >>> Let me know if you have any more issues! >>> -- >>> William >>> _______________________________________________ >>> Sup-devel mailing list >>> Sup-devel@rubyforge.org >>> http://rubyforge.org/mailman/listinfo/sup-devel >>> >> > --bcaec548a78b8c986304a25fe447 Content-Type: text/x-patch; charset=US-ASCII; name="0001-Fix-crash-for-non-ASCII-chars.patch" Content-Disposition: attachment; filename="0001-Fix-crash-for-non-ASCII-chars.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_gn8xe1u40 RnJvbSAwODgxNjMwYzhiNDEwYjZmNzhkZjU3OGJmNjg2YWZhY2JiNzhlYzY0IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBIb3JhY2lvIFNhbnNvbiA8aHNhbnNvbkBnbWFpbC5jb20+CkRh dGU6IFR1ZSwgMyBNYXkgMjAxMSAyMzoxODoyMiArMDkwMApTdWJqZWN0OiBbUEFUQ0hdIEZpeCBj cmFzaCBmb3Igbm9uIEFTQ0lJIGNoYXJzLgoKLS0tCiBiaW4vaGVsaW90cm9wZS1zZXJ2ZXIgfCAg ICAyICstCiAxIGZpbGVzIGNoYW5nZWQsIDEgaW5zZXJ0aW9ucygrKSwgMSBkZWxldGlvbnMoLSkK CmRpZmYgLS1naXQgYS9iaW4vaGVsaW90cm9wZS1zZXJ2ZXIgYi9iaW4vaGVsaW90cm9wZS1zZXJ2 ZXIKaW5kZXggNDc5M2FjMi4uZWQ5YzNiZSAxMDA2NDQKLS0tIGEvYmluL2hlbGlvdHJvcGUtc2Vy dmVyCisrKyBiL2Jpbi9oZWxpb3Ryb3BlLXNlcnZlcgpAQCAtMTUxLDcgKzE1MSw3IEBAIGNsYXNz IEhlbGlvdHJvcGVTZXJ2ZXIgPCBTaW5hdHJhOjpCYXNlCiAgICAgICBuYXYgKz0gIjwvZGl2PiIK IAogICAgICAgaGVhZGVyKCJTZWFyY2g6ICN7cXVlcnkub3JpZ2luYWxfcXVlcnlfc30iLCBxdWVy eS5vcmlnaW5hbF9xdWVyeV9zKSArCi0gICAgICAgICI8ZGl2PlBhcnNlZCBxdWVyeTogI3tlc2Nh cGVfaHRtbCBxdWVyeS5wYXJzZWRfcXVlcnlfc308L2Rpdj4iICsKKyAgICAgICAgIjxkaXY+UGFy c2VkIHF1ZXJ5OiAje2VzY2FwZV9odG1sIHF1ZXJ5LnBhcnNlZF9xdWVyeV9zLmZvcmNlX2VuY29k 
aW5nKCdVVEYtOCcpfTwvZGl2PiIgKwogICAgICAgICAiPGRpdj5TZWFyY2ggdG9vayAje3Nwcmlu dGYgJyUuMmYnLCBpbmZvWzplbGFwc2VkXX1zIGFuZCAje2luZm9bOmNvbnRpbnVlZF0gPyAnd2Fz JyA6ICd3YXMgTk9UJ30gY29udGludWVkPC9kaXY+IiArCiAgICAgICAgICIje25hdn08dGFibGU+ IiArCiAgICAgICAgIHJlc3VsdHMubWFwIHsgfHJ8IHRocmVhZGluZm9fdG9faHRtbCByIH0uam9p biArCi0tIAoxLjcuNC4xCgo= --bcaec548a78b8c986304a25fe447 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Sup-devel mailing list Sup-devel@rubyforge.org http://rubyforge.org/mailman/listinfo/sup-devel --bcaec548a78b8c986304a25fe447--
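
Note on the attached patch: decoded, its one-line change appends
`.force_encoding('UTF-8')` to `query.parsed_query_s` in
bin/heliotrope-server. The standalone sketch below tries to reproduce the
same class of error in plain Ruby (2.0 or later, for `String#b`) without
Heliotrope or Whistlepig. The helper names `concat_raises?` and
`concat_as_utf8` are hypothetical, and the sample strings only stand in for
the values the server handles; this is an illustration under those
assumptions, not the project's code.

```ruby
# Hypothetical helpers for illustration; not part of Heliotrope or Whistlepig.

# Returns true when concatenating the two strings raises the error seen in
# the backtrace (Encoding::CompatibilityError), false otherwise.
def concat_raises?(left, right)
  left + right
  false
rescue Encoding::CompatibilityError
  true
end

# Relabels a copy of +right+ as UTF-8 (the bytes are left unchanged) and
# then concatenates, mirroring what the patch does to the parsed query.
def concat_as_utf8(left, right)
  left + right.dup.force_encoding("UTF-8")
end

# Stand-in for the page fragment: UTF-8 text that contains non-ASCII characters.
page = "Search: \u7814\u7A76 "

# Stand-in for the parsed query: valid UTF-8 bytes, but labelled ASCII-8BIT.
query = "body:\"\xE7\xA0\x94\xE7\xA9\xB6\"".b

puts concat_raises?(page, query)           # the unpatched situation
puts concat_as_utf8(page, query).encoding  # the patched situation
```

`force_encoding` only changes the encoding label; it does not transcode, so
it is appropriate only when the bytes really are UTF-8, as they appear to be
for the query string in this thread.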