* [sup-talk] the proper way of determining the encoding
From: Giorgio Lando @ 2008-01-05 21:46 UTC

I initially had some problems with encoding in sup (accented chars were
not displayed). So I have looked in lib/sup.rb and I have seen that sup
tries to determine the $encoding from the $ctype, determined in turn by
LC_CTYPE and LANG. This failed in my case (I do not know why: my $LANG is
it_IT and my $LC_ALL - implying $LC_CTYPE - is it_IT@euro).

Anyway, I guess that sup could/should use the environment variable
$CHARSET when it is defined, and resort to other methods only if $CHARSET
is not defined. Or at least I have been able to fix my issue with
encodings by changing the 55th line of lib/sup.rb in the following way:

$encoding = ENV["CHARSET"]

Cheers
Giorgio

* [sup-talk] the proper way of determining the encoding
From: William Morgan @ 2008-01-07 6:04 UTC

Excerpts from Giorgio Lando's message of Sat Jan 05 13:46:43 -0800 2008:
> I initially had some problems with encoding in sup (accented chars
> were not displayed). So I have looked in lib/sup.rb and I have seen
> that sup tries to determine the $encoding from the $ctype, determined
> in turn by LC_CTYPE and LANG. This failed in my case (I do not
> know why: my $LANG is it_IT and my $LC_ALL - implying $LC_CTYPE - is
> it_IT@euro).

The current way I choose the encoding is a complete hack, and clearly
doesn't work for your case. (It looks for a .<something> at the end of
LANG or LC_CTYPE.) But I'm not really sure what the correct way is.

What does this command produce on your system? "locale -c LC_CTYPE |
head -6".

> $encoding = ENV["CHARSET"]

I don't really want to use a non-standard environment variable if at all
possible...

--
William <wmorgan-sup at masanjin.net>

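To make the failure concrete, here is a minimal sketch of the suffix-based lookup described above. The helper name `encoding_from_suffix` is hypothetical, and the regex is slightly stricter than the one in lib/sup.rb (it stops at an "@" modifier), so treat it as an illustration rather than Sup's actual code. With LANG=it_IT there is no ".codeset" part at all, so the lookup silently falls back to utf-8.

```ruby
# Hypothetical helper mirroring the suffix-based approach: take whatever
# follows the "." in LC_CTYPE or LANG as the encoding name.
def encoding_from_suffix(env, default = "utf-8")
  ctype = env["LC_CTYPE"] || env["LANG"] || ""
  # A locale name has the shape language_TERRITORY.codeset@modifier;
  # only the codeset part is wanted, so the capture stops at "@".
  match = ctype.match(/\.([^@]+)/)
  match ? match[1] : default
end

puts encoding_from_suffix({ "LANG" => "en_US.UTF-8" })  # prints UTF-8
# No codeset in the name, so the default is used even though the
# real charset of this locale is ISO-8859-15.
puts encoding_from_suffix({ "LANG" => "it_IT", "LC_ALL" => "it_IT@euro" })  # prints utf-8
```

The second call shows why accented characters broke: the environment never spells out the codeset, so any approach that only reads the variable names has nothing to extract.
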
* [sup-talk] the proper way of determining the encoding
From: Giorgio Lando @ 2008-01-07 8:44 UTC

> What does this command produce on your system? "locale -c LC_CTYPE |
> head -6".

It produces:

LC_CTYPE
upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3
toupper;tolower;totitle
16
1
ISO-8859-15

And actually ISO-8859-15 is the right/desired encoding.

Giorgio

* [sup-talk] the proper way of determining the encoding
From: Giorgio Lando @ 2008-01-07 8:51 UTC

> > $encoding = ENV["CHARSET"]
>
> I don't really want to use a non-standard environment variable if at all
> possible...

I understand. Maybe the encoding problems are so intricate and
heterogeneous that they could be worth a configuration option, so the
user can in any case force a certain encoding if needed/desired?

Giorgio

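One way such an override could be layered on top of automatic detection is sketched below. Everything here is an assumption for illustration: the `choose_encoding` helper and the `:encoding` config key are hypothetical and are not existing Sup options.

```ruby
# Hypothetical precedence: an explicit user setting wins, then whatever
# was detected from the environment, then a fixed default.
def choose_encoding(config, detected, default = "utf-8")
  forced = config[:encoding]  # assumed key name, not a real Sup option
  return forced unless forced.nil? || forced.empty?
  return detected unless detected.nil? || detected.empty?
  default
end

puts choose_encoding({ encoding: "ISO-8859-15" }, "utf-8")  # prints ISO-8859-15
puts choose_encoding({}, "ISO-8859-1")                      # prints ISO-8859-1
puts choose_encoding({}, nil)                               # prints utf-8
```

Keeping the override separate from detection means a broken locale setup can be worked around without touching the detection code.
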
* [sup-talk] the proper way of determining the encoding
From: William Morgan @ 2008-01-09 18:00 UTC

Excerpts from Giorgio Lando's message of Mon Jan 07 00:51:08 -0800 2008:
> I understand. Maybe the encoding problems are so intricate and
> heterogeneous that they could be worth a configuration option, so the
> user can in any case force a certain encoding if needed/desired?

I'm certainly happy to allow users to force a certain encoding, but I
think the issue of encoding problems HAS largely been solved (at least
on Unixes) by locale and all the LC_* environment variables. I'm
certainly not an expert on this stuff, though.

I just committed the following terrible patch to next, which should
properly find everyone's locale, if they have a locale. Mac users, I'd
love to know if this actually works for you.

commit 6af3048fe82f48f0368a619ea785f0a394b0bbd4
Author: William Morgan <wmorgan-sup at masanjin.net>
Date:   Wed Jan 9 08:30:30 2008 -0800

    detect character set correctly (but unix-centrically)

diff --git a/lib/sup.rb b/lib/sup.rb
index 25809dd..5bb27ba 100644
--- a/lib/sup.rb
+++ b/lib/sup.rb
@@ -49,16 +49,6 @@ module Redwood
   YAML_DOMAIN = "masanjin.net"
   YAML_DATE = "2006-10-01"
 
-## determine encoding and character set
-## probably a better way to do this
-  $ctype = ENV["LC_CTYPE"] || ENV["LANG"] || "en-US.utf-8"
-  $encoding =
-    if $ctype =~ /\.(.*)?/
-      $1
-    else
-      "utf-8"
-    end
-
 ## record exceptions thrown in threads nicely
   def reporting_thread name
     if $opts[:no_threads]
@@ -235,6 +225,16 @@ module Redwood
   module_function :log
 end
 
+## determine encoding and character set. there MUST be a better way to
+## do this.
+  $encoding = `locale -c LC_CTYPE|head -6|tail -1`.chomp
+  if $encoding
+    Redwood::log "using character set encoding #{$encoding.inspect}"
+  else
+    Redwood::log "warning: can't find character set by using locale, defaulting
+    $encoding = "utf-8"
+  end
+
 ## now everything else (which can feel free to call Redwood::log at load time)
 require "sup/update"
 require "sup/suicide"

--
William <wmorgan-sup at masanjin.net>

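One detail of the patch above is worth spelling out: in Ruby, backticks always return a String, and even an empty String is truthy, so the `else` branch can never run. In addition, when `locale -c LC_CTYPE` prints fewer than six lines, `head -6 | tail -1` simply returns the last line it did print. The sketch below is one possible guard, not the committed code. The helper name `charset_from_locale_output` is hypothetical, and it takes the command output as a plain string so the logic can be checked without running `locale`.

```ruby
# Hypothetical helper: pick the sixth line of `locale -c LC_CTYPE` output,
# as the patch does, but treat a missing or blank line as "not found".
def charset_from_locale_output(output, default = "utf-8")
  line = output.lines[5]        # nil when there are fewer than six lines
  charset = line.to_s.strip     # nil.to_s is "", so this is always a String
  charset.empty? ? default : charset
end

full = "LC_CTYPE\nupper;lower\ntoupper;tolower;totitle\n16\n1\nISO-8859-15\n"
puts charset_from_locale_output(full)          # prints ISO-8859-15
puts charset_from_locale_output("LC_CTYPE\n")  # prints utf-8
```

In real use the argument would be the result of running the command, for example `charset_from_locale_output(`locale -c LC_CTYPE`)`, which still assumes the six-line layout shown earlier in the thread.
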
* [sup-talk] the proper way of determining the encoding
From: Giorgio Lando @ 2008-01-10 0:39 UTC

> I just committed the following terrible patch to next, which should
> properly find everyone's locale, if they have a locale. Mac users, I'd
> love to know if this actually works for you.

I am not a mac user, but it works fine for me!

Giorgio

* [sup-talk] the proper way of determining the encoding
From: William Morgan @ 2008-01-13 2:22 UTC

Excerpts from Giorgio Lando's message of Wed Jan 09 16:39:10 -0800 2008:
> I am not a mac user, but it works fine for me!

I think I've found a better way, although it introduces yet another
dependency, to the 'gettext' gem. I'd be interested to see if this works
for you, Giorgio, and also for anyone who's running Sup under Cygwin or
OS X. This is an approach I feel a lot better about.

diff --git a/Rakefile b/Rakefile
index 2f2b992..d4060c1 100644
--- a/Rakefile
+++ b/Rakefile
@@ -17,7 +17,7 @@ Hoe.new('sup', Redwood::VERSION) do |p|
   p.url = p.paragraphs_of('README.txt', 0).first.split(/\n/)[2].gsub(/^\s+/, "")
   p.changes = p.paragraphs_of('History.txt', 0..0).join("\n\n")
   p.email = "wmorgan-sup at masanjin.net"
-  p.extra_deps = [['ferret', '>= 0.10.13'], ['ncurses', '>= 0.9.1'], ['rmail', '>= 0.17'], 'highline', 'net-ssh', ['trollop', '>= 1.7'], 'lockfile', 'mime-types']
+  p.extra_deps = [['ferret', '>= 0.10.13'], ['ncurses', '>= 0.9.1'], ['rmail', '>= 0.17'], 'highline', 'net-ssh', ['trollop', '>= 1.7'], 'lockfile', 'mime-types', 'gettext']
 end
 
 rule 'ss?.png' => 'ss?-small.png' do |t|
diff --git a/lib/sup.rb b/lib/sup.rb
index 5bb27ba..064e0af 100644
--- a/lib/sup.rb
+++ b/lib/sup.rb
@@ -3,6 +3,7 @@ require 'yaml'
 require 'zlib'
 require 'thread'
 require 'fileutils'
+require 'gettext'
 require 'curses'
 
 class Object
@@ -225,9 +226,8 @@ module Redwood
   module_function :log
 end
 
-## determine encoding and character set. there MUST be a better way to
-## do this.
-  $encoding = `locale -c LC_CTYPE|head -6|tail -1`.chomp
+## determine encoding and character set
+  $encoding = Locale.current.charset
   if $encoding
     Redwood::log "using character set encoding #{$encoding.inspect}"
   else
--
1.5.4.rc2.60.gb2e62-dirty

--
William <wmorgan-sup at masanjin.net>

* [sup-talk] the proper way of determining the encoding
From: Giorgio Lando @ 2008-01-13 13:09 UTC

Excerpts from William Morgan's message of Sun Jan 13 03:22:58 +0100 2008:
>
> I think I've found a better way, although it introduces yet another
> dependency, to the 'gettext' gem. I'd be interested to see if this works
> for you, Giorgio, and also for anyone who's running Sup under Cygwin or
> OS X. This is an approach I feel a lot better about.

Yes, it works for me!

Giorgio

* [sup-talk] the proper way of determining the encoding
From: Grant Hollingworth @ 2008-01-15 16:30 UTC

Excerpts from William Morgan's message of Sat Jan 12 21:22:58 -0500 2008:
> I think I've found a better way, although it introduces yet another
> dependency, to the 'gettext' gem. I'd be interested to see if this works
> for you, Giorgio, and also for anyone who's running Sup under Cygwin or
> OS X. This is an approach I feel a lot better about.

This works on OS X (10.5.1). Your other method of using locale(1)
didn't. ('locale -c LC_CTYPE' returns 'LC_CTYPE' on my machine... not
very useful.)

* [sup-talk] the proper way of determining the encoding
From: William Morgan @ 2008-01-16 1:45 UTC

Reformatted excerpts from Grant Hollingworth's message of 2008-01-15:
> Excerpts from William Morgan's message of Sat Jan 12 21:22:58 -0500 2008:
> > I think I've found a better way, although it introduces yet another
> > dependency, to the 'gettext' gem. I'd be interested to see if this
> > works for you, Giorgio, and also for anyone who's running Sup under
> > Cygwin or OS X. This is an approach I feel a lot better about.
>
> This works on OS X (10.5.1). Your other method of using locale(1)
> didn't. ('locale -c LC_CTYPE' returns 'LC_CTYPE' on my machine... not
> very useful.)

Exactly what I was hoping to hear. I'll merge those changes down to
master, then.

BTW, your past few messages have been using charset=LC_CTYPE, which the
mailing list ever-so-helpfully wraps in a nasty MIME attachment. So you
should switch back to the functioning version. :)

--
William <wmorgan-sup at masanjin.net>

* [sup-talk] the proper way of determining the encoding
From: Nicolas Pouillard @ 2008-01-16 2:20 UTC

Excerpts from William Morgan's message of Wed Jan 16 02:45:14 +0100 2008:
> Reformatted excerpts from Grant Hollingworth's message of 2008-01-15:
> > Excerpts from William Morgan's message of Sat Jan 12 21:22:58 -0500 2008:
> > > I think I've found a better way, although it introduces yet another
> > > dependency, to the 'gettext' gem. I'd be interested to see if this
> > > works for you, Giorgio, and also for anyone who's running Sup under
> > > Cygwin or OS X. This is an approach I feel a lot better about.
> >
> > This works on OS X (10.5.1). Your other method of using locale(1)
> > didn't. ('locale -c LC_CTYPE' returns 'LC_CTYPE' on my machine... not
> > very useful.)
>
> Exactly what I was hoping to hear. I'll merge those changes down to
> master, then.
>
> BTW, your past few messages have been using charset=LC_CTYPE, which the
> mailing list ever-so-helpfully wraps in a nasty MIME attachment. So you
> should switch back to the functioning version. :)

Yuck! I got this virus too :)

And I don't see your gettext patch on either master or next ;(

--
Nicolas Pouillard aka Ertai

* [sup-talk] the proper way of determining the encoding
From: William Morgan @ 2008-01-16 2:57 UTC

Reformatted excerpts from nicolas.pouillard's message of 2008-01-15:
> And I don't see your gettext patch on either master or next ;(

Whoops, I had never merged it onto next in the first place. Well, it's
on master now. Let me know if it gives you any trouble.

--
William <wmorgan-sup at masanjin.net>

* [sup-talk] the proper way of determining the encoding
From: William Morgan @ 2008-01-16 3:09 UTC

Reformatted excerpts from William Morgan's message of 2008-01-15:
> Reformatted excerpts from nicolas.pouillard's message of 2008-01-15:
> > And I don't see your gettext patch on either master or next ;(
>
> Whoops, I had never merged it onto next in the first place. Well, it's
> on master now. Let me know if it gives you any trouble.

Aaaand merged onto next. I do seem to spend a lot of my time merging
from master to next. I wonder if that's proper git usage. Seems to
clutter up my pretty gitk graph.

--
William <wmorgan-sup at masanjin.net>

* [sup-talk] the proper way of determining the encoding
From: Nicolas Pouillard @ 2008-01-16 8:51 UTC

Excerpts from William Morgan's message of Wed Jan 16 04:09:48 +0100 2008:
> Reformatted excerpts from William Morgan's message of 2008-01-15:
> > Reformatted excerpts from nicolas.pouillard's message of 2008-01-15:
> > > And I don't see your gettext patch on either master or next ;(
> >
> > Whoops, I had never merged it onto next in the first place. Well, it's
> > on master now. Let me know if it gives you any trouble.
>
> Aaaand merged onto next. I do seem to spend a lot of my time merging
> from master to next. I wonder if that's proper git usage. Seems to
> clutter up my pretty gitk graph.

I don't know the proper git usage, but I think that you should make all
your commits on next, apply all patches from users on next, and then
cherry-pick changes from next to master when they are well tested.

--
Nicolas Pouillard aka Ertai

* [sup-talk] the proper way of determining the encoding
From: Nicolas Pouillard @ 2008-01-16 3:14 UTC

Excerpts from William Morgan's message of Wed Jan 16 03:57:00 +0100 2008:
> Reformatted excerpts from nicolas.pouillard's message of 2008-01-15:
> > And I don't see your gettext patch on either master or next ;(
>
> Whoops, I had never merged it onto next in the first place. Well, it's
> on master now. Let me know if it gives you any trouble.

Still not working, gettext is guessing ASCII-US but my LC_CTYPE contains
en_US.UTF-8.

What about `locale | grep LC_CTYPE | cut -d'"' -f2`?

--
Nicolas Pouillard aka Ertai

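The shell pipeline suggested above could also be done in Ruby, which avoids depending on `grep` and `cut` and copes with the fact that `locale` may print the value with or without double quotes. This is only a sketch: the helper name `ctype_from_locale_output` is hypothetical, and it takes the output of `locale` as a string rather than running the command itself.

```ruby
# Hypothetical helper: find the LC_CTYPE line in the output of `locale`
# and return its value with any surrounding double quotes removed.
def ctype_from_locale_output(output)
  output.each_line do |line|
    key, value = line.chomp.split("=", 2)
    next unless key == "LC_CTYPE" && value
    value = value.delete('"')
    return value unless value.empty?
  end
  nil  # no usable LC_CTYPE line found
end

sample = "LANG=en_US.UTF-8\nLC_CTYPE=\"en_US.UTF-8\"\nLC_ALL=\n"
puts ctype_from_locale_output(sample)  # prints en_US.UTF-8
```

Note that this returns a locale name such as en_US.UTF-8, not a bare charset, so the codeset after the "." would still have to be extracted, and a name without a codeset (like it_IT) would remain ambiguous.
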
* [sup-talk] the proper way of determining the encoding
From: William Morgan @ 2008-01-16 3:23 UTC

Sorry, meant to reply to list.

Reformatted excerpts from nicolas.pouillard's message of 2008-01-15:
> Still not working, gettext is guessing ASCII-US but my LC_CTYPE
> contains en_US.UTF-8.

Well crap. Do you have a $LC_CTYPE environment variable defined, that
it's not picking up on?

--
William <wmorgan-sup at masanjin.net>

* [sup-talk] the proper way of determining the encoding
From: Nicolas Pouillard @ 2008-01-16 3:28 UTC

Excerpts from William Morgan's message of Wed Jan 16 04:23:35 +0100 2008:
> Sorry, meant to reply to list.
>
> Reformatted excerpts from nicolas.pouillard's message of 2008-01-15:
> > Still not working, gettext is guessing ASCII-US but my LC_CTYPE
> > contains en_US.UTF-8.
>
> Well crap. Do you have a $LC_CTYPE environment variable defined, that
> it's not picking up on?

Yes, $LC_CTYPE is set to en_US.UTF-8.

I've looked at the gettext code and found something:

In locale_posix.rb:
...
[ENV["LC_ALL"], ENV["LC_MESSAGES"], ENV["LANG"],
...

In locale_win32.rb:
...
["LC_ALL", "LC_CTYPE", "LC_MESSAGES", "LANG"].each do |env|
...

Adding LC_CTYPE to the first list solves my problem...

--
Nicolas Pouillard aka Ertai

* [sup-talk] the proper way of determining the encoding
From: William Morgan @ 2008-01-16 3:38 UTC

Reformatted excerpts from nicolas.pouillard's message of 2008-01-15:
> In locale_posix.rb:
> ...
> [ENV["LC_ALL"], ENV["LC_MESSAGES"], ENV["LANG"],
> ...
>
> In locale_win32.rb:
> ...
> ["LC_ALL", "LC_CTYPE", "LC_MESSAGES", "LANG"].each do |env|
> ...
>
> Adding LC_CTYPE to the first list solves my problem...

Excellent, sounds like a bug in gettext. If you submit a bug report,
would you mind cc'ing me?

--
William <wmorgan-sup at masanjin.net>

* [sup-talk] the proper way of determining the encoding
From: Nicolas Pouillard @ 2008-01-16 8:46 UTC

Excerpts from William Morgan's message of Wed Jan 16 04:38:14 +0100 2008:
> Reformatted excerpts from nicolas.pouillard's message of 2008-01-15:
> > In locale_posix.rb:
> > ...
> > [ENV["LC_ALL"], ENV["LC_MESSAGES"], ENV["LANG"],
> > ...
> >
> > In locale_win32.rb:
> > ...
> > ["LC_ALL", "LC_CTYPE", "LC_MESSAGES", "LANG"].each do |env|
> > ...
> >
> > Adding LC_CTYPE to the first list solves my problem...
>
> Excellent, sounds like a bug in gettext. If you submit a bug report,
> would you mind cc'ing me?

http://rubyforge.org/tracker/index.php?func=detail&aid=17133&group_id=855&atid=3377

--
Nicolas Pouillard aka Ertai

* [sup-talk] the proper way of determining the encoding
From: Nicolas Pouillard @ 2008-01-18 16:40 UTC

Excerpts from Nicolas Pouillard's message of Wed Jan 16 09:46:08 +0100 2008:
> Excerpts from William Morgan's message of Wed Jan 16 04:38:14 +0100 2008:
> > Reformatted excerpts from nicolas.pouillard's message of 2008-01-15:
> > > In locale_posix.rb:
> > > ...
> > > [ENV["LC_ALL"], ENV["LC_MESSAGES"], ENV["LANG"],
> > > ...
> > >
> > > In locale_win32.rb:
> > > ...
> > > ["LC_ALL", "LC_CTYPE", "LC_MESSAGES", "LANG"].each do |env|
> > > ...
> > >
> > > Adding LC_CTYPE to the first list solves my problem...
> >
> > Excellent, sounds like a bug in gettext. If you submit a bug report,
> > would you mind cc'ing me?
>
> http://rubyforge.org/tracker/index.php?func=detail&aid=17133&group_id=855&atid=3377
>

It ended as a "won't fix":

"""
Date: 2008-01-17 12:29
Sender: Masao Mutoh

LC_CTYPE is not for messaging. So locale_win32.rb was wrong. And it
removed in current CVS version.
"""

I don't know much about locale and encoding, but I thought LC_CTYPE was a
good way of setting it.

--
Nicolas Pouillard aka Ertai

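The disagreement above comes down to which locale category is being asked about. Under the POSIX rules, each category is resolved as LC_ALL first, then the category's own variable, then LANG, with empty values treated as unset. A message catalog therefore looks at LC_MESSAGES, while character handling looks at LC_CTYPE. The sketch below illustrates that rule; the helper name `locale_for_category` is hypothetical and this is not how the gettext gem is implemented.

```ruby
# Hypothetical helper applying the POSIX precedence for one category:
# LC_ALL, then the category variable itself, then LANG, then "C".
def locale_for_category(env, category)
  ["LC_ALL", category, "LANG"].each do |name|
    value = env[name]
    return value unless value.nil? || value.empty?
  end
  "C"  # default locale when nothing is set
end

env = { "LANG" => "it_IT", "LC_CTYPE" => "en_US.UTF-8" }
puts locale_for_category(env, "LC_CTYPE")     # prints en_US.UTF-8
puts locale_for_category(env, "LC_MESSAGES")  # prints it_IT
```

With this environment the two categories resolve to different locales, which is why a lookup that skips LC_CTYPE can be right for messages and still wrong for the character set.
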