* [sup-talk] System encoding versus messages encoding
@ 2008-04-21 20:37 Israel Herraiz
2008-04-22 11:39 ` Shot (Piotr Szotkowski)
2008-04-22 23:18 ` William Morgan
0 siblings, 2 replies; 11+ messages in thread
From: Israel Herraiz @ 2008-04-21 20:37 UTC (permalink / raw)
Hi there,
I have been using Sup for a couple of days, and I have found some
problems with the encoding of messages written in ISO-8859-1(5).
The encoding in my system is en_US.UTF-8, but most of the email that I
receive is in Spanish, and usually encoded with ISO-8859-1 (and
sometimes with ISO-8859-15).
When I start Sup, it detects my UTF-8 locale and tries to decode the messages
with that encoding, which results in hardly readable messages (most of
the sentences where wide characters appear are truncated).
If I start it with "LANG=en_US.ISO-8859-1 sup", it detects the new
encoding and the messages are more readable, but I cannot see the
right characters because my terminal encoding is UTF-8 (but at least
the sentences are not truncated).
With Mutt (and other email clients), I still use UTF-8 as my system
encoding, and I see ISO-8859-1 messages correctly. The encoding of the
messages is of course included in the headers.
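For illustration, here is a minimal Ruby sketch of reading the declared charset out of a Content-Type header value. The helper name charset_from_content_type is hypothetical (it is not part of Sup or Mutt), and the fallback to US-ASCII assumes the MIME default for text parts that declare no charset.

```ruby
# Hypothetical helper, for illustration only: extract the charset parameter
# from a Content-Type header value such as 'text/plain; charset=ISO-8859-1'.
# Assumption: fall back to US-ASCII when no charset parameter is present.
def charset_from_content_type(header_value, default = "US-ASCII")
  # The parameter value may or may not be wrapped in double quotes.
  match = header_value.match(/charset\s*=\s*"?([^";\s]+)"?/i)
  match ? match[1].upcase : default
end
```

A real parser would also have to handle folded headers and comments; this only covers the common single-line case.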
As far as I know (considering what I have read in the documentation
and in the archives of this mailing list), Sup determines the encoding
using the environment variables and tries to decode all the messages
using that encoding. For people working in different languages and
environments (like me, I write in English and Spanish, some people
send me messages in UTF-8, others in ISO-8859-1), having an
overall encoding for all the messages is not a good solution.
Would it be possible to decode each message according to its headers?
Please correct me if I am wrong in any of my assumptions on how Sup
encodes/decodes messages.
By the way, I am using the Git version of Sup (as of today :-).
Cheers,
Israel
^ permalink raw reply [flat|nested] 11+ messages in thread
* [sup-talk] System encoding versus messages encoding
2008-04-21 20:37 [sup-talk] System encoding versus messages encoding Israel Herraiz
@ 2008-04-22 11:39 ` Shot (Piotr Szotkowski)
2008-04-22 23:18 ` William Morgan
1 sibling, 0 replies; 11+ messages in thread
From: Shot (Piotr Szotkowski) @ 2008-04-22 11:39 UTC (permalink / raw)
Israel Herraiz:
> As far as I know (considering what I have read in the documentation
> and in the archives of this mailing list), Sup determines the encoding
> using the environment variables and tries to decode all the messages
> using that encoding. For people working in different languages and
> environments (like me, I write in English and Spanish, some people
> send me messages in UTF-8, others in ISO-8859-1), having an
> overall encoding for all the messages is not a good solution.
I agree wholeheartedly. I tried to use Sup a couple of weeks ago, but
this issue made it unusable for me (I planned to hack on this some day,
but my work and uni obligations do not leave any free time lately). :(
To make matters worse, the relevant RFC actually requires emails to
be encoded in the "tightest" encoding possible: when I'm writing an
email in Polish without any non-US-ASCII letters, it should be sent
as US-ASCII; if I include Polish diacritical characters, it should be
encoded as ISO-8859-2, but if I add some characters outside of it (say,
an ellipsis) it should be sent in UTF-8.
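The "tightest encoding" rule described above can be sketched in Ruby roughly as follows. The helper name tightest_charset and the three-entry candidate list are assumptions made for this example; a real mail client would use a longer, locale-dependent list.

```ruby
# Hypothetical candidate list, ordered from narrowest to widest.
CANDIDATE_CHARSETS = ["US-ASCII", "ISO-8859-2", "UTF-8"].freeze

# Hypothetical helper: return the first candidate charset that can represent
# every character of a UTF-8 string.
def tightest_charset(text, candidates = CANDIDATE_CHARSETS)
  candidates.find do |name|
    begin
      text.encode(name) # raises if some character has no mapping in `name`
      true
    rescue Encoding::UndefinedConversionError
      false
    end
  end
end
```

With this list, plain English text would go out as US-ASCII, text with Polish diacritics as ISO-8859-2, and text that also contains an ellipsis character as UTF-8.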
As a result, I regularly receive emails in US-ASCII, ISO-8859-2 and
UTF-8, but also in ISO-8859-1, ISO-8859-15, as well as some Cyrillic
encodings. I remember Mutt going through some growing pains to
accommodate this, but it has it all sorted out now (there is an issue
of recoding the email one replies to into the encoding expected by the
editor, for example).
I'll be more than happy to test any work done in this regard and I offer
any knowledge that could be useful; unfortunately, I can't promise any
hacking time (I just got accepted for this year's Summer of Code to hack
on CiviCRM internationalisation; maybe Sup could apply to be a project
in next year's SoC edition?).
-- Shot
--
I grew up in Europe, where the history comes from. -- Eddie Izzard
* [sup-talk] System encoding versus messages encoding
2008-04-21 20:37 [sup-talk] System encoding versus messages encoding Israel Herraiz
2008-04-22 11:39 ` Shot (Piotr Szotkowski)
@ 2008-04-22 23:18 ` William Morgan
2008-04-23 0:09 ` Israel Herraiz
` (2 more replies)
1 sibling, 3 replies; 11+ messages in thread
From: William Morgan @ 2008-04-22 23:18 UTC (permalink / raw)
Reformatted excerpts from Israel Herraiz's message of 2008-04-21:
> As far as I know (considering what I have read in the documentation
> and in the archives of this mailing list), Sup determines the encoding
> using the environment variables and tries to decode all the messages
> using that encoding.
This is not correct. Sup DOES determine the message's (or MIME
component's) charset, and transcodes it to your terminal charset using
the iconv library before display (after decoding quoted-printable,
etc.).
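As a rough illustration of that decode-then-transcode step (not Sup's actual code), the sketch below uses Ruby's built-in String#encode in place of the iconv library. The helper name transcode_body and the UTF-8 default for the terminal charset are assumptions made for the example.

```ruby
# Hypothetical helper, not taken from Sup: interpret the raw body bytes using
# the charset declared for the message (or MIME part), then transcode them to
# the charset the terminal expects.
def transcode_body(raw_bytes, declared_charset, terminal_charset = "UTF-8")
  raw_bytes.dup.force_encoding(declared_charset).encode(
    terminal_charset,
    invalid: :replace, # bytes that are not valid in the declared charset
    undef: :replace    # characters the terminal charset cannot represent
  )
end
```

For example, the ISO-8859-1 byte 0xF1 would come out as the two-byte UTF-8 sequence for "ñ". An unknown charset name makes force_encoding raise ArgumentError, so a real client would need a fallback there.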
The problem is actually that the Ruby ncurses gem is not wide-character
aware, so dumping UTF-8 to the terminal doesn't actually work. Well, it
actually does seem to work for some characters, but not most of them.
The solution has been known for quite some time, but it ain't pretty:
http://rubyforge.org/pipermail/sup-talk/2007-October/000297.html
The good news is that I've just made it slightly simpler, at least if
you're running from git. I've published an "ncursesw" branch that
contains a hacked ncurses-0.9.1 and a dirty script to install it into
your ../lib/ directory. If you use that AND you run from git next,
you'll see wide characters. It works!
So, just a "few" "simple" commands:
$ git branch --track ncursesw origin/ncursesw
$ git checkout ncursesw
$ cd ncurses-0.9.1/
$ ./run-this-for-sup.sh
$ cd ..
$ git checkout next
$ ruby -Ilib bin/sup
... and you should see wide characters, assuming your terminal is
capable. If make dies, you probably need to install some kind of
ncursesw development library. On my Debian system it's
a package called libncursesw5-dev.
A gold star to anyone who makes a nice wiki page out of this.
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] System encoding versus messages encoding
2008-04-22 23:18 ` William Morgan
@ 2008-04-23 0:09 ` Israel Herraiz
2008-04-23 2:03 ` William Morgan
2008-04-23 1:08 ` Israel Herraiz
2008-04-24 23:10 ` [sup-talk] [PATCH] fixed dlopen of libc for os x Grant Hollingworth
2 siblings, 1 reply; 11+ messages in thread
From: Israel Herraiz @ 2008-04-23 0:09 UTC (permalink / raw)
Excerpts from William Morgan's message of Wed Apr 23 01:18:00 +0200 2008:
> The good news is that I've just made it slightly simpler, at least if
> you're running from git. I've published an "ncursesw" branch that
> contains a hacked ncurses-0.9.1 and a dirty script to install it into
> your ../lib/ directory. If you use that AND you run from git next,
> you'll see wide characters. It works!
Great. I have obtained the new changes from the git repository, and
applied the recipe that you give, and yes, it works! Thanks!
I have noticed only one other odd thing: when I press enter to go from
inbox-mode to thread-mode, some characters of the inbox view
"persist" on the screen. I will file a bug if you consider it
necessary.
> A gold star to anyone who makes a nice wiki page out of this.
I guess that it is my turn :-). I will add a page about UTF-8 with a
summary of this message and the message that you link.
Thanks again for your help.
Cheers,
Israel
* [sup-talk] System encoding versus messages encoding
2008-04-22 23:18 ` William Morgan
2008-04-23 0:09 ` Israel Herraiz
@ 2008-04-23 1:08 ` Israel Herraiz
2008-04-23 1:53 ` William Morgan
2008-04-24 19:36 ` Marc Hartstein
2008-04-24 23:10 ` [sup-talk] [PATCH] fixed dlopen of libc for os x Grant Hollingworth
2 siblings, 2 replies; 11+ messages in thread
From: Israel Herraiz @ 2008-04-23 1:08 UTC (permalink / raw)
Excerpts from William Morgan's message of Wed Apr 23 01:18:00 +0200 2008:
> A gold star to anyone who makes a nice wiki page out of this.
I have added a summary of the content of the two messages (the parent
message of this one, and the one you link with patches for the
gem). It is available at:
http://sup.rubyforge.org/wiki/wiki.pl?UTF8
I have added a link to this page in the main page of the wiki.
Please note that I have tested only the method included in your
previous message (using the git sources). I have not tested the other
method (using the gem). I would appreciate it if someone could go to the
wiki page, follow the recipe and check that it is correct (or change
whatever might be wrong).
Cheers,
Israel
* [sup-talk] System encoding versus messages encoding
2008-04-23 1:08 ` Israel Herraiz
@ 2008-04-23 1:53 ` William Morgan
2008-04-24 19:36 ` Marc Hartstein
1 sibling, 0 replies; 11+ messages in thread
From: William Morgan @ 2008-04-23 1:53 UTC (permalink / raw)
Reformatted excerpts from Israel Herraiz's message of 2008-04-22:
> http://sup.rubyforge.org/wiki/wiki.pl?UTF8
Thanks! Very thorough.
> Please note that I have tested only the method included in your
> previous message (using the git sources). I have not tested the other
> method (using the gem). I would appreciate it if someone could go to the
> wiki page, follow the recipe and check that it is correct (or change
> whatever might be wrong).
I gave it a little tweak.
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] System encoding versus messages encoding
2008-04-23 0:09 ` Israel Herraiz
@ 2008-04-23 2:03 ` William Morgan
0 siblings, 0 replies; 11+ messages in thread
From: William Morgan @ 2008-04-23 2:03 UTC (permalink / raw)
Reformatted excerpts from Israel Herraiz's message of 2008-04-22:
> I have noticed only one other odd thing: when I press enter to go from
> inbox-mode to thread-mode, some characters of the inbox view
> "persist" on the screen.
Yeah, I see this too. Looks like a string length issue, actually.
I'll look into it---probably an easy fix.
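One plausible reading of "string length issue", offered only as an assumption and not as a diagnosis of Sup's code: if a line is padded using its byte count instead of its character count, a line with multibyte UTF-8 characters gets too little padding and fails to overwrite what was on the screen before. The helper pad_to_columns below is hypothetical.

```ruby
# Hypothetical helper: pad a line with spaces so that it overwrites a whole
# row of `columns` cells. `measured_length` is whatever the caller believes
# the on-screen length of the text to be.
def pad_to_columns(text, columns, measured_length)
  text + " " * [columns - measured_length, 0].max
end

line = "Espa\u00F1a" # 6 characters, but 7 bytes in UTF-8

by_bytes = pad_to_columns(line, 10, line.bytesize) # one space too few
by_chars = pad_to_columns(line, 10, line.length)   # fills all 10 columns
```

Here by_bytes ends up 9 characters long and by_chars 10, so the byte-based version leaves the last cell of the row untouched. Characters that occupy two terminal cells would need a display-width calculation on top of this.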
--
William <wmorgan-sup at masanjin.net>
* [sup-talk] System encoding versus messages encoding
2008-04-23 1:08 ` Israel Herraiz
2008-04-23 1:53 ` William Morgan
@ 2008-04-24 19:36 ` Marc Hartstein
1 sibling, 0 replies; 11+ messages in thread
From: Marc Hartstein @ 2008-04-24 19:36 UTC (permalink / raw)
Excerpts from Israel Herraiz's message of Tue Apr 22 21:08:00 -0400 2008:
> Excerpts from William Morgan's message of Wed Apr 23 01:18:00 +0200 2008:
> > A gold star to anyone who makes a nice wiki page out of this.
>
> http://sup.rubyforge.org/wiki/wiki.pl?UTF8
>
> Please note that I have tested only the method included in your
> previous message (using the git sources). I have not tested the other
> method (using the gem). I would appreciate it if someone could go to the
> wiki page, follow the recipe and check that it is correct (or change
> whatever might be wrong).
Have just followed the Git instructions on the wiki, and some of my wide
character weirdness has gone away.
Thanks, both of you.
* [sup-talk] [PATCH] fixed dlopen of libc for os x
2008-04-22 23:18 ` William Morgan
2008-04-23 0:09 ` Israel Herraiz
2008-04-23 1:08 ` Israel Herraiz
@ 2008-04-24 23:10 ` Grant Hollingworth
2 siblings, 0 replies; 11+ messages in thread
From: Grant Hollingworth @ 2008-04-24 23:10 UTC (permalink / raw)
OS X likes to do its own thing.
---
lib/sup.rb | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/lib/sup.rb b/lib/sup.rb
index c4d1dd5..afd030f 100644
--- a/lib/sup.rb
+++ b/lib/sup.rb
@@ -14,7 +14,7 @@ require 'curses'
 require 'dl/import'
 module LibC
   extend DL::Importable
-  dlload "libc.so.6"
+  dlload Config::CONFIG['arch'] =~ /darwin/ ? "libc.dylib" : "libc.so.6"
   extern "void setlocale(int, const char *)"
 end
 LibC.setlocale(6, "") # LC_ALL == 6
--
1.5.4.4
* [sup-talk] [PATCH] fixed dlopen of libc for os x
2008-04-27 6:48 William Morgan
@ 2008-04-27 8:45 ` Christopher Warrington
0 siblings, 0 replies; 11+ messages in thread
From: Christopher Warrington @ 2008-04-27 8:45 UTC (permalink / raw)
Excerpts from William Morgan's message of Sun Apr 27 01:48:36 -0500 2008:
>> OS X likes to do its own thing.
> Merged into next. Thanks!
Cygwin does too. You need to load "cygwin1.dll", not "libc.so.6". For
now, I've hacked this into my repository. However, we really should
come up with a cleaner way of handling these architecture-dependent
idiosyncrasies. (The Factory pattern from OO design comes to mind...)
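One simple way to centralise these per-platform names, sketched here as a plain lookup table (not a full Factory), might look like the following. The names LIBC_BY_HOST_OS, DEFAULT_LIBC and libc_name are hypothetical, and matching on RbConfig::CONFIG['host_os'] is a choice made for this example; the merged patch tests Config::CONFIG['arch'] instead.

```ruby
require 'rbconfig'

# Hypothetical lookup table: host OS pattern => C library name to load.
LIBC_BY_HOST_OS = [
  [/darwin/, "libc.dylib"],
  [/cygwin/, "cygwin1.dll"],
].freeze

# Library name used when no pattern matches (the name from the original code).
DEFAULT_LIBC = "libc.so.6"

# Hypothetical helper: pick the C library name for the given host OS string.
def libc_name(host_os = RbConfig::CONFIG['host_os'])
  match = LIBC_BY_HOST_OS.find { |pattern, _name| host_os =~ pattern }
  match ? match.last : DEFAULT_LIBC
end
```

Supporting another platform would then mean adding one row to the table, with the dlload call taking libc_name as its argument.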
--
Christopher Warrington <chrisw at rice.edu>
* [sup-talk] [PATCH] fixed dlopen of libc for os x
@ 2008-04-27 6:48 William Morgan
2008-04-27 8:45 ` Christopher Warrington
0 siblings, 1 reply; 11+ messages in thread
From: William Morgan @ 2008-04-27 6:48 UTC (permalink / raw)
Reformatted excerpts from Grant Hollingworth's message of 2008-04-24:
> OS X likes to do its own thing.
Merged into next. Thanks!
--
William <wmorgan-sup at masanjin.net>