* [sup-talk] [PATCH] detect and set charset on text/* attachments
@ 2012-06-21 20:26 Helge Titlestad
2012-06-22 8:48 ` Gaute Hope
0 siblings, 1 reply; 6+ messages in thread
From: Helge Titlestad @ 2012-06-21 20:26 UTC
To: sup-talk
I got some feedback from non-suppers that my UTF-8 text attachments were
messed up. When I checked, the MIME headers lacked any charset information,
which I believe should be set for text/*.
Here's a patch that uses the chardet gem to (try to) detect the appropriate charset
and sets it in the Content-Type header.
_______________________________________________
sup-talk mailing list
sup-talk@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-talk
* Re: [sup-talk] [PATCH] detect and set charset on text/* attachments
2012-06-21 20:26 [sup-talk] [PATCH] detect and set charset on text/* attachments Helge Titlestad
@ 2012-06-22 8:48 ` Gaute Hope
2012-06-22 10:29 ` Helge Titlestad
0 siblings, 1 reply; 6+ messages in thread
From: Gaute Hope @ 2012-06-22 8:48 UTC
To: Sup Talk
Hi,
Would be interested; but where's the patch ? ;)
- Gaute
Excerpts from Helge Titlestad's message of 2012-06-21 22:26:19 +0200:
> I got some feedback from non-suppers that my UTF-8 text attachments were
> messed up. When I checked, the MIME headers lacked any charset information,
> which I believe should be set for text/*.
>
> Here's a patch that uses the chardet gem to (try to) detect the appropriate charset
> and sets it in the Content-Type header.
* Re: [sup-talk] [PATCH] detect and set charset on text/* attachments
2012-06-22 8:48 ` Gaute Hope
@ 2012-06-22 10:29 ` Helge Titlestad
0 siblings, 0 replies; 6+ messages in thread
From: Helge Titlestad @ 2012-06-22 10:29 UTC
To: sup-talk
Excerpts from Gaute Hope's message of Fri Jun 22 10:48:41 +0200 2012:
> Would be interested; but where's the patch ? ;)
Hah, sorry guys. Someone tried to send an email from my sup last night[1],
and managed to re-send this oooold mail instead. I think the utf-8 stuff
was fixed a long time ago. (=
--
alge
[1]: Yeah, alcohol was involved.
* Re: [sup-talk] [PATCH] detect and set charset on text/* attachments
2009-11-03 17:16 ` William Morgan
@ 2009-11-03 19:23 ` Helge Titlestad
0 siblings, 0 replies; 6+ messages in thread
From: Helge Titlestad @ 2009-11-03 19:23 UTC
To: sup-talk
[-- Attachment #1: Type: text/plain, Size: 706 bytes --]
Excerpts from William Morgan's message of Tue Nov 03 18:16:32 +0100 2009:
> It looks like the chardet gem is unmaintained. But someone decided to
> make their own special version called rchardet, which is a completely
> equivalent version but *is* maintained.
>
> I suggest we use rchardet instead of chardet. Would you like to change
> the patch? If not, I will get to it at some point.
Here you go!
One thing I noticed when trying it out: it sets the charset to 'ascii' for ASCII
text, which is allowed by the RFC, although "US-ASCII" is preferred. I would
rather not add a special case in the code to change it to US-ASCII or to drop
the charset for ASCII text, but other people might disagree.
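For anyone who does want the special case, a minimal sketch of the
normalization might look like the following. The helper names
`preferred_charset` and `content_type_for` are hypothetical and not part of the
patch; the only assumption carried over from the patch is that the detector's
label arrives as a string (or nil) and is appended to the Content-Type value.

```ruby
# Hypothetical helper, not part of the patch: map the detector's 'ascii'
# label to "US-ASCII" and leave every other label untouched.
def preferred_charset(label)
  return nil if label.nil? || label.strip.empty?
  cleaned = label.strip
  cleaned.casecmp?("ascii") ? "US-ASCII" : cleaned
end

# Hypothetical helper: build the Content-Type value the same way the patch
# does, but pass the detected label through preferred_charset first.
def content_type_for(mime_type, filename, charset)
  ct = "#{mime_type}; name=#{filename.inspect}"
  cs = preferred_charset(charset)
  ct += "; charset=#{cs}" if cs
  ct
end
```

With this in place, `content_type_for("text/plain", "a.txt", "ascii")` would
carry `charset=US-ASCII`, while a nil label leaves the header without a charset
parameter.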
--
alge
[-- Attachment #2: 0001-Detect-charset-for-text-file-attachments.patch --]
[-- Type: application/octet-stream, Size: 1919 bytes --]
From 094962e04eafc50ba707c0c35885f161c0fc9641 Mon Sep 17 00:00:00 2001
From: Helge Titlestad <helgedt@tihlde.org>
Date: Tue, 3 Nov 2009 20:11:25 +0100
Subject: [PATCH] Detect charset for text/* file attachments.
Adds dependency on rchardet gem, and uses it to detect the charset.
---
README.txt | 1 +
Rakefile | 1 +
lib/sup/util.rb | 9 ++++++++-
3 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/README.txt b/README.txt
index 4204270..184dd09 100644
--- a/README.txt
+++ b/README.txt
@@ -106,6 +106,7 @@ Current limitations which will be fixed:
- mime-types
- gettext
- fastthread
+ - rchardet
== INSTALL:
diff --git a/Rakefile b/Rakefile
index 67cd0d2..c8d9243 100644
--- a/Rakefile
+++ b/Rakefile
@@ -57,6 +57,7 @@ spec = Gem::Specification.new do |s|
s.add_dependency "mime-types", "~> 1"
s.add_dependency "gettext"
s.add_dependency "fastthread"
+ s.add_dependency "rchardet", ">= 1.3"
end
Rake::GemPackageTask.new(spec) do |pkg|
diff --git a/lib/sup/util.rb b/lib/sup/util.rb
index f99e1c1..7b747fb 100644
--- a/lib/sup/util.rb
+++ b/lib/sup/util.rb
@@ -3,6 +3,7 @@ require 'lockfile'
require 'mime/types'
require 'pathname'
require 'set'
+require 'rchardet'
## time for some monkeypatching!
class Lockfile
@@ -71,8 +72,14 @@ module RMail
def self.make_attachment payload, mime_type, encoding, filename
a = Message.new
+
+ cs = CharDet.detect(payload)['encoding'] if mime_type =~ /^text\//i
+debug(cs)
+ ct = "#{mime_type}; name=#{filename.inspect}"
+ ct += "; charset=#{cs}" if cs
+
a.header.add "Content-Disposition", "attachment; filename=#{filename.inspect}"
- a.header.add "Content-Type", "#{mime_type}; name=#{filename.inspect}"
+ a.header.add "Content-Type", ct
a.header.add "Content-Transfer-Encoding", encoding if encoding
a.body =
case encoding
--
1.5.6.5
* Re: [sup-talk] [PATCH] detect and set charset on text/* attachments
2009-10-19 14:07 Helge Titlestad
@ 2009-11-03 17:16 ` William Morgan
2009-11-03 19:23 ` Helge Titlestad
0 siblings, 1 reply; 6+ messages in thread
From: William Morgan @ 2009-11-03 17:16 UTC
To: sup-talk
Reformatted excerpts from Helge Titlestad's message of 2009-10-19:
> I got some feedback from non-suppers that my UTF-8 text attachments
> were messed up. When I checked, the MIME headers lacked any charset
> information, which I believe should be set for text/*.
After reviewing the RFCs, yeah, I think you're basically right. The
charset parameter is not required to be set for text/* mime types, but
if it's unset, the part is assumed to be us-ascii.
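The rule described above can be sketched as a small lookup: given a
Content-Type value, return the charset a receiving client would assume.
`effective_charset` is a hypothetical helper, not part of sup or RMail, and the
parameter parsing is deliberately naive (it splits on ';' and would mishandle a
quoted filename that contains a semicolon).

```ruby
# Hypothetical helper: report the charset a reader would assume for a
# Content-Type header value. For text/* parts without an explicit charset
# parameter, the assumed charset is us-ascii; non-text parts yield nil.
def effective_charset(content_type)
  media_type, *params = content_type.split(";").map(&:strip)
  return nil unless media_type.to_s.match?(%r{\Atext/}i)

  params.each do |param|
    key, value = param.split("=", 2)
    next unless value && key.strip.casecmp?("charset")
    # Strip optional quotes and normalize case for comparison.
    return value.strip.delete('"').downcase
  end

  "us-ascii" # no charset parameter present
end
```

This is why a UTF-8 attachment sent with a bare `text/plain; name="..."` header
can show up garbled: the receiver is entitled to decode it as us-ascii.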
> Here's a patch that uses the chardet gem to (try to) detect the
> appropriate charset and sets it in the Content-Type header.
Although I don't relish adding yet another gem dependency, I think this
is the right approach.
It looks like the chardet gem is unmaintained. But someone decided to
make their own special version called rchardet, which is a completely
equivalent version but *is* maintained. (What is it with these goddamn
ruby people.)
I suggest we use rchardet instead of chardet. Would you like to change
the patch? If not, I will get to it at some point.
> Please tell me if I should use some different way of sending
> patches... This git flow is a bit new to me. (=
Nope, this is perfect. Thanks!
--
William <wmorgan-sup@masanjin.net>
* [sup-talk] [PATCH] detect and set charset on text/* attachments
@ 2009-10-19 14:07 Helge Titlestad
2009-11-03 17:16 ` William Morgan
0 siblings, 1 reply; 6+ messages in thread
From: Helge Titlestad @ 2009-10-19 14:07 UTC
To: sup-talk
[-- Attachment #1: Type: text/plain, Size: 536 bytes --]
I got some feedback from non-suppers that my UTF-8 text attachments were
messed up. When I checked, the MIME headers lacked any charset information,
which I believe should be set for text/*.
Here's a patch that uses the chardet gem to (try to) detect the appropriate charset
and sets it in the Content-Type header.
Can't guarantee its robustness - have only tried on a couple of text files and
one non-text file.
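To try the idea on more inputs without installing a gem, the detection step can
be sketched with a pluggable detector. Everything here is an assumption for
illustration: `detect_charset` and `NAIVE_DETECTOR` are hypothetical names, and
the stand-in detector only tells 7-bit ASCII from valid UTF-8. It merely mimics
the hash with an 'encoding' key that the patch reads from the chardet gem.

```ruby
# Hypothetical wrapper: `detector` is any callable that returns a hash with
# an 'encoding' key. Non-text MIME types are never passed to the detector,
# and any detector failure falls back to "no charset parameter".
def detect_charset(payload, mime_type, detector)
  return nil unless mime_type =~ %r{\Atext/}i
  result = detector.call(payload)
  result && result['encoding']
rescue StandardError
  nil # detection is best-effort
end

# Stand-in detector for experiments only; NOT a replacement for a real
# charset detector. It recognizes 7-bit ASCII and valid UTF-8, nothing else.
NAIVE_DETECTOR = lambda do |bytes|
  s = bytes.dup.force_encoding(Encoding::UTF_8)
  if s.ascii_only?
    { 'encoding' => 'ascii' }
  elsif s.valid_encoding?
    { 'encoding' => 'utf-8' }
  else
    { 'encoding' => nil }
  end
end
```

Calling `detect_charset(File.binread(path), mime_type, NAIVE_DETECTOR)` on a
handful of files gives a quick check that text files get a label and that
non-text files are left alone.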
Please tell me if I should use some different way of sending patches... This git
flow is a bit new to me. (=
--
alge
[-- Attachment #2: 0001-Detect-charset-for-text-file-attachments.patch --]
[-- Type: application/octet-stream, Size: 1927 bytes --]
From 735a5ceb757599af71702d4ece8d29cb11f2c65b Mon Sep 17 00:00:00 2001
From: Helge Titlestad <helgedt@tihlde.org>
Date: Mon, 19 Oct 2009 16:03:56 +0200
Subject: [PATCH] Detect charset for text/* file attachments.
Adds dependency on chardet gem, and uses it to detect the charset.
---
README.txt | 1 +
Rakefile | 1 +
lib/sup/util.rb | 8 +++++++-
3 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/README.txt b/README.txt
index 4204270..3a98fa3 100644
--- a/README.txt
+++ b/README.txt
@@ -106,6 +106,7 @@ Current limitations which will be fixed:
- mime-types
- gettext
- fastthread
+ - chardet
== INSTALL:
diff --git a/Rakefile b/Rakefile
index 67cd0d2..3fb0d5e 100644
--- a/Rakefile
+++ b/Rakefile
@@ -57,6 +57,7 @@ spec = Gem::Specification.new do |s|
s.add_dependency "mime-types", "~> 1"
s.add_dependency "gettext"
s.add_dependency "fastthread"
+ s.add_dependency "chardet", ">= 0.9.0"
end
Rake::GemPackageTask.new(spec) do |pkg|
diff --git a/lib/sup/util.rb b/lib/sup/util.rb
index f99e1c1..ef7b892 100644
--- a/lib/sup/util.rb
+++ b/lib/sup/util.rb
@@ -3,6 +3,7 @@ require 'lockfile'
require 'mime/types'
require 'pathname'
require 'set'
+require 'UniversalDetector'
## time for some monkeypatching!
class Lockfile
@@ -71,8 +72,13 @@ module RMail
def self.make_attachment payload, mime_type, encoding, filename
a = Message.new
+
+ cs = UniversalDetector::chardet(payload)['encoding'] if mime_type =~ /^text\//i
+ ct = "#{mime_type}; name=#{filename.inspect}"
+ ct += "; charset=#{cs}" if cs
+
a.header.add "Content-Disposition", "attachment; filename=#{filename.inspect}"
- a.header.add "Content-Type", "#{mime_type}; name=#{filename.inspect}"
+ a.header.add "Content-Type", ct
a.header.add "Content-Transfer-Encoding", encoding if encoding
a.body =
case encoding
--
1.5.6.5
Thread overview: 6+ messages
2012-06-21 20:26 [sup-talk] [PATCH] detect and set charset on text/* attachments Helge Titlestad
2012-06-22 8:48 ` Gaute Hope
2012-06-22 10:29 ` Helge Titlestad
-- strict thread matches above, loose matches on Subject: below --
2009-10-19 14:07 Helge Titlestad
2009-11-03 17:16 ` William Morgan
2009-11-03 19:23 ` Helge Titlestad