Archive of RubyForge sup-talk mailing list
* [sup-talk] display_length issue with special-characters on non-UTF8 terminal
@ 2009-06-09 10:00 Tarko Tikan
  2009-06-12 19:18 ` William Morgan
  0 siblings, 1 reply; 6+ messages in thread
From: Tarko Tikan @ 2009-06-09 10:00 UTC


hey,

When String.display_length was introduced in a recent update, it broke the length calculation for non-UTF8 strings that contain special characters. The wrong length corrupts the display (line ends are chopped off).

My terminal is ISO-8859-15, and sup detects that correctly.

I've tracked it down to the /./u regexp. Here are some examples:

irb(main):001:0> "asd".scan(/./u)
=> ["a", "s", "d"]
irb(main):002:0> "asdõüäö".scan(/./u)
=> ["a", "s", "d", "\365\374\344\366"]
irb(main):017:0> "asd???".scan(/./u)
=> ["a", "s", "d"]

irb(main):008:0* "asd".scan(/./)
=> ["a", "s", "d"]
irb(main):009:0> "asdõüäö".scan(/./)
=> ["a", "s", "d", "\365", "\374", "\344", "\366"]

Expecting UTF-8 gives unexpected results :) The old behaviour, plain String.length, gives correct results for these test cases.
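
The failure can be reproduced on a modern Ruby (1.9 or later, not the Ruby 1.8 that sup ran on in 2009): the four accented letters are one byte each in ISO-8859-15, and those same bytes do not form valid UTF-8, so a UTF-8 character scan over them cannot give a meaningful count. A minimal sketch; the variable names are illustrative only:

```ruby
# Ruby 1.9+ sketch (sup itself ran on Ruby 1.8): the bytes from the report,
# "asd" followed by 0xF5 0xFC 0xE4 0xF6, i.e. o-tilde, u/a/o-umlaut in ISO-8859-15.
raw = "asd\xF5\xFC\xE4\xF6".b  # .b returns a binary (ASCII-8BIT) copy

# Labelled with the terminal's real charset, every byte is a valid character.
latin = raw.dup.force_encoding(Encoding::ISO_8859_15)
puts latin.valid_encoding?    # true

# Labelled as UTF-8, which is what matching with /./u assumes, they are not.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?  # false

# Converting (not just relabelling) yields proper UTF-8 text.
utf8 = latin.encode(Encoding::UTF_8)
puts utf8 == "asd\u00F5\u00FC\u00E4\u00F6"  # true
```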

-- 
tarko


* [sup-talk] display_length issue with special-characters on non-UTF8 terminal
  2009-06-09 10:00 [sup-talk] display_length issue with special-characters on non-UTF8 terminal Tarko Tikan
@ 2009-06-12 19:18 ` William Morgan
  2009-06-13 11:13   ` Tarko Tikan
  0 siblings, 1 reply; 6+ messages in thread
From: William Morgan @ 2009-06-12 19:18 UTC


Reformatted excerpts from Tarko Tikan's message of 2009-06-09:
> When String.display_length was introduced in recent update, it broke
> the length for non-UTF8 strings that contain the special characters.
> Wrong length results corrupted display (line ends chopped off).

That's a good point. I got a little utf8-centric with those changes.
(I'm assuming that your terminal encoding is not UTF-8.)

Does this patch fix the issue? If so, I will release an 0.8.1.

--- cut here ---
diff --git a/lib/sup.rb b/lib/sup.rb
index 4f59eaa..20835ae 100644
--- a/lib/sup.rb
+++ b/lib/sup.rb
@@ -244,7 +244,7 @@ end
     Redwood::log "using character set encoding #{$encoding.inspect}"
   else
     Redwood::log "warning: can't find character set by using locale, defaulting
-    $encoding = "utf-8"
+    $encoding = "UTF-8"
   end
 
 ## now everything else (which can feel free to call Redwood::log at load time)
diff --git a/lib/sup/util.rb b/lib/sup/util.rb
index 8a3004f..d5310bc 100644
--- a/lib/sup/util.rb
+++ b/lib/sup/util.rb
@@ -172,7 +172,13 @@ class Object
 end
 
 class String
-  def display_length; scan(/./u).size end
+  def display_length
+    if $encoding == "UTF-8"
+      scan(/./u).size
+    else
+      size
+    end
+  end
 
   def camel_to_hyphy
     self.gsub(/([a-z])([A-Z0-9])/, '\1-\2').downcase
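
For comparison only: on Ruby 1.9 and later the `$encoding` branch becomes unnecessary, because each String carries its own encoding and `length` counts characters in it. A minimal sketch of that idea, assuming the raw bytes arrive together with the terminal's charset name; `char_count` is a hypothetical helper, not part of sup:

```ruby
# Hypothetical helper, not part of sup (Ruby 1.9+): count the characters in
# a raw byte string by interpreting it in the terminal's charset.
def char_count(raw, charset)
  s = raw.b.force_encoding(charset)  # relabel a copy; the bytes are unchanged
  # If the bytes are not valid in that charset, fall back to the byte count,
  # which is what String#size returned on Ruby 1.8.
  s.valid_encoding? ? s.length : s.bytesize
end

utf8_bytes  = "asd\u00F5\u00FC\u00E4\u00F6".b  # 11 bytes of UTF-8
latin_bytes = "asd\xF5\xFC\xE4\xF6".b          # 7 bytes of ISO-8859-15

puts char_count(utf8_bytes, "UTF-8")         # 7
puts char_count(latin_bytes, "ISO-8859-15")  # 7
```

The fallback is a design choice of this sketch, not of the patch; it keeps the count defined when the configured charset does not match the data.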

-- 
William <wmorgan-sup at masanjin.net>


* [sup-talk] display_length issue with special-characters on non-UTF8 terminal
  2009-06-12 19:18 ` William Morgan
@ 2009-06-13 11:13   ` Tarko Tikan
  2009-06-15 14:10     ` William Morgan
  2009-06-17 16:04     ` William Morgan
  0 siblings, 2 replies; 6+ messages in thread
From: Tarko Tikan @ 2009-06-13 11:13 UTC


> (I'm assuming that your terminal encoding is not UTF-8.)

No, it's not.

> Does this patch fix the issue? If so, I will release an 0.8.1.

Yes it does. This approach felt "hackish" to me, so I didn't come up with a patch myself :) But I still don't have a better idea of how to fix it, so it'll have to stay like this.

> +    if $encoding == "UTF-8"
> +      scan(/./u).size
> +    else
> +      size
> +    end

It would probably be correct to use:

if $encoding == "UTF-8"
  scan(/./u).size
else
  length
end

That's because scan returns an array (hence size); without scan you are calling the method on the string itself, and length is the correct method there (for some reason size works too; backward compatibility?)

-- 
tarko


* [sup-talk] display_length issue with special-characters on non-UTF8 terminal
  2009-06-13 11:13   ` Tarko Tikan
@ 2009-06-15 14:10     ` William Morgan
  2009-06-17 16:04     ` William Morgan
  1 sibling, 0 replies; 6+ messages in thread
From: William Morgan @ 2009-06-15 14:10 UTC


Reformatted excerpts from Tarko Tikan's message of 2009-06-13:
> Yes it does. To me, this approach felt "hackish" so I didn't come up
> with a patch :) But I still don't have better idea how to fix it so
> it'll have to stay like this.

It's hackish because Ruby 1.8 has shitty multibyte support. The only
reason it works at all is because byte length is character length (at
least most of the time) in your encoding.
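
This can be checked on Ruby 1.9 or later by encoding the same text both ways: in a single-byte encoding such as ISO-8859-15 the byte count equals the character count, while in UTF-8 it does not. A small sketch reusing the characters from the original report:

```ruby
# The same seven characters, "asd" plus o-tilde and u/a/o-umlaut,
# encoded two ways (Ruby 1.9+).
text = "asd\u00F5\u00FC\u00E4\u00F6"

[Encoding::ISO_8859_15, Encoding::UTF_8].each do |enc|
  s = text.encode(enc)
  # Ruby 1.8's String#size counted bytes, so it only equalled the
  # character count when these two numbers matched.
  puts "#{enc}: #{s.bytesize} bytes, #{s.length} characters"
end
# ISO-8859-15: 7 bytes, 7 characters
# UTF-8: 11 bytes, 7 characters
```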

There is a multibyte gem out there that I'm keeping an eye on. Also, Ruby
1.9.1 allegedly fixes this problem.

> Thats because scan returns a array (hence using the size), without
> scan you are just invoking on string and it's correct to use length
> (for some reason size works too, backward compatibility?) 

Size and length are synonyms for both arrays and strings. I used size
there for symmetry.
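
A quick check of this on a current Ruby; nothing here is specific to sup:

```ruby
s = "asd\u00F5\u00FC\u00E4\u00F6"  # "asd" plus four accented letters
a = s.chars                        # Array of one-character Strings

# String#size/#length and Array#size/#length are aliases of each other.
puts s.size == s.length  # true
puts a.size == a.length  # true
# So display_length's scan(/./u).size could equally be scan(/./u).length.
puts s.scan(/./u).size == s.scan(/./u).length  # true
```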
-- 
William <wmorgan-sup at masanjin.net>


* [sup-talk] display_length issue with special-characters on non-UTF8 terminal
  2009-06-13 11:13   ` Tarko Tikan
  2009-06-15 14:10     ` William Morgan
@ 2009-06-17 16:04     ` William Morgan
  2009-06-17 18:34       ` Nicolas Pouillard
  1 sibling, 1 reply; 6+ messages in thread
From: William Morgan @ 2009-06-17 16:04 UTC


Reformatted excerpts from Tarko Tikan's message of 2009-06-13:
> william wrote:
> > Does this patch fix the issue? If so, I will release an 0.8.1.
> 
> Yes it does.  patch :) But I still don't have better idea how to fix
> it so it'll have to stay like this.

I have released an 0.8.1 which has this patch in it.
-- 
William <wmorgan-sup at masanjin.net>


* [sup-talk] display_length issue with special-characters on non-UTF8 terminal
  2009-06-17 16:04     ` William Morgan
@ 2009-06-17 18:34       ` Nicolas Pouillard
  0 siblings, 0 replies; 6+ messages in thread
From: Nicolas Pouillard @ 2009-06-17 18:34 UTC


Excerpts from William Morgan's message of Wed Jun 17 18:04:34 +0200 2009:
> Reformatted excerpts from Tarko Tikan's message of 2009-06-13:
> > william wrote:
> > > Does this patch fix the issue? If so, I will release an 0.8.1.
> > 
> > Yes it does.  patch :) But I still don't have better idea how to fix
> > it so it'll have to stay like this.
> 
> I have released an 0.8.1 which has this patch in it.

I still have issues with display_length. I use UTF-8 in urxvt,
and some characters disappear when a line contains special characters.

For instance in thread-view-mode if a line contains a special character
then the last character is dropped.

I've "fixed" the issue by reverting a display_length call to a size call
as in the attached patch.

diff --git a/lib/sup/buffer.rb b/lib/sup/buffer.rb
index 8eedf96..795b4c9 100644
--- a/lib/sup/buffer.rb
+++ b/lib/sup/buffer.rb
@@ -114,7 +114,7 @@ class Buffer
     stringl += 1 while stringl < s.length && s[0 ... stringl].display_length < maxl
     @w.mvaddstr y, x, s[0 ... stringl]
     unless opts[:no_fill]
-      l = s.display_length
+      l = s.size
       unless l >= maxl
         @w.mvaddstr(y, x + l, " " * (maxl - l))
       end
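
The code around this change truncates a line to fit `maxl` and then pads the rest of the field with spaces. A minimal sketch of that truncate-and-pad step on Ruby 1.9+, measuring in characters for both steps; `fit_to_width` is a hypothetical helper, not sup's Buffer code, and it assumes every character occupies exactly one terminal column, which does not hold for wide CJK glyphs:

```ruby
# Hypothetical helper, not sup's Buffer code (Ruby 1.9+): cut a line to at
# most maxl characters, then pad it with spaces to exactly maxl characters.
# Assumes one terminal column per character; wide CJK glyphs break this.
def fit_to_width(s, maxl)
  s[0, maxl].ljust(maxl)  # String#[] and #ljust both count characters
end

line = "asd\u00F5\u00FC\u00E4\u00F6"  # 7 characters, 11 UTF-8 bytes
puts "[#{fit_to_width(line, 10)}]"  # seven characters plus three spaces
puts fit_to_width(line, 5).length   # 5
```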

-- 
Nicolas Pouillard
http://nicolaspouillard.fr


end of thread

Thread overview: 6+ messages
2009-06-09 10:00 [sup-talk] display_length issue with special-characters on non-UTF8 terminal Tarko Tikan
2009-06-12 19:18 ` William Morgan
2009-06-13 11:13   ` Tarko Tikan
2009-06-15 14:10     ` William Morgan
2009-06-17 16:04     ` William Morgan
2009-06-17 18:34       ` Nicolas Pouillard
