Ruby, part 2
September 22, 2007 12:06 pm General
Thanks to Ubuntu, Ruby 1.9 is available in Gutsy. And I still cannot find the way (that is, the format letter) to unpack UTF-16. Should I wait for Ruby 4 for UTF-16 support (necessary for proper handling of ID3 tags)?
And I am really happy to see Ruby packaged for Maemo (now with GNOME and Hildon, hurray!)
PS: And lads, thanks for mentioning KCODE. At least handling of UTF-8 is bearable.
September 22nd, 2007 at 1:41 pm
Fortunately the Ruby community is very friendly and would probably welcome your rant. I recommend the Ruby Talk mailing list.
http://www.ruby-lang.org/en/community/
What I could find is that Unicode strings are planned for Ruby 2.0, plus these projects:
http://rubyforge.org/projects/icu4r/
http://raa.ruby-lang.org/project/uconv/
http://www.geocities.jp/kosako3/oniguruma/
September 22nd, 2007 at 4:12 pm
I don’t know what you mean by ‘unpacking’, but for UTF-16 I tend to use:
require 'iconv'
class Iconv
  def self.utf8_to_utf16(str); Iconv.iconv("UTF-16LE", "UTF-8", str)[0]; end
  def self.utf16_to_utf8(str); Iconv.iconv("UTF-8", "UTF-16LE", str)[0]; end
end
…to convert between UTF-8 and UTF-16.
September 22nd, 2007 at 10:22 pm
Rutger, I mean the String#unpack method (modeled after Perl’s, AFAIK). It has a format letter for UTF-8 but not for UTF-16. I know about iconv (and uconv), but that’s not exactly the same thing…
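To make the gap concrete, here is a minimal sketch with illustrative sample bytes (assumed here, not taken from a real ID3 tag). The 'U*' directive decodes UTF-8 straight into code points, while the closest thing for UTF-16 is 'v*' (or 'n*' for big-endian), which only yields raw 16-bit code units:

```ruby
# UTF-8: 'U*' decodes the bytes directly into Unicode code points.
utf8_bytes  = [0x68, 0xC3, 0xA9].pack('C*')        # "hé" encoded as UTF-8
utf8_points = utf8_bytes.unpack('U*')              # [0x68, 0xE9]

# UTF-16LE: 'v*' reads little-endian 16-bit units, nothing more.
utf16_bytes = [0x68, 0x00, 0xE9, 0x00].pack('C*')  # "hé" encoded as UTF-16LE
utf16_units = utf16_bytes.unpack('v*')             # [0x68, 0xE9]

# Outside the Basic Multilingual Plane the units are no longer code points:
# U+1D11E arrives as the surrogate pair 0xD834, 0xDD1E.
clef_bytes = [0x34, 0xD8, 0x1E, 0xDD].pack('C*')   # U+1D11E encoded as UTF-16LE
clef_units = clef_bytes.unpack('v*')               # [0xD834, 0xDD1E]
```

For text inside the Basic Multilingual Plane the two results match, so 'v*' looks sufficient until a surrogate pair shows up.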
September 29th, 2007 at 1:14 pm
There is a u16tou8 method in rbuconv (http://www.yoshidam.net/Ruby.html) that goes like this:
def u16tou8(str)
  ret = combineSurrogatePair(str.unpack('v*')).pack('U*')
  ret.taint if str.tainted?
  ret
end
Would this do what you want?
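The snippet relies on rbuconv's combineSurrogatePair, which is not shown here. Below is a minimal self-contained sketch of the same idea, assuming little-endian input without a byte-order mark. The names combine_surrogate_pairs and utf16le_to_utf8 are hypothetical and are not rbuconv's API:

```ruby
# Hypothetical helper (not rbuconv's implementation): merges UTF-16
# surrogate pairs in an array of 16-bit code units into code points.
def combine_surrogate_pairs(units)
  points = []
  i = 0
  while i < units.length
    high = units[i]
    low  = units[i + 1]
    if (0xD800..0xDBFF).include?(high) && low && (0xDC00..0xDFFF).include?(low)
      # A high surrogate followed by a low surrogate encodes one code point.
      points << 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
      i += 2
    else
      # BMP code unit, or an unpaired surrogate passed through unchanged.
      points << high
      i += 1
    end
  end
  points
end

# Assumes str holds UTF-16LE bytes with no byte-order mark.
def utf16le_to_utf8(str)
  combine_surrogate_pairs(str.unpack('v*')).pack('U*')
end

sample = [0x68, 0x00, 0x34, 0xD8, 0x1E, 0xDD].pack('C*')  # "h" + U+1D11E
converted = utf16le_to_utf8(sample)
```

For big-endian data, 'n*' would replace 'v*'. A byte-order mark, if present, would need to be inspected and stripped first.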
Some useful UTF-8 methods, btw, can be found at http://snippets.dzone.com/posts/show/4527
Cheers,
jose
September 29th, 2007 at 1:50 pm
jose,
Thanks for the snippet and the links! Very useful!