Ruby, part 2

12:06 pm General

Thanks to Ubuntu, Ruby 1.9 is available in Gutsy. But I still cannot find a way (a format letter) to unpack UTF-16. Should I wait for Ruby 4 to get UTF-16 support (needed for proper handling of ID3 tags)?
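One way around the missing unpack letter, sketched under the assumption of Ruby 1.9.3 or later (String#getbyte, #byteslice, #force_encoding and #encode): look for a byte-order mark, tag the remaining bytes as UTF-16LE or UTF-16BE, and transcode them to UTF-8. The helper name decode_utf16 is hypothetical, and treating BOM-less input as big-endian is an assumption of this sketch (the usual Unicode default); an ID3 reader should also honour the frame's own text-encoding byte.

```ruby
# Sketch (assumes Ruby 1.9.3+): decode raw UTF-16 bytes, e.g. an ID3 text
# field, into a UTF-8 String. decode_utf16 is a hypothetical helper name.
def decode_utf16(bytes)
  b0 = bytes.getbyte(0)
  b1 = bytes.getbyte(1)
  if b0 == 0xFF && b1 == 0xFE
    # Byte-order mark FF FE: little-endian; drop the BOM.
    enc, body = "UTF-16LE", bytes.byteslice(2, bytes.bytesize - 2)
  elsif b0 == 0xFE && b1 == 0xFF
    # Byte-order mark FE FF: big-endian; drop the BOM.
    enc, body = "UTF-16BE", bytes.byteslice(2, bytes.bytesize - 2)
  else
    # No BOM: assume big-endian (an assumption of this sketch).
    enc, body = "UTF-16BE", bytes
  end
  # Relabel a copy of the bytes as UTF-16, then transcode to UTF-8.
  body.dup.force_encoding(enc).encode("UTF-8")
end
```

For example, the bytes FF FE 68 00 E9 00 decode to "hé". Malformed input (such as an odd number of bytes) makes String#encode raise an Encoding error rather than guess.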

And I am really happy to see Ruby packaged for Maemo (now with GNOME and Hildon, hurray!)

PS: And thanks, lads, for mentioning KCODE; at least UTF-8 handling is bearable.

5 Responses

  1. Felipe Contreras Says:

    Fortunately, the Ruby community is very friendly and would probably welcome your rant. I recommend the ruby-talk mailing list.

    http://www.ruby-lang.org/en/community/

    All I could find is that Unicode strings are planned for Ruby 2.0. In the meantime, there are these projects:

    http://rubyforge.org/projects/icu4r/
    http://raa.ruby-lang.org/project/uconv/
    http://www.geocities.jp/kosako3/oniguruma/

  2. Rutger Nijlunsing Says:

    I don’t know what you mean by ‘unpacking’, but for UTF-16 I tend to use

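    require 'iconv'  # stdlib in Ruby 1.8/1.9; a separate gem since Ruby 2.0

    class Iconv
      # UTF-8 string -> UTF-16 little-endian bytes (Iconv.iconv returns an array)
      def self.utf8_to_utf16(str); Iconv.iconv("UTF-16LE", "UTF-8", str)[0]; end
      # UTF-16 little-endian bytes -> UTF-8 string
      def self.utf16_to_utf8(str); Iconv.iconv("UTF-8", "UTF-16LE", str)[0]; end
    end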

    …to convert between UTF-8 and UTF-16.

  3. Sergey Udaltsov Says:

    Rutger, I mean the String#unpack method (modeled after Perl’s, AFAIK). It has a directive for UTF-8 ('U') but none for UTF-16. I know about iconv (and uconv), but that’s not quite the same thing…

  4. jose akallo Says:

    There is a u16tou8 method in rbuconv (http://www.yoshidam.net/Ruby.html) that goes like this:

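    # Convert UTF-16LE bytes to a UTF-8 string.
    # combineSurrogatePair is another helper from the same library (not shown)
    # that merges surrogate pairs into single code points.
    def u16tou8(str)
      ret = combineSurrogatePair(str.unpack('v*')).pack('U*')
      ret.taint if str.tainted?  # carry over the taint flag (removed in Ruby 3.2)
      ret
    end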

    Would this do what you want?

    BTW, some useful UTF-8 methods can be found at http://snippets.dzone.com/posts/show/4527

    Cheers,
    jose

  5. Sergey Udaltsov Says:

    jose,

    Thanks for the snippet and the links! Very useful!
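The approach in jose's snippet can be sketched in plain Ruby, with no String#encode needed: unpack the bytes as little-endian 16-bit units with unpack('v*'), merge surrogate pairs, and pack the resulting code points with pack('U*'). Here combine_surrogate_pairs and utf16le_to_utf8 are hypothetical names standing in for rbuconv's combineSurrogatePair and u16tou8, whose actual code may differ; mapping unpaired surrogates to U+FFFD is a choice of this sketch.

```ruby
# Sketch: UTF-16LE bytes -> UTF-8 using only pack/unpack.
# combine_surrogate_pairs is a hypothetical helper, not rbuconv's own code.

# Merge high/low surrogate pairs in an array of 16-bit units into code points.
def combine_surrogate_pairs(units)
  out = []
  i = 0
  while i < units.length
    hi = units[i]
    lo = units[i + 1]
    if hi >= 0xD800 && hi <= 0xDBFF && lo && lo >= 0xDC00 && lo <= 0xDFFF
      # A valid pair encodes a code point above U+FFFF.
      out << 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
      i += 2
    elsif hi >= 0xD800 && hi <= 0xDFFF
      # Unpaired surrogate: substitute U+FFFD REPLACEMENT CHARACTER.
      out << 0xFFFD
      i += 1
    else
      # Ordinary BMP character.
      out << hi
      i += 1
    end
  end
  out
end

# Little-endian only, like the 'v*' directive in the rbuconv snippet.
def utf16le_to_utf8(str)
  combine_surrogate_pairs(str.unpack("v*")).pack("U*")
end
```

For example, utf16le_to_utf8([0x68, 0xE9].pack("v*")) returns "hé". For big-endian input, 'n*' would replace 'v*'.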
