Ruby: disappointment of the year :(

9:09 pm General

Heard a lot about Ruby. Read some articles etc. Got deeply impressed by that really nice language. Yesterday I tried it on a small personal project, and now I am crying out loud. No native support for UTF-8 strings (well, I mean stable 1.8, not some development branch). I still wonder: how can a language with that kind of problem even be considered mainstream in 2007?
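To make the complaint concrete: in 1.8 a String is just a sequence of bytes, so length, [] and reverse all work on bytes, not characters. The sketch below assumes a modern Ruby (1.9 or later) and uses String#b to get those 1.8-style byte semantics back. byte_prefix is a hypothetical helper, not a library method.

```ruby
# Sketch, assuming Ruby 1.9+: String#b returns a binary copy, which restores
# the byte-oriented behavior every 1.8 String had.
WORD = "\u00E0\u00E8\u00EC\u00F2\u00F9" # "àèìòù", five precomposed letters

# Hypothetical helper: "the first n characters" as 1.8 computed it, i.e. n bytes.
def byte_prefix(str, n)
  str.b[0, n].force_encoding(Encoding::UTF_8)
end

WORD.b.length                    # => 10, what 1.8's String#length reported
byte_prefix(WORD, 3).bytesize    # => 3: all of "à" plus half of "è"
byte_prefix(WORD, 3).valid_encoding?                           # => false
WORD.b.reverse.force_encoding(Encoding::UTF_8).valid_encoding? # => false
```

Each of these letters takes two bytes in UTF-8, so any byte-counted slice or byte-wise reverse can cut a character in half.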

19 Responses

  1. Robin Says:

    You can always try Python, it is on the same level as Ruby, but has pretty good Unicode support and is better documented :).

  2. Uzytkownik Says:

    C is at least part of the mainstream and has no support for Unicode.
    C++ has support, but nobody uses it.
    So mainstream languages without Unicode support are possible.

  3. Sergey Udaltsov Says:

    Robin, I am just not really excited about Python the way I am/was excited about Ruby… Ruby’s elegant syntax really bought me.

    Uzytkownik, C and C++ are things from the past. “Nobody uses C++” is a strong exaggeration IMNSHO, and that LANGUAGE (not the standard library) does not support Unicode either. But I am sure that new languages without native Unicode support should not stand a chance.

  4. Oded Says:

    As the commenter before me noted, UTF-8 support is not a requirement for a modern mainstream language. I personally use Ruby in an application that processes UTF-8 text, and it works well even without native Unicode support, as long as you don’t try to get too fancy.

    Ruby is a very nice multi-paradigm language (in that it differs from most languages, including Python), very similar to Perl, only much cleaner. I can’t say that it is the be-all and end-all of computer languages, but it’s useful and I enjoy thinking in it (most of my commercial work is done in Java, and Ruby keeps me from atrophying too much :-)

  5. Gabriel Burt Says:

    Since version 1.2, Ruby on Rails has extended Ruby itself with UTF-8 support. I agree it is strange that Ruby doesn’t have that support itself.

  6. Sergey Udaltsov Says:

    Oded, as I said, I really like Ruby as a language (especially the fact that it is multi-paradigm), but the simple fact that the standard string functions do not work properly with UTF-8 is killing me…

  7. Janne Says:

    The (relative) lack of UTF-8 support (it’s not nonexistent) has a fair bit to do with Ruby’s origin in Japan. Unicode rather messed up Japanese, Chinese and Korean for various (long, boring) reasons, Han unification among them, meaning UTF encodings aren’t terribly useful for those languages.

    What you do have is fairly good support for converting between various encodings, and in practice I haven’t found this to be much of a limitation. There’s apparently support for completely encoding-agnostic processing for the next version, but that is not yet in a released version as you say.

  8. Stoffe Says:

    Heh, Pythonistas have never tried Ruby seriously (which is of course why they stick to Python), so they don’t know why it would make such a difference. It’s endearing. :)

    The UTF-8 thingy has bothered a lot of us for years, and yes, it is a problem. However, for 99.9% of people and 99.9% of the programs out there, it turns out that it actually isn’t a problem, or that it can easily be worked around with $KCODE. You must be in the unlucky fraction.

    That said, it’s long past f**king time they fixed it, and fixed it good (unlike many other “fixes”, including Python’s).
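A rough sketch of the kind of workaround meant here. In 1.8, setting $KCODE = 'u' made regular expressions UTF-8 aware, and matching /./u one character at a time was a common way to count or truncate by characters. The regex idiom still runs on a modern Ruby, which is what the code assumes; $KCODE itself no longer has any effect from 1.9 on, so it is left out. utf8_chars and utf8_truncate are hypothetical helper names.

```ruby
# Hypothetical helpers in the 1.8 style: lean on a UTF-8 regex (/u) instead of
# byte-oriented String methods. The m flag lets . match newlines too.
def utf8_chars(str)
  str.scan(/./mu)
end

def utf8_truncate(str, n)
  utf8_chars(str)[0, n].join
end

text = "\u00E0\u00E8\u00EC\u00F2\u00F9" # "àèìòù"
utf8_chars(text).size    # => 5
utf8_truncate(text, 3)   # => "àèì"
```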

  9. Luigi Says:

    >> 'àèìòù'.chars.slice(0,3).to_s
    => "àèì"

    Here is an example of default utf8 support in Ruby on Rails (trunk)

  10. Eduardo Gonzalez Says:

    I was pretty disappointed by the lack of a Unicode class as well. The current workaround, though, is to use one of the conversion packages. I use Kconv at work because in Japan you only really need Shift_JIS and UTF-8. But you should check out the iconv package. Or better yet, help make kick-ass Unicode support for Ruby 1.9 and 2.0!
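In 1.8 such conversions went through Kconv or the Iconv binding. Since Iconv no longer ships with current Ruby releases, the sketch below assumes Ruby 1.9 or later and swaps in String#encode instead. The round trip shows UTF-8 and Shift_JIS holding the same text in a different number of bytes.

```ruby
# Sketch, assuming Ruby 1.9+: String#encode stands in for 1.8-era Iconv/Kconv calls.
utf8 = "日本語"                         # three kanji, 3 bytes each in UTF-8
sjis = utf8.encode(Encoding::Shift_JIS) # 2 bytes each in Shift_JIS

utf8.bytesize                         # => 9
sjis.bytesize                         # => 6
sjis.length                           # => 3, still three characters
sjis.encode(Encoding::UTF_8) == utf8  # => true, a lossless round trip
```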

  11. diego Says:

    Ruby 2.0 aka YARV (the new VM) will have complete Unicode support, and Ruby 2.0 will be released this year.

  12. mainwhat? Says:

    You seem to have never heard of PHP, one very mainstreamish language whose design is beyond horrible

  13. Aria Says:

    Could be that patching in a UTF-8 regex engine does enough that nobody cares. Ruby strings are 8-bit clean, so most operations work Just Fine on UTF-8.
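Why byte-oriented strings mostly work: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so it can never be mistaken for an ASCII delimiter such as a comma, and a valid UTF-8 needle can only match at character boundaries. The sketch below assumes a modern Ruby and uses String#b to mimic 1.8's byte strings; split_utf8_on_ascii is a hypothetical helper.

```ruby
# Sketch: split a UTF-8 string on an ASCII separator using byte-level
# operations only, the way an 8-bit clean 1.8 String would.
def split_utf8_on_ascii(str, sep)
  raise ArgumentError, "separator must be ASCII" unless sep.ascii_only?
  str.b.split(sep.b).map { |part| part.force_encoding(Encoding::UTF_8) }
end

csv = "\u00E0\u00E8,\u00EC\u00F2"      # "àè,ìò"
parts = split_utf8_on_ascii(csv, ",")
parts                          # => ["àè", "ìò"]
parts.all?(&:valid_encoding?)  # => true

# Byte-wise substring search is safe for the same reason.
csv.b.include?("\u00EC\u00F2".b)  # => true
```

What does break is anything that counts or slices by position, which is exactly the String#length and [] trouble from the post.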

  14. Sergey Udaltsov Says:

    Well, lads, even if UTF-8 is supported somehow (especially with Rails, but I am not interested in that ATM), UTF-16 is a much worse situation, since it breaks the zero-terminated-string promise.
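A small sketch of the zero-termination problem, assuming Ruby 1.9+. UTF-8 never produces a 0x00 byte except for U+0000 itself, while UTF-16 pads every ASCII character with one. c_strlen is a hypothetical helper that mimics C's strlen() on an array of bytes.

```ruby
# Hypothetical helper mimicking C strlen(): count bytes up to the first 0x00.
def c_strlen(bytes)
  bytes.index(0) || bytes.size
end

text = "abc"
utf8_bytes  = text.bytes                     # => [97, 98, 99]
utf16_bytes = text.encode("UTF-16LE").bytes  # => [97, 0, 98, 0, 99, 0]

c_strlen(utf8_bytes)   # => 3, the whole string
c_strlen(utf16_bytes)  # => 1, a C routine stops after the first byte
```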

    PS PHP is really horrible, no arguing

    PPS And I am not affiliated with Python in any way;)

  15. Zeno Says:

    Perl does not support UTF-8 in regular expressions, which is also quite annoying …

  16. yourself Says:

    That issue with utf-8 is fixed for ruby 1.9 =)
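For reference, a sketch of what the 1.9 fix looks like in practice, assuming Ruby 1.9 or later: each String carries an Encoding, and length, [] and reverse then operate on characters rather than bytes.

```ruby
# Sketch, assuming Ruby 1.9+: strings are sequences of characters in an encoding.
word = "\u00E0\u00E8\u00EC\u00F2\u00F9" # "àèìòù"

word.encoding    # => #<Encoding:UTF-8>
word.length      # => 5 characters
word[0, 3]       # => "àèì"
word.reverse     # => "ùòìèà"
```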

  17. Chris Hubick Says:

    Python might have passable Unicode support, but its scalability is crap (IMO):
    http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock

    I’m sticking with Java (which would suck less with 32-bit UCS-4 chars).
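Context for the UCS-4 remark: a Java char is a 16-bit UTF-16 code unit, so any code point outside the Basic Multilingual Plane needs two of them (a surrogate pair), whereas UTF-32/UCS-4 always uses a single 32-bit unit. The Ruby sketch below (1.9+) counts the units for U+1D11E MUSICAL SYMBOL G CLEF; code_units is a hypothetical helper.

```ruby
# Hypothetical helper: how many fixed-width code units a string needs.
def code_units(str, encoding, unit_bytes)
  str.encode(encoding).bytesize / unit_bytes
end

clef = "\u{1D11E}" # MUSICAL SYMBOL G CLEF, outside the BMP

clef.length                          # => 1 character
code_units(clef, "UTF-16BE", 2)      # => 2, a surrogate pair (two Java chars)
code_units(clef, "UTF-32BE", 4)      # => 1
clef.encode("UTF-16BE").unpack("n*") # => [55348, 56606], i.e. 0xD834 0xDD1E
```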

  18. Nils Says:

    Yes, I was also very disappointed by this. Since Ruby is a relatively new language and was written by a Japanese programmer, I thought it would have UTF-8 support for sure. But this was not the case; let’s hope for Ruby 2.0.

  19. Kyle Says:

    Stoffe, I always find it amusing when Ruby users just cannot imagine that anyone could use it for a real-world product and *not* fall in love with it. I know plenty of developers, Pythonistas included, who can’t stand Ruby, and I’m one of them. I am a Python developer who recently worked on a Rails project with a small team, and it was a miserable experience for everyone.

    I just don’t think that Ruby is elegant. It is an extremely large language compared to languages like Python and Lisp, with lots of inconsistent and confusing syntax. I especially don’t understand the love for anonymous blocks instead of real higher-order functions.

    Anonymous blocks are by far the worst. If you ask me, they encourage users to write disgusting code that makes no sense and is generally much slower than it could be. Just take a look at the Rails source code and you’ll see what I mean. I cannot think of a single problem that anonymous blocks solve that isn’t also solved by a function or a generator. The real trouble is that quite a few problems are solvable with functions but *not* with anonymous blocks. Just try the broken map method in Ruby and you’ll see what I’m talking about.

    And on top of that and many other problems, the compiler itself is quite unsophisticated. Take this for example:


    def foo()
      x = 3

      def bar()
        puts "in bar()"
        puts "x is #{x}"
      end
      puts "in foo()"

      bar()
    end

    What blows my mind is that bar *DOES NOT* close over foo’s local variables. WTF? In fact this code will not even work: calling foo() raises a NameError as soon as bar reaches x, because x is not in bar’s scope. Look at this perfectly legal code to understand what is really going on:

    def foo()
      def bar()
        puts "in bar"
      end

      puts "in foo"
    end

    foo()
    bar()

    The output of course is:

    in foo
    in bar
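For contrast, a sketch of the distinction being described, assuming Ruby 1.9+: def opens a fresh scope and cannot see the enclosing method’s locals, while a block or lambda does capture them. The method names here are hypothetical.

```ruby
# def starts a new scope: inside inner_def, x is looked up as a method call.
def outer_with_def
  x = 3
  def inner_def
    "x is #{x}"
  end
  inner_def
end

# A lambda is a closure: it captures the local x of the method that created it.
def outer_with_lambda
  x = 3
  inner = lambda { "x is #{x}" }
  inner.call
end

outer_with_lambda  # => "x is 3"

begin
  outer_with_def
rescue NameError => e
  e.name           # => :x
end
```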

    Python does not have perfect closures, but at least it has read-only lexical closure for functions (anonymous or not). Ruby has nothing of the sort, and that makes patterns like currying or function decorators impossible, which is a deal breaker for me.

    And don’t even get me started on namespacing. Python’s modules make things like introspection so easy and clean. Just think about how much simpler ActiveRecord would be if it *wasn’t* written in Ruby.

    Anyway, rant off. I could go on, but the point is not everyone who is a Python user will go running to Ruby once they see it in all its magical glory. I don’t see that ever happening for me unless a lot of things change.