ibus-pinyin in Ubuntu

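My earlier post, “A brief summary of Chinese Pinyin input methods on Ubuntu”, said that ibus-pinyin is written in Python, which was inaccurate. ibus-pinyin has been rewritten in C++ with many improvements.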

Some Ubuntu users may ask why those crashes still happen. That is actually a distribution problem: the new version has been slow to reach users.

Thanks to Shawn P Huang, Shellexy and BYBird for the reminder; the article on the blog has been updated as well.

Linux input method framework brief summary

I will try to keep this article concise and give you an outline of the current state of input methods on Linux (and perhaps on other platforms such as the BSDs). Its coverage is mostly CJK languages, but I think users of other languages that need an input method will find their examples here too. We will start with the most popular ones and finish with some hints about the others.

Before we start our tour, there are two concepts to know, the input method framework and the input method engine:

  • An input method framework runs as a daemon that handles user input events and outputs the result to target applications or layers.
  • An input method engine is a program that analyzes the typed characters, computes a list of probable results, and sends them to its host input method framework, which completes the interaction with users and applications.
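The split between the two concepts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyPinyinEngine`, `ToyFramework`, the tiny lookup table and the "digit selects a candidate" rule are all hypothetical and do not correspond to the API of SCIM, IBus or Fcitx.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ToyPinyinEngine:
    """Hypothetical engine: turns typed letters into a candidate list."""

    # Made-up table; a real engine uses a dictionary and a language model.
    TABLE = {"ni": ["你", "尼"], "hao": ["好", "号"]}

    def candidates(self, typed: str) -> list[str]:
        return list(self.TABLE.get(typed, []))


@dataclass
class ToyFramework:
    """Hypothetical framework: receives key events and commits the result."""

    engine: ToyPinyinEngine
    preedit: str = ""
    committed: list[str] = field(default_factory=list)

    def on_key(self, key: str) -> list[str]:
        """Handle one key event and return the candidates to display."""
        options = self.engine.candidates(self.preedit)
        # A digit 1-9 selects a candidate if one exists at that position.
        if len(key) == 1 and key in "123456789" and int(key) <= len(options):
            self.committed.append(options[int(key) - 1])
            self.preedit = ""
            return []
        # Any other key extends the preedit buffer and asks the engine again.
        self.preedit += key
        return self.engine.candidates(self.preedit)
```

Typing `n`, `i`, `1` into `ToyFramework(ToyPinyinEngine())` leaves `你` in `committed`: the framework never looks inside the table, and the engine never sees a key event.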
    1. SCIM (Smart Common Input Method)

    Most Linux input method users have probably used SCIM, which was created by Chinese developer Su Zhe to promote his Intelligent Pinyin input method and to provide a better input method framework.

    Some friends of mine still use SCIM even though neither it nor SKIM, its sister project for KDE, is maintained any more. SCIM was the default choice of distros for years. People developed many input method engines for it, for example scim-pinyin, the Intelligent Pinyin input method mentioned above. Users may also be familiar with scim-python, scim-xingma-*, scim-googlepinyin and the still-maintained scim-sunpinyin.

    From a distro maker’s point of view, the glorious age of SCIM has just ended.

    2. IBus (Intelligent Input Bus)

    IBus is the de facto standard input method framework on today’s Linux distros. Its author is a Chinese developer, Huang Peng, who is also the author of scim-python and scim-xingma-*. IBus aims to be a “next generation input framework” compared to SCIM. I think this goal has been achieved: newcomers may know only IBus from the very beginning of their adventure on Linux.

    IBus is written in C++ and is designed to be highly modular: core input bus, GTK/Qt interfaces, Python bindings, table engine, table modules and other input method engines. It uses Gtk immodule, and is therefore the best choice for GTK+ applications. What’s more, Flash Player supports only Gtk immodule, and IBus works with it without problems. The author of IBus is very helpful to other input engine developers, so many input method engines are available for the IBus framework.

    But IBus has obvious limitations from its design:

  • Firstly, it uses only Gtk immodule, which benefits GTK+ applications but works poorly with Qt.
  • Secondly, it depends on gconf, which is unacceptable to some users and distro makers (most of them strongly dislike GNOME).
  • Thirdly, the most used input engine, ibus-pinyin, was written in Python, which seriously limited its performance. The engine has also had some severe bugs, such as memory leaks and infinite loops (100% CPU). Even though the situation has largely improved, users still complain about them.
  • Fourthly, the alternative Pinyin engine, ibus-sunpinyin, is not well maintained and badly lacks testing. Some obvious bugs are left there with nobody interested in fixing them.
  • Note: ibus-pinyin has been rewritten mainly in C++ with many improvements, and these changes will land in major distros in the very near future (some of them may have shipped it already; I didn’t research this in detail).

    3. Fcitx (Free Chinese Input Toy for X)

    Fcitx is both an old and a new input method. It was born at about the same time as SCIM, and now has a new life with the brand new 4.x series. As the name suggests, it was first designed by Yuking as a Chinese-specific input method. During the 3.x series, the aim of being a feature-rich Chinese input method won it quite a few fans, but also kept it from being a default choice of major distros. Fcitx uses XIM, which works with most toolkits (such as GTK+ and Qt) but has some small problems.
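Whether an application talks to the input method through a toolkit immodule or through XIM is conventionally chosen with environment variables. The sketch below assumes the usual `GTK_IM_MODULE`, `QT_IM_MODULE` and `XMODIFIERS` variables; the helper name `im_environment` is hypothetical, and whether a given module value (for example `ibus` for Qt) actually exists depends on what your distro packages.

```python
from __future__ import annotations


def im_environment(framework: str, use_xim: bool) -> dict[str, str]:
    """Build the environment variables that select an input method.

    framework: name of the running input method, e.g. "ibus" or "fcitx".
    use_xim:   True to make the toolkits fall back to the XIM protocol.
    """
    # XMODIFIERS tells XIM clients which input method server to contact.
    env = {"XMODIFIERS": f"@im={framework}"}
    # The toolkits either load their own module or go through XIM.
    module = "xim" if use_xim else framework
    env["GTK_IM_MODULE"] = module
    env["QT_IM_MODULE"] = module
    return env
```

For the setups described here, `im_environment("fcitx", use_xim=True)` would describe Fcitx over XIM, and `im_environment("ibus", use_xim=False)` would describe IBus through the toolkit modules.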

    However, starting with the 4.x series, Fcitx has been given a new goal by a new maintainer, Weng Xuetian, a college student at Peking University. He has now published 4.0.1, the second release of the 4.x series, with long-wanted features such as customizable skins and tables. It has been heavily modularized: all tables are separated, and it has a developer-friendly input method engine interface and a graphical configuration tool. Also, the 4.x series no longer uses GBK-encoded Chinese configuration files; it uses UTF-8 encoded English configuration files instead. I would like to highlight the excellent user experience of fcitx-sunpinyin: it is worth a try for every Pinyin user.

    There are still issues on its way to (probably) becoming the default of distros:

  • Though the author has promised Gtk immodule support in the 4.1.0 release, the feature is not available yet.
  • The internal Pinyin input method is old, and has not yet been separated from the framework core because the two were too closely integrated. This work will be done in 4.1.0 as well.
  • Fcitx still lacks people interested in developing input method engines, even though its interface is friendlier to developers. As an example, fcitx-sunpinyin (written in C++) needs only ~300 lines to make everything work perfectly with libsunpinyin.
  • Strictly speaking, Fcitx is not yet an input method framework for the reasons listed above, but it will become one, as also said above.

    Those are the most famous input methods. Here is a list of other projects in the Linux input method world, with short descriptions.

    1. ucimf (Unicode Console Input Method Framework)

    ucimf is an input method framework for the Linux Unicode framebuffer console, mainly used with fbterm and jfbterm. It is developed by Chinese developer Mat, who maintains a series of input method engines ported from OpenVanilla, a BSD-licensed input method for Mac OS X.

    There are other solutions for the framebuffer console, for example ibus-fbterm (whose development has stopped), but I still recommend ucimf because it is full-featured and well maintained.

    2. SunPinyin

    One thing to clarify: SunPinyin isn’t a framework, but it is important, so I would like to mention it here. We have mentioned scim-sunpinyin, ibus-sunpinyin and the recommended fcitx-sunpinyin. There is in fact also a standalone xsunpinyin. SunPinyin is a Chinese input method based on a statistical language model. It was first developed by the Sun Beijing Globalization team and was open-sourced to the community with the OpenSolaris project, dual-licensed under LGPLv2 and CDDL.

    SunPinyin should see heavy use from now on; it is the best Pinyin engine on Linux and some other platforms.
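To give a feel for what “based on a statistical language model” means, here is a much simplified stand-in: it ranks candidate sentences by made-up bigram counts. It is not SunPinyin’s algorithm or data; `BIGRAM_COUNTS`, `score` and `best_sentence` are hypothetical names, the counts are invented, and the score is an unnormalized sum of log counts rather than a true probability.

```python
import math
from itertools import product

# Invented counts of "character B follows character A"; "<s>" marks the start.
BIGRAM_COUNTS = {
    ("<s>", "你"): 8,
    ("<s>", "尼"): 1,
    ("你", "好"): 9,
    ("你", "号"): 1,
}


def score(sentence: str) -> float:
    """Sum of log(count + 1) over adjacent pairs; higher means more likely."""
    total = 0.0
    previous = "<s>"
    for char in sentence:
        # Add one so that unseen pairs contribute log(1) = 0 instead of log(0).
        total += math.log(BIGRAM_COUNTS.get((previous, char), 0) + 1)
        previous = char
    return total


def best_sentence(candidates_per_syllable: list) -> str:
    """Try every combination of per-syllable candidates and keep the best."""
    sentences = ("".join(chars) for chars in product(*candidates_per_syllable))
    return max(sentences, key=score)
```

With the candidates for “ni” and “hao” as `[["你", "尼"], ["好", "号"]]`, the pair seen most often in the invented counts wins. A real engine would use a trained model and a search that does not enumerate every combination.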

    3. SCIM2
    SCIM2 tried to be a next-generation SCIM, but was abandoned because of the rising star of that period, IBus.

    4. ImBus

    ImBus was created by the author of SCIM, who wanted to make it a general input method framework that includes all the best known techniques with minimal dependencies. But the project halted for the same reason as SCIM2, with no code in its svn repository.

    5. Fitx (Fun Input Toy for Linux)

    Fitx was a flash in the pan on Linux; the project stopped soon after it emerged. Fitx is a port of the FIT input method from Mac OS X, implemented on the SCIM framework.

    6. gcin
    gcin is an input method developed by the Traditional Chinese community and targeted at Traditional Chinese users.

    7. uim

    uim is an input method framework made by Japanese developers. It is a little different because uim isn’t an input method server (XIM is a server); it’s just a library. Its designers believe many people don’t need a full-featured platform, only something good enough to work.

    For information about input method types, there is a good website: http://seba.studentenweb.org/thesis/im.php

    Update:
    2011-01-15 21:30
    Thanks to Zhengpeng Hou, I’ve added a description of uim and a note on whether fcitx should be considered a framework for now.
    2011-01-28 00:15
    Thanks to Shawn P Huang, I’ve noted that ibus-pinyin has been rewritten mainly in C++ with many improvements.

    A brief summary of Chinese Pinyin input methods on Ubuntu

    Originally posted to the Ubuntu Chinese mailing list; everyone is welcome to join the discussion there.

    We have been discussing Pinyin input methods over the past couple of days, so here is a small summary. Discussion is welcome.

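    At present Ubuntu’s default input platform is ibus, and ibus-pinyin ships on the CD. The default Wubi input method is ibus-table-wubi, and the default Traditional Chinese input method is ibus-chewing.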

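    I only use Pinyin and do not know the state of Wubi and Chewing. Below I briefly summarize what I know about several common input methods, and I would like to hear which one you think Ubuntu should use by default in the future.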

    1. IBus

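    The ibus platform is currently standard on all major distributions. The framework itself is written in C++, is highly modular, and has many input methods to choose from. The author is still developing it and is quite welcoming to input method developers.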

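    ibus-pinyin was originally written in Python. Those versions were somewhat less efficient, and some bugs with 100% CPU usage and memory leaks appeared in them. ibus-pinyin has since been rewritten mainly in C++ with a great many improvements, but because of the distribution it has been slow to enter the repositories. A nearly pure C++ version of ibus-pinyin will appear in 11.04.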

    For Pinyin, ibus also offers ibus-sunpinyin, but it has relatively few users and little feedback.

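    ibus uses gtk immodule, which makes it perform very well in GTK programs and lets you type Chinese in flash, but its performance in Qt programs is mediocre. You could say it is mainly an input platform for GTK.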

    2. Fcitx

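    fcitx is a veteran Linux Chinese input method with some fans, but many people’s impression of it is still based on 3.x with its GBK Chinese configuration files. The new fcitx 4 already uses English utf8 configuration files, supports custom skins and code tables, has an improved input method interface, and adds a graphical configuration tool. It does not have more bugs than ibus either.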

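    The best Pinyin option for fcitx is fcitx-sunpinyin. Its candidate accuracy is the same as that of every input method built on the sunpinyin core. Compared with ibus-sunpinyin and scim-sunpinyin, its advantage is that it can use fcitx’s own features (such as skins), and it is about as smooth as fcitx’s built-in Pinyin input method.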

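    fcitx has the following problems: 1. the built-in pinyin input method has not yet been split out as a module, and its algorithm is outdated; 2. although writing an input method for fcitx is already simpler than for ibus, it still lacks attention, and fewer input methods are available than for ibus.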

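    fcitx uses XIM, so it is closer to being an input platform for X. But flash does not support XIM, and cursor following has small glitches in some cases. Version 4.1 will have gtk immodule support, which will solve these problems together.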

    3. Scim

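    The scim platform used to be standard on all distributions, and scim-pinyin has always been smoother to type with than ibus-pinyin; I believe some people are still holding the scim line. However, nobody maintains scim or skim, its counterpart on kde, and in Debian/Ubuntu the packagers only occasionally fix a few simple bugs. We cannot go backwards and use it again.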

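    The Pinyin input methods for scim are scim-pinyin, scim-python, scim-googlepinyin and scim-sunpinyin; the first three are currently unmaintained. Promoting the Intelligent Pinyin input method of scim-pinyin was the reason the author developed SCIM; scim-python is the predecessor of ibus-pinyin; scim-googlepinyin is written using the algorithm of the input method on android.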

    4. Yong

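    yong is a small input method that has had a fair amount of exposure recently; its author says he developed it to promote his Yong code (永码). I have not used it; judging from the configuration files alone, I guess it uses XIM like fcitx, so it will be affected by the various XIM problems too. yong is closed-source software, so whether for licensing reasons or for portability reasons, it cannot be adopted by mainstream distributions as the default input method. Of course, giving users one more choice is always a good thing.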

    Update:
    2011-01-28 00:15
    Thanks to Shawn P Huang, Shellexy and BYBird, I updated the ibus-pinyin part to say that the new version has been rewritten in C++ and improved in many places.

    VM disk performance on btrfs partition

    Recently I read Alexandre Rosenfeld’s blog post about the poor performance of btrfs when a VM’s disk is placed on it, so I gave it a try and got almost the same result.

    I tried to get someone to look into the problem, and now have the answer. Quoting Bastian Blank:

    This is a result of the filesystem design, no bug. For decent performance don’t use O_SYNC on files. For qemu use cache=writeback in the disk definition.
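As a rough illustration of the advice above, the sketch below builds a qemu `-drive` argument with `cache=writeback`. The helper name `qemu_drive_args` and the image path in the usage note are hypothetical; the set of cache modes checked here is only the subset I am sure of, so consult the qemu documentation for your version.

```python
from __future__ import annotations

# Cache modes accepted by this sketch (a subset of what qemu may support).
KNOWN_CACHE_MODES = {"writeback", "writethrough", "none"}


def qemu_drive_args(image_path: str, cache: str = "writeback") -> list[str]:
    """Return the command-line arguments for one qemu disk definition."""
    if cache not in KNOWN_CACHE_MODES:
        raise ValueError(f"unknown cache mode: {cache}")
    # qemu separates suboptions with commas, so a literal comma in the
    # file name has to be doubled.
    escaped_path = image_path.replace(",", ",,")
    return ["-drive", f"file={escaped_path},cache={cache}"]
```

For example, `qemu_drive_args("/srv/vm/disk.img")` yields `["-drive", "file=/srv/vm/disk.img,cache=writeback"]`, which can be appended to the rest of the qemu command line.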

    This work by Aron Xu is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported.