I will try to keep this article concise, to give you an outline of the current state of input methods on Linux (and perhaps other platforms such as the BSDs). Its coverage is mostly CJK languages, but users of other languages that need an input method should find relevant examples here as well. We will start with the most popular ones and finish with brief notes on the others.
Before we start our tour, there are two concepts to know, input method framework and input method engine:
1. SCIM (Smart Common Input Method)
Most Linux input method users have probably used SCIM, which was created by the Chinese developer Su Zhe to promote his Intelligent Pinyin input method and to provide a better input method framework.
Some friends of mine still use SCIM even though it is no longer maintained; neither is SKIM, its sister project for KDE. SCIM was the default choice of distros for years. People developed many input method engines for it, for example scim-pinyin, the Intelligent Pinyin input method mentioned above. Users may also be familiar with scim-python, scim-xingma-*, scim-googlepinyin and the still-maintained scim-sunpinyin.
From a distro maker’s point of view, the glorious age of SCIM has just ended.
2. IBus (Intelligent Input Bus)
IBus is the de facto standard input method framework on today’s Linux distros. Its author is a Chinese developer, Huang Peng, who is also the author of scim-python and scim-xingma-*. IBus aims to be a “next generation input framework” compared with SCIM. I think this goal has been achieved: newcomers may know only IBus from the very beginning of their adventure on Linux.
IBus is written in C and is designed to be highly modular: a core input bus, GTK/Qt interfaces, Python bindings, a table engine, table modules and other input method engines. It uses the Gtk immodule mechanism, which makes it the best choice for GTK+ applications. What’s more, Flash Player supports only the Gtk immodule, and IBus works with it without problems. The author of IBus is very helpful to other input engine developers, so many input method engines are available for the IBus framework.
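To make the split between a core bus and its pluggable engines more concrete, here is a minimal Python sketch. It is only an illustration and does not use the real IBus API: the names `ToyFramework`, `ToyEngine` and `UppercaseEngine` are hypothetical, and the "engine" simply upper-cases letters instead of doing any real conversion.

```python
class ToyEngine:
    """Hypothetical engine interface: turns key presses into committed text."""

    name = "toy"

    def process_key(self, key):
        """Return the text to commit for this key, or None to pass it through."""
        raise NotImplementedError


class UppercaseEngine(ToyEngine):
    """Stand-in for a real engine such as a Pinyin or table engine."""

    name = "upper"

    def process_key(self, key):
        # Handle letters only; everything else is passed through untouched.
        return key.upper() if key.isalpha() else None


class ToyFramework:
    """Hypothetical framework: owns the engines and routes key events to one."""

    def __init__(self):
        self._engines = {}
        self._active = None

    def register(self, engine):
        self._engines[engine.name] = engine

    def activate(self, name):
        self._active = self._engines[name]

    def feed(self, keys):
        """Send each key to the active engine and collect the resulting text."""
        out = []
        for key in keys:
            committed = self._active.process_key(key) if self._active else None
            out.append(key if committed is None else committed)
        return "".join(out)


framework = ToyFramework()
framework.register(UppercaseEngine())
framework.activate("upper")
result = framework.feed("ab1")
```

The point of the design is that the framework knows nothing about any particular language; adding a new input method means registering another engine, not changing the framework.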
But IBus has obvious limitations from its design:
Note: ibus-pinyin has been rewritten, mainly in C++, with many improvements, and these changes will land in major distros in the very near future (some of them may already have shipped it; I didn’t research this in detail).
3. Fcitx (Free Chinese Input Toy for X)
Fcitx is both an old and a new input method. It was born at about the same time as SCIM, and has now gained a new life with the brand new 4.x series. As the name suggests, it was first designed by Yuking as a Chinese-specific input method. During the 3.x series, its aim of being a feature-rich Chinese input method won it quite a few fans, but also kept it from being a default choice of major distros. Fcitx uses XIM, which works well with most toolkits (like GTK+ and Qt), but has some small problems.
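Whether an application reaches the input method through a toolkit immodule or through XIM is usually decided by a few environment variables: `GTK_IM_MODULE`, `QT_IM_MODULE` and `XMODIFIERS`. The small Python sketch below only reads and reports them; the helper name `describe_im_environment` is hypothetical, and the sample values are an assumption about a typical Fcitx-over-XIM setup, so check your distro's documentation for the values it expects.

```python
import os

# Environment variables conventionally consulted by GTK+, Qt and Xlib.
IM_VARIABLES = ("GTK_IM_MODULE", "QT_IM_MODULE", "XMODIFIERS")


def describe_im_environment(env=None):
    """Report the input method related variables found in `env`."""
    env = os.environ if env is None else env
    report = {name: env.get(name, "(unset)") for name in IM_VARIABLES}
    # XMODIFIERS conventionally looks like "@im=<server name>" for XIM.
    xmod = env.get("XMODIFIERS", "")
    report["xim_server"] = xmod[len("@im="):] if xmod.startswith("@im=") else None
    return report


# Assumed example values, not taken from any particular distro.
sample = {"GTK_IM_MODULE": "xim", "XMODIFIERS": "@im=fcitx"}
report = describe_im_environment(sample)
```

Calling `describe_im_environment()` with no argument inspects the current session instead of the sample dictionary.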
However, starting with the 4.x series, Fcitx has been given a new goal by a new maintainer, Weng Xuetian, a college student at Peking University. He has now published 4.0.1, the second version of the 4.x series, with long-awaited features such as customizable skins and tables. It has been heavily modularized: all tables are separated out, there is a developer-friendly input method engine interface, and there is a graphical configuration tool. Also, the 4.x series no longer uses GBK-encoded Chinese configuration files; UTF-8 encoded English configuration files are used instead. I would like to highlight the excellent user experience of fcitx-sunpinyin; it is worth a try for every Pinyin user.
There are still issues on its way to (probably) becoming the default of distros:
Strictly speaking, Fcitx is still not an input method framework, for the reasons listed above, but it will become one, as also said above.
Those are the best-known input methods. Here is a list of other projects in the Linux input method world, with short descriptions.
1. ucimf (Unicode Console Input Method Framework)
ucimf is an input method framework for the Linux Unicode framebuffer console, mainly used with fbterm and jfbterm. It is developed by a Chinese developer, Mat. He maintains a series of input method engines ported from OpenVanilla, a BSD-licensed input method for Mac OS X.
There are other solutions for the framebuffer console, for example ibus-fbterm (development has stopped), but I still recommend ucimf because it is full-featured and well maintained.
2. SunPinyin
One thing to clarify: SunPinyin isn’t a framework, but it is important, so I would like to mention it here. We have mentioned scim-sunpinyin, ibus-sunpinyin and the recommended fcitx-sunpinyin. In fact, a standalone xsunpinyin also exists. SunPinyin is a Chinese input method based on a statistical language model. It was first developed by the Sun Beijing Globalization team and was open-sourced to the community through the OpenSolaris project, dual-licensed under LGPLv2 and CDDL.
SunPinyin will be heavily used from now on; it is the best Pinyin engine on Linux and some other platforms.
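To give a feeling for what "based on a statistical language model" means, here is a toy Python sketch. It is not SunPinyin's actual code or data: the candidate table and the bigram probabilities are made up for illustration, and a real engine uses a far larger model trained on a corpus. The idea is to choose, among all the characters each syllable could stand for, the sequence with the best overall score.

```python
import math

# Hypothetical candidate characters for each Pinyin syllable.
CANDIDATES = {
    "zhong": ["中", "种"],
    "guo": ["国", "果"],
}

# Made-up bigram probabilities P(current | previous); "<s>" marks the start.
BIGRAM = {
    ("<s>", "中"): 0.6, ("<s>", "种"): 0.4,
    ("中", "国"): 0.9, ("中", "果"): 0.1,
    ("种", "国"): 0.2, ("种", "果"): 0.8,
}
FLOOR = 1e-6  # probability assumed for character pairs missing from the table


def convert(syllables):
    """Return the most probable character string for a list of syllables."""
    # best[char] = (log probability, text) of the best path ending in char.
    best = {"<s>": (0.0, "")}
    for syllable in syllables:
        following = {}
        for char in CANDIDATES[syllable]:
            following[char] = max(
                (score + math.log(BIGRAM.get((prev, char), FLOOR)), text + char)
                for prev, (score, text) in best.items()
            )
        best = following
    return max(best.values())[1]


sentence = convert(["zhong", "guo"])
```

With these made-up numbers the path 中 then 国 scores 0.6 × 0.9 = 0.54, higher than any other combination, so `sentence` is "中国".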
3. SCIM2
SCIM2 tried to be a next-generation SCIM, but was abandoned because of the rising star of that period, IBus.
4. ImBus
ImBus was created by the author of SCIM, who wanted to make it a general input method framework that combines all the best known techniques with minimal dependencies. But the project halted for the same reason as SCIM2, and there is no code in its svn repository.
5. Fitx (Fun Input Toy for Linux)
Fitx was a flash in the pan on Linux; the project stopped soon after it emerged. Fitx is a port of the FIT input method from Mac OS X, implemented on the SCIM framework.
6. gcin
gcin is an input method developed by the Traditional Chinese community and targeted at Traditional Chinese users.
7. uim
uim is an input method framework made by Japanese developers. It is a little different because uim isn’t an input method server (XIM is a server); it is just a library. Its designers believe that many people don’t need a full-featured platform, only something that is enough to get the work done.
For information about input method types, there is a good website: http://seba.studentenweb.org/thesis/im.php
Update:
2011-01-15 21:30
Thanks to Zhengpeng Hou, I’ve added a description of uim and a note on whether Fcitx should now be considered a framework.
2011-01-28 00:15
Thanks to Shawn P Huang, ibus-pinyin has been rewritten, mainly in C++, with many improvements.
thank you for the introduction!
ucimf is interesting.
I wrote the gtk immodule handling code for AIR Linux, which was later reused/improved in Flash Player.
This is a really good introduction and survey. I really wish I had this around 3 years back when I was working on it.
SCIM and UIM were the two main input methods we were targeting back then.
As you said, instability was a major issue. I spent hours writing workaround code to prevent crashes that originated in the input methods. We (the Flash Platform team) already had a bad name for crashing in the wild and didn’t want to make it worse.
Sorry for getting off-topic, but you reminded me of some interesting debugging sessions :).
Thanks for all of your work, which has helped Linux users to use those applications. :-)
The stability still needs some more work, but there have already been very significant improvements.