I am trying to keep this article concise, to give you an outline of the current state of input methods on Linux (and perhaps other platforms such as the BSDs). Its coverage is mostly CJK languages, but I think users of other languages that rely on input methods will find their examples here too. We will start with the most popular ones, and end with some notes on the others.

Before we start our tour, there are two concepts to know: the input method framework and the input method engine.

  • An input method framework runs as a daemon, handles user input events, and delivers the result to the target applications or layers.
  • An input method engine is a program that analyzes the typed characters and computes a list of probable results, then sends them to its host input method framework, which completes the interaction with users and applications.

    1. SCIM (Smart Common Input Method)

    Most Linux input method users have probably used SCIM, which was created by the Chinese developer Su Zhe to promote his Intelligent Pinyin input method and to provide a better input method framework.

    Some friends of mine still use SCIM even though neither it nor SKIM, its sister project for KDE, is maintained any more. SCIM was the default choice of distros for years. People developed lots of input method engines for it, for example scim-pinyin, the Intelligent Pinyin input method mentioned above. Users may also be familiar with scim-python, scim-xingma-*, scim-googlepinyin and the still-maintained scim-sunpinyin.

    From a distro maker’s point of view, the glorious age of SCIM has just ended.

    2. IBus (Intelligent Input Bus)

    IBus is the de facto standard input method framework on today’s Linux distros. Its author is a Chinese developer, Huang Peng, who also wrote the scim-python and scim-xingma-* engines mentioned above. IBus aims to be a “next generation input framework” compared to SCIM. I think this goal has been achieved: newcomers may know only IBus from the very beginning of their adventure on Linux.

    IBus is written in C++ and is designed to be highly modular: a core input bus, GTK/Qt interfaces, Python bindings, a table engine, table modules and other input method engines. It uses the Gtk immodule and is therefore the best choice for GTK+ applications. What’s more, Flash Player supports only the Gtk immodule, and IBus works with it without problems. The author of IBus is very helpful to other input engine developers, so many input method engines are available for the IBus framework.

    But IBus has obvious limitations that come from its design:

  • Firstly, it uses only the Gtk immodule, which benefits GTK+ applications but works poorly with Qt.
  • Secondly, it depends on GConf, which is unacceptable to some users and distro makers (most of them strongly anti-GNOME).
  • Thirdly, the most used input engine, ibus-pinyin, is written in Python, which seriously limits its performance. This engine has also had some severe bugs such as memory leaks and endless loops (100% CPU). Even though the situation has largely improved, users are still complaining about them.
  • Fourthly, the alternative Pinyin engine, ibus-sunpinyin, is not well maintained and badly lacks testing. Some obvious bugs are left there with nobody interested in fixing them.
  • Note: ibus-pinyin has been rewritten mainly in C++ with many improvements, and these changes will land in the major distros in the very near future (some of them may already have published it; I didn’t research this in detail).

    3. Fcitx (Free Chinese Input Toy for X)

    Fcitx is both an old and a new input method. It was born at the same time as SCIM, and has now got a new life with the brand new 4.x series. As the name suggests, it was first designed by Yuking as a Chinese-specific input method. During the 3.x series, the aim of being a feature-rich Chinese input method won it quite a few fans, but also kept it from being the default choice of major distros. Fcitx uses XIM, which works well with most toolkits (such as GTK+ and Qt) but has some small problems.

    However, starting with the 4.x series, Fcitx has been given a new goal by a new maintainer, Weng Xuetian, a college student at Peking University. He has now published 4.0.1, the second release of the 4.x series, with long-wanted features such as customizable skins and tables. It has been heavily modularized: all tables are separated, and there is a developer-friendly input method engine interface and a graphical configuration tool. Also, the 4.x series no longer uses GBK-encoded Chinese configuration files; UTF-8-encoded English configuration files are used instead. I would like to highlight the perfect user experience of fcitx-sunpinyin: it is worth a try for every Pinyin user.

    There are still issues on its way to (probably) becoming the default in distros:

  • Though the author has promised Gtk immodule support in the 4.1.0 release, the feature is not available yet.
  • The internal Pinyin input method is old, and has not yet been separated from the framework core because the two used to be too tightly integrated. This work will be done in 4.1.0 as well.
  • Fcitx still lacks people interested in developing input method engines, even though its interface is friendlier to developers. For example, fcitx-sunpinyin (written in C++) needs only ~300 lines to make everything work perfectly with libsunpinyin.
  • Strictly speaking, Fcitx is not yet an input method framework, for the reasons listed above, but it will become one, as also said above.

    Those are the most famous input methods; here is a list of other projects in the Linux input method world, with short descriptions.

    1. ucimf (Unicode Console Input Method Framework)

    ucimf is an input method framework for the Linux Unicode framebuffer console, mainly used with fbterm and jfbterm. It is developed by a Chinese developer, Mat. He maintains a series of input method engines ported from OpenVanilla, a BSD-licensed Mac OS X input method.

    There are other solutions for the framebuffer console, for example ibus-fbterm (whose development has stopped), but I still recommend ucimf because it is full-featured and well maintained.

    2. SunPinyin

    One thing to clarify: SunPinyin isn’t a framework, but it is important, so I would like to mention it here. We have mentioned scim-sunpinyin, ibus-sunpinyin and the recommended fcitx-sunpinyin. In fact there is also a standalone xsunpinyin that is still alive. SunPinyin is a Chinese input method based on a statistical language model. It was first developed by the Sun Beijing Globalization team and was opened to the community with the OpenSolaris project, dual-licensed under LGPLv2 and CDDL.

    SunPinyin will be heavily used from now on; it is the best Pinyin engine on Linux and some other platforms.

    3. SCIM2
    SCIM2 tried to be a next-generation SCIM, but was abandoned because of the rising star of that period: IBus.

    4. ImBus

    ImBus was created by the author of SCIM, who wanted to make it a general input method framework that includes all the best known techniques with minimal dependencies. But the project halted for the same reason as SCIM2, and there is no code in its svn repository.

    5. Fitx (Fun Input Toy for Linux)

    Fitx was a flash in the pan on Linux; the project stopped soon after it emerged. Fitx is a port of the FIT input method from Mac OS X, implemented on top of the SCIM framework.

    6. gcin
    gcin is an input method developed by the Traditional Chinese community, targeted at Traditional Chinese users.

    7. uim

    uim is an input method framework made by Japanese developers. It is a little different because uim isn’t an input method server (XIM is a server); it’s just a library, because its designers believe many people don’t need a full-featured platform, only something good enough to work.

    For information about input method types, there is a good website: http://seba.studentenweb.org/thesis/im.php

    Update:
    2011-01-15 21:30
    Thanks to Zhengpeng Hou, I’ve added a description of uim, and a note about whether fcitx should be considered a framework yet.
    2011-01-28 00:15
    Thanks to Shawn P Huang, ibus-pinyin has been rewritten in mainly C++ with many improvements.

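    This post was originally published on the Ubuntu Chinese mailing list; everyone is welcome to join the discussion there.

    Over the last two days there has been some discussion about Pinyin input methods. Here is my short summary; comments are welcome.

    At present the default input platform in Ubuntu is ibus. ibus-pinyin ships on the CD, the default Wubi input method is ibus-table-wubi, and the default Traditional Chinese input method is ibus-chewing.

    I only use Pinyin and don’t know the state of Wubi and Chewing. Below is a brief summary of the common input methods I know of; let’s also see which one people think Ubuntu should use by default in the future.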

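    1. IBus

    The ibus platform is currently the standard on all the major distros. The framework itself is written in C++, is highly modular, and has many input methods to choose from. The author is still developing it and is quite welcoming to input method developers.

    ibus-pinyin was originally written in Python. Those versions were somewhat less efficient, and also had some bugs with 100% CPU usage and memory leaks. ibus-pinyin has since been rewritten mainly in C++ with a great many improvements, but for distro-related reasons it has been slow to reach the repositories; a nearly pure C++ ibus-pinyin will appear in 11.04.

    For Pinyin on ibus you can also choose ibus-sunpinyin, but it has fewer users and gets less feedback.

    ibus uses the gtk immodule, which makes it perform very well in GTK programs and lets you type Chinese in Flash, but its behaviour in Qt programs is mediocre. You could say it is mainly an input platform for GTK.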

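    2. Fcitx

    fcitx is a veteran Chinese input method for Linux with some fans, but many people’s impression of it is still based on 3.x with its GBK-encoded Chinese configuration files. The new fcitx 4 uses English UTF-8 configuration files, supports custom skins and tables, has an improved input method interface, and adds a graphical configuration tool. It has no more bugs than ibus, either.

    The best Pinyin option for fcitx is fcitx-sunpinyin. Its candidate accuracy is the same as that of every other input method built on the sunpinyin core. Compared with ibus-sunpinyin and scim-sunpinyin, its advantage is that it can use fcitx’s own features (such as skins), and it is about as smooth as fcitx’s built-in Pinyin input method.

    fcitx has the following problems: 1. the built-in pinyin input method has not yet been split out into a module, and its algorithm is outdated; 2. although writing an input method for fcitx is already simpler than for ibus, it still lacks attention, and fewer input methods are available than for ibus.

    fcitx uses XIM, so it is closer to being an input platform for X. But Flash does not support XIM, and cursor following has small glitches in some cases. Version 4.1 will have gtk immodule support, which will solve these problems together.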

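    3. Scim

    The scim platform used to be the standard on all distros, and typing with scim-pinyin has always been smoother than with ibus-pinyin; I believe some people are still holding out on scim. However, nobody maintains scim or skim, its KDE counterpart, and in Debian/Ubuntu the packagers only occasionally fix a few simple bugs. We cannot go backwards and use it again.

    The Pinyin input methods for scim are scim-pinyin, scim-python, scim-googlepinyin and scim-sunpinyin; the first three are currently unmaintained. Promoting scim-pinyin, the Intelligent Pinyin input method, was the reason the author developed SCIM; scim-python is the predecessor of ibus-pinyin; scim-googlepinyin is written with the algorithm of the input method on Android.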

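    4. Yong

    yong is a small input method that has been getting a fair amount of exposure recently; the author says he developed it to promote his Yong code (永码). I have not used it; judging only from its configuration files, I guess it uses XIM like fcitx, so it is also affected by the various XIM problems. yong is closed-source software, so whether for licensing reasons or for platform portability reasons, it cannot become the default input method of a mainstream distro. Of course, giving users one more choice is always a good thing.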

    Update:
    2011-01-28 00:15
    Thanks to Shawn P Huang, Shellexy and BYBird; I have updated the ibus-pinyin part to say that the new version has been rewritten in C++ and improved in many places.

    Recently I read Alexandre Rosenfeld’s blog post about the poor performance of btrfs when a VM’s disk is placed on it, so I gave it a try and got almost the same result.

    I tried to get someone to look into the problem, and now have the answer, quoting Bastian Blank:

    This is a result of the filesystem design, no bug. For decent performance don’t use O_SYNC on files. For qemu use cache=writeback in the disk definition.
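
    For illustration, this is roughly how that advice looks on a qemu command line. cache=writeback is the setting quoted above; the image name disk.img, the memory size, the binary name (it may be kvm or qemu-kvm on your system) and the helper drive_arg are placeholders of mine, so treat this as a sketch rather than the exact command I used.

```shell
#!/bin/sh
# Build the -drive argument for a disk image that lives on btrfs.
# The image path is whatever you pass in; cache=writeback is the quoted advice.
drive_arg() {
	printf 'file=%s,cache=writeback' "$1"
}

# Print the command instead of running it, so it can be reviewed first.
echo qemu-system-x86_64 -m 1024 -drive "$(drive_arg disk.img)"
```

    When the VM is defined through a management tool instead, the same cache mode goes into that tool's disk definition.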

    I have to say busybox is so cool!

    Today I installed Debian Squeeze from a daily netinst ISO image in an i686 qemu-kvm virtual machine. I was using ext4 for /boot, and btrfs for /. The installation went normally, as expected.

    After the installation I rebooted the virtual machine. Grub2 loaded correctly, but then I was dropped into a busybox ash prompt, which reported:

    FATAL: Error inserting btrfs (/lib/modules/2.6.32-5-686/kernel/fs/btrfs/btrfs.ko) unknown symbol in module, or unknown parameter (see dmesg)

    So I went to #debian and #debian-devel to ask for help. lindi- asked me for the dmesg output and told me to “just configure networking in initramfs and use busybox netcat”.

    The actual procedure is:
    1. Of course, save the dmesg output to a file first:
    (busybox) dmesg > dmesg.txt

    2. Run ipconfig in busybox to configure the network (I was running qemu-kvm as superuser, so there was no need to deal with user-mode networking problems):
    (busybox) ipconfig eth0

    3. Run netcat on the host machine to listen on a port, e.g. 3333:
    $ nc -l 3333

    4. Send the file:
    (busybox) cat dmesg.txt | nc 192.168.100.1 3333

    And I finally got dmesg.txt onto my host machine. Well, I still haven’t got the virtual machine working. :-(

    Update 2010-1-4 :
    This bug has been reported as Debian Bug #608538.
    Quoting Joey Hess:

    I hope this can be dealt with, it seems to be the only remaining issue in getting Debian to support btrfs root filesystems.

    It appears that the btrfs module needs the crc32 module, but crc32 isn’t loaded automatically.

    Here are the speakers’ presentations used at the Beijing Ubuntu 10.10 Release Party on 16th Oct, 2010, licensed under CC by-nc-sa 3.0.

    1. Live USB introduce – 白清杰 [linuxbqj AT gmail DOT com] (Chinese)

    2. You in Community – Aron Xu [happyaron AT ubuntu DOT com] (English)

    3. How Applications Speak Fluently – Eleanor Chen [chenyueg AT ubuntu DOT com] (English)

    Update: I’ve updated more details about the party.

    Hello everybody,

    We are happy to announce that we’ll hold Beijing Ubuntu 10.10 Release Party on 16th Oct, 2010.

    Time: 2010-10-16 14:00-17:50
    Location: Building 3, BUAA, No. 37 Xueyuan Road, Haidian District, Beijing, P.R.China (Google Maps)

    Activities:
    14:00-15:00 Presentations
    14:00-14:20 Qingjie Bai (From Kanas FOSS Store) Howto: LiveUSB Making and Usage
    14:20-14:40 Aron Xu (Ubuntu Simplified Chinese Translation Team Leader, Ubuntu Member, Organizer): You in Community – The Foss Essentials
    14:40-15:00 Eleanor Chen (Ubuntu China LoCo Contact, Ubuntu Member, Organizer): How Applications Speak Fluently – Introduction to FOSS I18n and L10N

    15:00-17:50 Installfest and discussions (so take your laptop with you!)

    There is no need for registration or invitation, feel free to join us and have fun!

    Chromium trunk builds don’t update themselves to the latest version the way Minefield (Firefox trunk) does. Some Ubuntu users have chosen the chromium-daily PPA for this, but I think updating a bleeding-edge daily package makes a lot of noise, so I wrote this script.

    I am a big fan of Firefox, but I use Chromium in some cases that Firefox cannot handle very well. The script keeps a directory in good shape so that you can run the browser properly; even if Chromium is running while the script updates it, the update won’t crash it. But if you leave the browser open for so long that the script has run several times and the browser crashes, just restart it and everything will be fine.

    Put it in ~/usr/chromium/ and set up cron to run it every day, or whenever you like; it will update your Chromium from http://build.chromium.org into your user’s directory. Be aware that this program runs under an ordinary user account, and by default it gets the AMD64 version.

    If you wish to change to i386, find and replace “chromium-rel-linux-64” with “chromium-rel-linux”.

    Note: if you find any “&lt;” or “&gt;” in the script, replace them with “<” and “>”; I am not able to make them show up correctly on this page. :-(

    #!/bin/sh
    # Copyright (C) 2010 Aron Xu <happyaron.xu@gmail.com>
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program.  If not, see <http://www.gnu.org/licenses/>.
     
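    # Ask the build server for the newest snapshot number; awk drops blank lines.
    LATEST=$(wget -q -O - 'http://build.chromium.org/buildbot/snapshots/chromium-rel-linux-64/LATEST' | awk 'NF > 0')
     
    # Stop if the download failed, so the test below never sees an empty value.
    [ -n "$LATEST" ] || exit 1
     
    cd "$HOME/usr/chromium/" || exit 1
     
    # Nothing to do when the installed build is already the newest one.
    if [ -f ./CURRENT ]; then
    	CURRENT=$(cat CURRENT)
    	if [ "$LATEST" -le "$CURRENT" ]; then
    		exit
    	fi
    fi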
     
    if [ -f ./OLD ]; then
    	OLD=`cat OLD`;
    fi
     
    if [ -d ./update ]; then
    	rm -rf ./update/* && cd ./update/;
    else
    	rm -rf ./update && mkdir ./update && cd ./update/;
    fi
     
    wget -q http://build.chromium.org/buildbot/snapshots/chromium-rel-linux-64/$LATEST/chrome-linux.zip
    unzip chrome-linux.zip
    rm -f chrome-linux.zip
     
    mkdir ../$LATEST
    mv ./chrome-linux/* ../$LATEST/
    rm -rf ./*
    cd ../
     
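    # Point the chromium-browser symlink at the new build.
    rm -f chromium-browser
    ln -s "$LATEST" chromium-browser
     
    # Drop the build before last, then record the previous and current builds.
    [ -n "$OLD" ] && rm -rf "./$OLD"
    [ -f CURRENT ] && mv CURRENT OLD
    echo "$LATEST" > CURRENT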
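
    The version check near the top of the script is the part most worth getting right, because an empty or garbled LATEST (for example after a failed download) makes the numeric test misbehave. Here is a minimal sketch of the same comparison as a function with those guards added. The name needs_update is mine and is not part of the script above.

```shell
#!/bin/sh
# needs_update LATEST CURRENT
# Succeeds (exit status 0) when LATEST is a number greater than CURRENT.
# An empty CURRENT means nothing is installed yet, so any build counts as new.
needs_update() {
	case "$1" in
		''|*[!0-9]*) return 1 ;;   # empty or non-numeric: report "no update"
	esac
	[ -z "$2" ] && return 0
	[ "$1" -gt "$2" ]
}

if needs_update 59353 59000; then
	echo "update needed"
fi
```

    In the script, the two arguments would be the downloaded LATEST value and the contents of the CURRENT file.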

    It is very easy to change this shell script to grab a specific build of Chromium if you want a “bug free” version (I mean one the build bot doesn’t complain about).

    #!/bin/sh
    # Copyright (C) 2010 Aron Xu <happyaron.xu@gmail.com>
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program.  If not, see <http://www.gnu.org/licenses/>.
     
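    # The build number to grab is the first argument.
    LATEST=$1
    [ -n "$LATEST" ] || { echo "usage: $0 BUILD" >&2; exit 1; }
     
    cd "$HOME/usr/chromium/" || exit 1
     
    if [ -f ./CURRENT ]; then
    	CURRENT=$(cat CURRENT)
    	if [ "$LATEST" -le "$CURRENT" ]; then
    		exit
    	fi
    fi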
     
    if [ -f ./OLD ]; then
    	OLD=`cat OLD`;
    fi
     
    if [ -d ./update ]; then
    	rm -rf ./update/* && cd ./update/;
    else
    	rm -rf ./update && mkdir ./update && cd ./update/;
    fi
     
    wget -q http://build.chromium.org/buildbot/snapshots/chromium-rel-linux-64/$LATEST/chrome-linux.zip
    unzip chrome-linux.zip
    rm -f chrome-linux.zip
     
    mkdir ../$LATEST
    mv ./chrome-linux/* ../$LATEST/
    rm -rf ./*
    cd ../
     
    rm -f chromium-browser
    ln -s "$LATEST" chromium-browser
     
    [ -n "$OLD" ] && rm -rf "./$OLD"
    [ -f CURRENT ] && mv CURRENT OLD
    echo "$LATEST" > CURRENT
    echo "$LATEST" > GRAB

    Again, put it in ~/usr/chromium/ and run:
    sh grab-chromium.sh 59353
    (59353 is a build of Chromium for which the build bots show no bugs; you can change it to any version you’d like to grab).

    If you like to run chromium with a long list of parameters, just create another script. Mine is here:

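    #!/bin/sh
    # Pass any URLs given on the command line through to Chromium.
    "$HOME/usr/chromium/chromium-browser/chrome" --disk-cache-dir="/dev/shm/browser.$(whoami).cache/chromium" --disk-cache-size=52428800 "$@"

    You can add it to your desktop’s menu; a field code such as %U belongs in the menu entry’s command line, where the desktop expands it, rather than in the script.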

    It’s fairly easy:
    $ man -t bash | ps2pdf - bash.pdf

    “man -t” uses groff -mandoc to format the manual page as PostScript on stdout.

    “ps2pdf - bash.pdf” means the input comes from stdin and the output goes to bash.pdf.

    We use a simple pipe to join the stdout of “man” to the stdin of “ps2pdf”.

    That’s it!
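
    If you convert manual pages often, the same pipe can be wrapped in a small function. This is only a sketch: the name man2pdf is mine, and it assumes man, groff and ps2pdf are installed.

```shell
#!/bin/sh
# man2pdf PAGE...
# Writes PAGE.pdf into the current directory for every manual page named.
# Stops at the first page that cannot be converted.
man2pdf() {
	for page in "$@"; do
		man -t "$page" | ps2pdf - "$page.pdf" || return 1
	done
}
```

    For example, “man2pdf bash ls” would leave bash.pdf and ls.pdf in the current directory.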

    Unzip 5.x has an option -O to specify the encoding of file names in a ZIP archive, but when 6.0 arrived with Unicode support, that option disappeared. CJK users need to take special care over support for, and conversion of, obsolete encodings while they are switching to UTF-8.

    Here is my workaround for this problem. Install the p7zip and convmv packages on your system first, then:
    $ env LC_ALL=C 7z x file.zip
    $ convmv -f gbk -t utf8 --notest *

    File names extracted by unzip cannot be converted to the correct ones whatever you do with them, but those extracted by 7z can be converted by convmv.

    Going further, we can automate this in a script:
    #! /bin/sh
    # Extract with 7z in the C locale, collect the extracted names, reverse the
    # list (so files come before their directories), then convert the names.
    LANG=C /usr/bin/7z x -y "$1" | sed -n 's/^Extracting //p' | sed '1!G;h;$!d' | xargs convmv -f gbk -t utf8 --notest >/dev/null 2>/dev/null

    Save it as unzip.sh, then try:
    $ sh unzip.sh file.zip
    This acts like unzip, but additionally converts the file name encoding from GBK to UTF-8. Moreover, convmv can detect whether a file name is already UTF-8 encoded and will skip it.

    If your file names are in another encoding, replace “gbk” with the appropriate name.
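
    To see what the conversion does to a single name, here is a small sketch that uses iconv instead of convmv. That is a substitution of mine for illustration only: iconv converts a stream of bytes and does not rename anything, and the function name gbk_to_utf8 is hypothetical. The four bytes fed in are the GBK encoding of “中文”.

```shell
#!/bin/sh
# gbk_to_utf8: read GBK-encoded text on stdin, write UTF-8 on stdout.
gbk_to_utf8() {
	iconv -f GBK -t UTF-8
}

# The octal escapes are the bytes D6 D0 CE C4, which spell "中文" in GBK.
printf '\326\320\316\304\n' | gbk_to_utf8
```

    On a UTF-8 terminal the raw bytes would show up as garbage, while the converted output is readable; that is the same repair convmv applies to the names on disk.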

    This is just another article about how to get your emails managed in a graceful way.

    Here is how I used to deal with them: I chose Gmail as my primary email service provider, and used the Gmail web interface for managing most of my emails, that is, viewing them and moving them to the trash. For emails that needed my reply or participation, I used Thunderbird 3 + Enigmail with Gmail’s IMAP support, so I could GPG-sign my outbound emails and verify the signatures on inbound ones. I kept emails on the Gmail server and had a backup copy on my PC from Thunderbird’s IMAP sync.

    Yes, everything looked good in the days when I had only a modest amount of email, and I was happy that all of the above worked well for a long time. But a problem appeared as my mailbox grew quite fast over the last half year: Thunderbird got slower and slower, and the local Mbox file grew larger day by day. Personally I don’t think the Mbox format, with a large amount of email stored in a single file, is very reliable, because when something goes wrong, everything soon follows.

    My initial thought on switching was that I needed a better way of managing a large and growing amount of email. My three overall requirements are listed below:

  • 1. A not-so-rare solution. I am not familiar with how email systems run, so a popular solution helps me find the essential documentation when I run into trouble.
  • 2. Good stability. Stability is always a key topic when people look for a solution to deploy. Even though I am just a desktop user looking for a personal way of dealing with things, I’d like a stable platform to make my life easier.
  • 3. A flexible (and maybe very customizable) way. Flexibility is important whenever you use a *NIX platform; customizability is another great thing once you are willing to spend hours making everything fit your particular taste.

    Next, searching the Internet, we surely find that Mutt should be on our list of choices, and that there are several different ways to use it. Before comparing them, I need to say something about the email system first.

    There are three key elements in an email system: MUA (Mail User Agent), MTA (Mail Transfer Agent) and MDA (Mail Delivery Agent):

  • 1. An MUA is the client software that you face directly every day: you read, write and manage your messages with it.
  • 2. MTA is a broad concept that covers most of the sending and receiving of emails among servers, including but not limited to the services we may know: SMTP, IMAP, POP3, etc.
  • 3. An MDA is a program that delivers users’ mail to their specific mail spool on a particular system; its work is usually to process the emails received by the MTA, figure out whether an email is suspected spam, and drop the email in the place the user has specified for receiving her/his mail.

    When we receive and send emails on our local computer (not the web-based way, since that is actually handled by the remote server), we need all three parts. We get emails from the remote server’s inbox with an MTA; filter and deliver them to our mail spool with an MDA; read, write and manage them with an MUA; and send emails with a sending MTA. As for the local storage format, there is Mbox, which I don’t like, and Maildir, which I am interested in. More about storage formats follows in later paragraphs.

    There are many all-in-one alternatives such as Thunderbird, Evolution, KMail and so on; they combine the functions an end user needs to manage email and hide them behind one interface. I am not looking for another all-in-one application now, so I need to choose a program for each role on the route of receiving and sending emails. Since I decided from the beginning to use Mutt as my MUA, I don’t need to think about that part again.

    First I need a receiving MTA to fetch emails from the Gmail server. There are two popular choices: Fetchmail and Getmail4.

  • 1. Fetchmail is a full-featured, robust, well-documented remote-mail retrieval and forwarding utility written in the C programming language. It is famous for its large user base, and Eric S. Raymond maintained it from 1996 to 2003.
  • 2. Getmail4 is the 4th version of Getmail, which is designed to get rid of the shortcomings of fetchmail. The program is written in Python.

    Both of them support the IMAP and POP3 protocols, which are my options for receiving emails. I prefer IMAP because the protocol is designed to have the functions POP3 has plus many other features, including sync between server and clients. I don’t really need my mail synced, because the local copy is mostly a backup, and I can use the web interface or Thunderbird if I really need that. Setting up an IMAP-synced local mail solution is off topic for this article; you may want to try imapsync if you are sure you really need it. Another reason I chose IMAP over POP3 is that Gmail’s POP3 implementation violates the commonly expected behaviour and has many more limitations than IMAP (you can fetch around 200-500 mails per session with POP3, and will probably be locked out for 24 hours if you access it several times in a short period, and a newbie testing a configuration can easily exceed those limits).

    Now let’s try them out to find which one is better for me. The configuration files contain your email account and password, so make them readable only by yourself using:
    chmod 0600 /path/to/file
    Here are the configurations for both programs; the username and password have been changed to “user” and “passwd”:
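
    It is easy to forget this step for one of the files, so here is a minimal sketch of a check you could run afterwards. The function name is_private is mine; it only looks at the permission string printed by ls and says nothing about who owns the file.

```shell
#!/bin/sh
# is_private FILE
# Succeeds only when group and others have no permissions on FILE.
is_private() {
	# Characters 5-10 of "ls -l" output are the group and other permission bits.
	perms=$(ls -ld "$1" | cut -c5-10)
	[ "$perms" = "------" ]
}

# Demonstrate on a throwaway file; in real use pass e.g. "$HOME/.fetchmailrc".
tmp=$(mktemp)
chmod 0600 "$tmp"
is_private "$tmp" && echo "private"
rm -f "$tmp"
```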

  • 1. $HOME/.fetchmailrc:
    #set daemon 600
    #set syslog
    defaults
    poll "imap.gmail.com" proto IMAP
    user 'user' password 'passwd'
    mda "/usr/bin/procmail"
    keep ssl
    sslcertck
    sslcertpath /etc/ssl/certs
    fetchall
    folder "[Gmail]/All Mail"

    The commented lines would tell fetchmail to work as a daemon, fetching emails from the server every 600 seconds and logging to syslog. “defaults” introduces options that apply to every server entry; none are set here. The following two lines describe the server, protocol, username and password; the port and the local mapped user can be set as well, but I don’t need them. The next line, starting with “mda”, tells fetchmail to pass received emails to the specified MDA; we set it to “/usr/bin/procmail”, which is described in the following paragraphs. “keep” means fetchmail should keep the copy on the server after getting the mails; this option is not very useful for Gmail, because preferences in the web interface let you choose how the server deals with delete requests from clients. “ssl” means use SSL, “sslcertck” checks that the server’s certificate has a valid signature from a CA, and “sslcertpath” sets the path of the CA certificates, for which we use “/etc/ssl/certs”. “fetchall” tells fetchmail to fetch all emails on the server rather than only new ones. “folder” names the exact folder fetchmail should check; we set it to “[Gmail]/All Mail”.
  • 2. $HOME/.getmail/getmailrc:
    [retriever]
    type = SimpleIMAPSSLRetriever
    server = imap.gmail.com
    username = user
    password = passwd
    mailboxes = ("[Gmail]/All Mail",)

    [destination]
    type = MDA_external
    path = /usr/bin/procmail

    [options]
    delete = false
    message_log = ~/.mail/getmail.log
    message_log_syslog = false
    read_all = false
    verbose = 2
    delivered_to = false
    received = false

    Briefly, there are three sections in Getmail4’s configuration file: [retriever] (defines which protocol to use and the related options), [destination] (defines where to deliver or pass the emails) and [options] (other options for Getmail4).
    I use the “SimpleIMAPSSLRetriever” type of retriever; as its name suggests, it is used for ordinary SSL-enabled IMAP, which meets our needs. The other fields in this section are easy to understand; the last one defines which folders the program should check, and don’t forget the trailing comma (,) if you list only one mailbox there. For the destination, I was planning to use procmail (I finally changed my mind, read on), so I have an “MDA_external” destination with its path pointing to “/usr/bin/procmail”. In the last section, “delete” defines whether mails should be deleted from the remote server after being received; as said in the explanation of the Fetchmail configuration, it is not really necessary for Gmail users. “message_log” defines where the log of each retrieved message should go, and “message_log_syslog” switches whether message retrieval info is logged to syslog. “read_all” should not be set to true, because that tells Getmail4 to ignore what has already been fetched and get everything every time it runs. “verbose” controls log verbosity: if set to 2, it prints messages about each of its actions; if set to 1, it prints messages about retrieving and deleting messages (only); if set to 0, it prints only warnings and errors. “delivered_to” and “received” tell Getmail4 whether it should add those two header fields to retrieved emails; personally I prefer them switched off.

    Before we start our test, we need to set up our MDA, Procmail. It is time for me to tell you why I prefer the Maildir local storage format to the Mbox used by Thunderbird. In Maildir, every email is stored in its own file, and a real directory contains all the files of a “folder” on the remote server. The email files are plain text. Maildir has three major benefits:

  • 1. No need to lock the folder as with Mbox: Maildir stores emails in individual plain text files that can be accessed by multiple programs at the same time and can easily be maintained by scripts.
  • 2. The maintenance of the storage now depends on your file system rather than on an email client. Today’s file systems on *NIX systems, such as Ext3/4, Reiserfs and Btrfs, are stronger than ever and more reliable than a client program, because a file system is designed to maintain files, whereas maintaining email storage is only one function of a much smaller project. In the worst case, where data gets damaged while the mail program is processing things, we lose some of the messages in a Maildir, but the whole folder would be corrupted in Mbox.
  • 3. Mailboxes in Maildir format can be used over a network file system (like NFS), but Mbox cannot.

    And here are the disadvantages:

  • 1. Maildir is not supported by many clients, while Mbox is universally supported.
  • 2. Some file systems (like XFS) may not handle a large number of small files efficiently.
  • 3. Searching text is not as fast as with Mbox. To speed up searching, a helper program with a cache is needed.

    As I can search the text of my email in Gmail’s web interface, I don’t need to care too much about the search disadvantage. I am using Ext4, which is strong enough to handle thousands of small files in one directory. Finally, I am choosing Mutt, which supports Maildir very well. I’d like to take advantage of Maildir now.
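
    To make the “one file per message, no locking” idea concrete, here is a rough sketch of a Maildir delivery in shell. A Maildir has three subdirectories, tmp, new and cur; a message is written under tmp first and then renamed into new, so a reader never sees a half-written file. The function name maildir_deliver and the simple file naming are mine, and real delivery agents build more careful unique names than this.

```shell
#!/bin/sh
# maildir_deliver DIR
# Stores the message read from stdin as one file in the Maildir at DIR.
maildir_deliver() {
	dir=$1
	mkdir -p "$dir/tmp" "$dir/new" "$dir/cur" || return 1
	# Illustrative unique name: seconds since the epoch, process id, host name.
	name="$(date +%s).$$.$(uname -n)"
	# Write the whole message under tmp/, then rename it into new/.
	cat > "$dir/tmp/$name" && mv "$dir/tmp/$name" "$dir/new/$name"
}

# Demonstrate in a throwaway directory and count the delivered messages.
box=$(mktemp -d)
printf 'Subject: test\n\nhello\n' | maildir_deliver "$box"
ls "$box/new" | wc -l
rm -rf "$box"
```

    Because every message is its own file, a script can inspect or remove a single mail with ordinary tools such as ls, grep and rm.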

    Here is my configuration of Procmail, $HOME/.procmailrc:
    VERBOSE=off
    DEFAULT=$HOME/.mail/inbox/
    MAILDIR=$HOME/.mail/
    LOGFILE=$HOME/.mail/procmail.log

    Don’t forget the trailing slash in “$HOME/.mail/inbox/”: if you leave it out Procmail will use Mbox, and if you add it, Maildir.

    So we can kick off our test now. Keep in mind that none of Fetchmail, Getmail4 and Procmail needs root privileges; just run them under your own account:
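
    Since a single character decides the format, a tiny check can save some head-scratching. The sketch below only encodes the rule just described (trailing slash means Maildir, anything else is treated as Mbox); the function name mailbox_format is mine, and it does not cover any other mailbox types Procmail may support.

```shell
#!/bin/sh
# mailbox_format PATH
# Prints "maildir" when PATH ends in a slash, "mbox" otherwise.
mailbox_format() {
	case "$1" in
		*/) echo maildir ;;
		*)  echo mbox ;;
	esac
}

mailbox_format "$HOME/.mail/inbox/"
mailbox_format "$HOME/.mail/inbox"
```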
    $ fetchmail -v
    $ getmail
    I have to admit my prediction of result is totally wrong. I thought Fetchmail is used by many many people and is written in the efficient C, Getmail4 only has a smaller user base and is written in Python which may take more resource on many cases. But the result tells me, under the current configuration, neither of them work for me: Fetchmail fails to fetch around 1/10 of my attachments and only get 0.x KB for a 5MB+ email; Getmail4 stuck when fetching mails larger than 5MB.

    What the hell! But I am not stopping, because there is another way: Getmail4 has some MDA functions built in, so it can deliver messages directly into Maildir or Mbox format. This is a good moment to say that I like Gmail’s excellent spam filtering, which saves me from spending time setting up a spam filter with Procmail or Maildrop, so a simple delivery is enough. Now I change the [destination] section of $HOME/.getmail/getmailrc to:
    [destination]
    type = Maildir
    path = ~/.mail/inbox/

    and run again:
    $ getmail
    This time it works: all my emails are retrieved successfully after a long wait (just leave it running and move on to other stuff).
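
    One thing to check if the Maildir destination fails: as far as I know, getmail expects the directory to exist already, with its cur, new and tmp subdirectories, and does not create it for you. The sketch below prepares such a directory and counts what has been delivered. The helper names ensure_maildir and count_messages are mine, and the demonstration runs in a temporary directory rather than in the real ~/.mail/inbox/.

```python
import os
import tempfile


def ensure_maildir(path):
    """Create path with its cur/, new/ and tmp/ subdirectories if missing."""
    root = os.path.expanduser(path)
    for sub in ("cur", "new", "tmp"):
        os.makedirs(os.path.join(root, sub), mode=0o700, exist_ok=True)
    return root


def count_messages(path):
    """Count delivered messages: unread ones in new/, seen ones in cur/."""
    root = os.path.expanduser(path)
    return sum(len(os.listdir(os.path.join(root, sub))) for sub in ("new", "cur"))


# Demonstration in a temporary directory instead of the real mailbox.
demo_root = ensure_maildir(os.path.join(tempfile.mkdtemp(), "inbox"))
demo_layout = sorted(os.listdir(demo_root))
demo_count = count_messages(demo_root)
print(demo_layout, demo_count)
```

    Calling ensure_maildir("~/.mail/inbox/") once before the first getmail run, and count_messages on the same path afterwards, gives a quick sanity check that the retrieval really delivered something.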

    After choosing a suitable receiving MTA (mine has the MDA built in), I still need a sending MTA. There are several popular choices. The lightweight ones are msmtp and esmtp; the powerful ones are exim4, postfix and qmail. The last three run as root daemons and are designed as full replacements for the traditional sendmail. We usually don’t need such big things for daily use, but they are really worth considering if you would like to run a *real* MTA that can exchange emails with other servers.
    Both msmtp and esmtp are designed to work as agents that forward local email to a real MTA server that supports the SMTP protocol. msmtp is currently the more popular of the two, although its feature list is shorter than esmtp’s. After a closer look I found that esmtp is no longer maintained, so I chose msmtp. Here is my configuration, $HOME/.msmtprc:
    defaults
    tls on
    tls_starttls on
    tls_certcheck on
    tls_trust_file /etc/ssl/certs/ca-certificates.crt
    logfile ~/.mail/msmtp.log

    account default
    host smtp.gmail.com
    port 587
    from user@gmail.com
    auth on
    user user
    password passwd

    This file contains your username and password, so change it to mode 0600 as instructed earlier. “tls”, “tls_starttls” and “tls_certcheck” tell msmtp to use STARTTLS for encryption and to check the validity of the server’s certificate.
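
    For readers who want to see what those two precautions amount to, here is a rough sketch in Python. It is not how msmtp is implemented. The first two helpers check and set the 0600 mode; the third shows the same STARTTLS-plus-certificate-check sequence with the standard smtplib module. All function names are my own, and send_with_starttls is only defined here, not called, because it needs a real account.

```python
import os
import smtplib
import ssl
import stat
import tempfile


def is_private(path):
    """True if group and others have no permissions at all on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


def make_private(path):
    """Equivalent of `chmod 0600 path`."""
    os.chmod(path, 0o600)


def send_with_starttls(host, port, user, password, message):
    """Submit a message the way the msmtprc above asks for (sketch only)."""
    context = ssl.create_default_context()  # verifies certificate and hostname
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls(context=context)    # upgrade the plain connection
        server.login(user, password)
        server.send_message(message)


# Demonstration on a temporary file instead of the real ~/.msmtprc.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o644)          # readable by everyone: not acceptable
private_before = is_private(demo_path)
make_private(demo_path)
private_after = is_private(demo_path)
os.remove(demo_path)
print(private_before, private_after)
```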

    Finally, I come to the key part: Mutt. Here are some essential lines from my $HOME/.muttrc:
    ignore *
    unignore From Subject Lines
    hdr_order From Subject Lines

    set index_format="%[%b-%d] %?X?%X& ? %-2e %-18.18L [%4c] %s"
    set status_on_top=yes

    set editor="vim -c 'norm O'"

    set sendmail="/usr/bin/msmtp"
    set sendmail_wait = 5

    set mbox_type=Maildir
    set folder="~/.mail"
    set mask="!^\\.[^.]"
    set mbox="+inbox"
    set record="+inbox"
    set postponed="+inbox"
    set spoolfile="~/.mail/inbox/"
    set trash="~/.mail/trash/"
    set maildir_trash=no

    set quit=yes
    set move=no
    set beep_new=yes
    set check_new=yes
    set recall=no
    set resolve=yes
    set allow_8bit
    set charset="utf-8"
    set rfc2047_parameters=yes

    set include=yes
    set indent_str="> "
    set mime_forward
    set mime_forward_rest
    set fast_reply
    unset metoo
    unset reply_self
    set reply_regexp="^(re([\[0-9\]+])*|aw|回复)(:[ \t]|:)"
    set quote_regexp="^( {0,4}-?[>|:]| {0,4}[a-z0-9]+[>|]+)+"

    set from='Name Last '
    set use_from
    set envelope_from=yes
    set realname='First Last'

    bind index gg first-entry
    bind index G last-entry
    bind index \cf next-page
    bind index \cb previous-page
    bind index ,g group-reply
    bind pager j next-line
    bind pager k previous-line
    bind pager previous-line
    bind pager next-line
    bind pager gg top
    bind pager G bottom

    color hdrdefault black default
    color quoted red default
    color signature brightblack default
    color indicator brightwhite red
    color attachment black default
    color error red default
    color message blue default
    color search brightwhite magenta
    color status brightyellow blue
    color tree red default
    color normal blue default
    color tilde green default
    color bold brightyellow default
    color markers red default

    Thanks to Roy L Zuo (roylzuo at gmail dot com) for his great help! There are many more lines in my Mutt configuration, and the colors are suited to a white background.
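
    The “+inbox” values in the configuration rely on Mutt’s mailbox shortcuts: a leading “+” (or “=”) stands for the directory given in the folder setting. Here is a rough sketch of that expansion, with a helper name, expand_mutt_path, that I made up. It only imitates the two shortcuts used above and is not Mutt’s actual code.

```python
import os


def expand_mutt_path(value, folder):
    """Expand a leading '+' or '=' to the folder directory, then '~' to $HOME."""
    if value[:1] in ("+", "="):
        value = os.path.join(folder, value[1:])
    return os.path.expanduser(value)


folder = "~/.mail"
settings = (
    ("mbox", "+inbox"),
    ("record", "+inbox"),
    ("postponed", "+inbox"),
    ("trash", "~/.mail/trash/"),
)
for name, value in settings:
    print(name, "->", expand_mutt_path(value, folder))
```

    Running it shows that mbox, record and postponed all resolve to the same inbox directory in this configuration, so read, sent and postponed mails end up next to each other.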

    In conclusion, here is my choice:
    Mutt + Getmail4 + Msmtp