Unzip 5.x had an option, -O, to specify the encoding of file names in a ZIP archive, but when 6.0 arrived with Unicode support, that option disappeared. CJK users need to take special care with legacy encodings while they switch to UTF-8.

Here is my workaround for this problem. Install the p7zip and convmv packages on your system first, then run:
$ env LC_ALL=C 7z x file.zip
$ convmv -f gbk -t utf8 --notest *

File names extracted by unzip cannot be converted back to the correct ones whatever you do with them, but the names produced by 7z can be fixed by convmv.

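Going one step further, we can wrap these steps in a script:
#!/bin/sh
# Extract with 7z, reverse the list of extracted names, then let convmv rename them (names containing spaces are not handled by xargs).
LANG=C /usr/bin/7z x -y "$1" | sed -n 's/^Extracting //p' | sed '1!G;h;$!d' |
    xargs convmv -f gbk -t utf8 --notest >/dev/null 2>/dev/null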

Save it as unzip.sh, then try:
$ sh unzip.sh file.zip
This behaves like unzip, but also converts the file names from GBK to UTF-8. Moreover, convmv detects file names that are already UTF-8 encoded and skips them.

If your file names are in some other encoding, replace “gbk” with the appropriate name.
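
The least obvious part of the script is the second sed call, '1!G;h;$!d', which prints its input in reverse order (like tac). Presumably this matters because 7z lists a directory before the files inside it, and renaming the parent first would invalidate the paths below it. A minimal sketch; the function name reverse_lines and the sample paths are made up for illustration:

```shell
#!/bin/sh
# reverse_lines: print standard input with the line order reversed.
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the pattern space (current line plus the earlier lines, reversed) to the hold space
#   $!d  on every line but the last, delete instead of printing
reverse_lines() {
    sed '1!G;h;$!d'
}

# Hypothetical extraction order: parent directory first, contents after.
printf '%s\n' 'dir' 'dir/sub' 'dir/sub/file.txt' | reverse_lines
```

With the order reversed, the deepest path is handed to convmv first and the top-level directory last.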

This is just another article about managing your email gracefully.

Here is how I used to deal with it: I chose Gmail as my primary email provider and used the Gmail web interface for most day-to-day handling – reading mail and moving it to the trash. For emails that needed a reply or my participation, I used Thunderbird 3 + Enigmail with Gmail’s IMAP support, so I could GPG-sign my outbound emails and verify the signatures on inbound ones. I kept the emails on Gmail’s server and had a backup copy on my PC from Thunderbird’s IMAP sync.

Everything looked good while I had only a modest amount of email, and this setup served me well for a long time. But my mailbox has grown quickly over the last half year, and trouble came with it: Thunderbird gets slower and slower, and the local Mbox file grows larger day by day. Personally, I don’t think the Mbox format, with a large number of emails stored in a single file, is very reliable: when something goes wrong in that file, everything in it can go with it.

So I needed a better way to manage a large and growing amount of email. My three overall requirements are listed below:

  • 1. A not-so-rare solution. I am not familiar with how email systems work, so a popular solution will help me find the documentation I need when I run into trouble.
  • 2. Good stability. Stability is always a key concern when people look for a solution to deploy. Even though I am just a desktop user looking for a personal way of handling things, I’d like a stable platform to make my life easier.
  • 3. A flexible (and maybe very customizable) way. Flexibility matters when you use a *NIX platform all the time; customizability is another great thing once you are willing to spend hours making everything fit your particular taste.

    Next, a search of the Internet quickly shows that Mutt belongs on the list of choices, and that there are several different ways of using it. Before comparing them, I need to say something about the email system itself.

    There are three key elements in an email system: MUA (Mail User Agent), MTA (Mail Transfer Agent) and MDA (Mail Delivery Agent):

  • 1. An MUA is the client software that you face directly every day – you read, write and manage your messages with it.
  • 2. An MTA is a broad concept that covers most of the work of moving emails between servers, which is done over SMTP. Strictly speaking, fetching mail from a server over IMAP or POP3 is a separate job (such programs are often called MRAs, Mail Retrieval Agents), but in this article I lump it in with the MTA and call it the “receiving MTA”.
  • 3. An MDA is a program that delivers a user’s mail to their mail spool on a particular system; it usually processes the emails received by the MTA, decides whether each one is suspected spam, and drops it in the place the user has specified for receiving mail.

    When we receive and send emails on our local computer (not the web-based way, which is actually handled by a remote server), we need all three parts. We get emails from the remote server’s inbox with a receiving MTA; filter and deliver them to our mail spool with an MDA; read, write and manage them with an MUA; and send them with a sending MTA. As for the local storage format, there is Mbox, which I don’t like, and Maildir, which I am interested in. More about storage formats follows below.

    There are many all-in-one alternatives such as Thunderbird, Evolution and KMail; they combine all the functions an end user needs to manage email and hide the separate parts behind one interface. This time I am not looking for another all-in-one application, so I need to choose a program for each role along the route of receiving and sending emails. Since I decided on Mutt as my MUA from the beginning, that part is already settled.

    First I need a receiving MTA for fetching emails from Gmail server. There are two popular choices: Fetchmail and Getmail4.

  • 1. Fetchmail is a full-featured, robust, well-documented remote-mail retrieval and forwarding utility written in the C programming language. It is known for its large user base, and Eric S. Raymond maintained it from 1996 to 2003.
  • 2. Getmail4 is the 4th version of Getmail, which was designed to get rid of the shortcomings of Fetchmail. The program is written in Python.

    Both of them support IMAP and POP3, the two protocols I could use for receiving emails. I prefer IMAP because it offers what POP3 does plus many other features, including sync between server and clients. I don’t really need my mail synced, because the local copy is mostly a backup, and I can use the web interface or Thunderbird if I really need that. Setting up an IMAP-synced local mail store is off topic for this article; you may want to try imapsync if you are sure you need one. Another reason for choosing IMAP over POP3 is that Gmail’s POP3 implementation deviates from the commonly expected behavior and has many more limitations than IMAP: you can fetch only around 200-500 mails per session over POP3, and you may be locked out for 24 hours if you access it several times in a short period, which is easy to do when you are a newbie testing your configuration.

    Now let’s try them out to see which one is better for me. The configuration files contain your email account and password, so make them readable only by yourself:
    chmod 0600 /path/to/file
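
A file created with an editor is typically readable by others until chmod runs. One way to avoid that window, sketched below, is to create the file empty with a restrictive umask first and only then fill in the password. The function name make_private_file and the demo path are my own placeholders:

```shell
#!/bin/sh
# make_private_file PATH: create PATH (if missing) readable and writable by the owner only.
make_private_file() {
    (
        umask 077        # new files get mode 0600; the subshell keeps the umask change local
        : >> "$1"        # create the file without truncating an existing one
    )
    chmod 0600 "$1"      # also tighten a file that already existed with looser permissions
}

# Demo on a throwaway path rather than a real configuration file.
demo="${TMPDIR:-/tmp}/private-demo.$$"
make_private_file "$demo"
ls -l "$demo"
rm -f "$demo"
```

For the real files this would be called with, for example, "$HOME/.fetchmailrc" before the file is edited.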
    Here are the configurations for both programs; the username and password have been changed to “user” and “passwd”:

  • 1. $HOME/.fetchmailrc:
    #set daemon 600
    #set syslog
    defaults
    poll "imap.gmail.com" proto IMAP
    user 'user' password 'passwd'
    mda "/usr/bin/procmail"
    keep ssl
    sslcertck
    sslcertpath /etc/ssl/certs
    fetchall
    folder "[Gmail]/All Mail"

    The commented lines would tell fetchmail to run as a daemon, polling the server every 600 seconds, and to log to syslog. “defaults” introduces options that apply to every server entry after it; none are listed here, so it changes nothing in this file. The next two lines give the server, protocol, username and password; the port and the local user to map to can be set as well, but I don’t need them. The line starting with “mda” tells fetchmail to hand received emails to the given MDA; here it is “/usr/bin/procmail”, which is described further below. “keep” means fetchmail should leave a copy on the server after getting the mails; this option is not very useful for Gmail, because the web interface has a preference for how the server handles delete requests from clients. “ssl” turns on SSL, “sslcertck” checks that the server’s certificate is signed by a trusted CA, and “sslcertpath” sets the directory holding the CA certificates, here “/etc/ssl/certs”. “fetchall” tells fetchmail to fetch all emails on the server rather than only new ones. “folder” names the exact folder fetchmail should check, here “[Gmail]/All Mail”.
  • 2. $HOME/.getmail/getmailrc:
    [retriever]
    type = SimpleIMAPSSLRetriever
    server = imap.gmail.com
    username = user
    password = passwd
    mailboxes = ("[Gmail]/All Mail",)

    [destination]
    type = MDA_external
    path = /usr/bin/procmail

    [options]
    delete = false
    message_log = ~/.mail/getmail.log
    message_log_syslog = false
    read_all = false
    verbose = 2
    delivered_to = false
    received = false

    Briefly, there are three sections in Getmail4’s configuration file: [retriever] (which protocol to use and its related options), [destination] (where to deliver or pass the emails) and [options] (other options for Getmail4).
    I use the “SimpleIMAPSSLRetriever” retriever type; as its name suggests, it speaks ordinary SSL-enabled IMAP, which meets our needs. The other fields in this section are easy to understand. The last one defines which folders the program should check; don’t forget the trailing comma (,) if you list only one mailbox there. For the destination, I was planning to use procmail (I changed my mind in the end, read on), so I have an “MDA_external” destination with the path set to “/usr/bin/procmail”. In the last section, “delete” defines whether mails are deleted from the remote server after being retrieved; as explained for the Fetchmail configuration, it matters little for Gmail users. “message_log” sets the file where each retrieved message is logged, and “message_log_syslog” switches logging of the same information to syslog. “read_all” should not be set to true, because that makes Getmail4 ignore what it has already fetched and retrieve everything on every run. “verbose” controls verbosity: at 2, Getmail4 prints messages about each of its actions; at 1, only about retrieving and deleting messages; at 0, only warnings and errors. “delivered_to” and “received” tell Getmail4 whether to add those two header fields to retrieved emails; I prefer to leave them off.

    Before we start the test, we need to set up our MDA – Procmail. This is also the time to explain why I prefer the Maildir storage format to the Mbox used by Thunderbird. In Maildir, every email is stored in its own plain-text file, and each “folder” on the remote server corresponds to a real directory holding those files. Maildir has three major benefits:

  • 1. No need to lock the folder as with Mbox. Maildir stores each email in its own plain-text file, so the mailbox can be accessed by multiple programs at the same time and is easy to maintain with scripts.
  • 2. The integrity of the storage now depends on your file system rather than on an email client. Today’s *NIX file systems such as Ext3/4, Reiserfs and Btrfs are stronger than ever and more reliable than a client program: a file system is designed to look after files, whereas storage handling is only one of many functions of a mail client, which is a much smaller project than a file system. In the worst case, when data gets damaged while the mail program is working, we lose some of the messages in a Maildir, whereas in Mbox the whole folder would be corrupted.
  • 3. Mailboxes in Maildir format can be used safely over a network file system (like NFS), whereas Mbox relies on file locking that is unreliable there.

    And here are the disadvantages:

  • 1. Maildir is not supported by as many clients as Mbox, which is supported almost universally.
  • 2. Some file systems (XFS, for example) may not handle a large number of small files efficiently.
  • 3. Searching text is not as fast as with Mbox. To speed up searching, a helper program with a cache is needed.

    As I can search the text of my emails in Gmail’s web interface, I don’t need to care much about the search disadvantage. I am using Ext4, which is strong enough to handle thousands of small files in one directory. And I am choosing Mutt, which supports Maildir very well, so I’d like to take advantage of the format now.
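
Because each message is just a file, ordinary tools are enough for simple maintenance. As a small sketch, assuming the standard Maildir layout in which newly delivered mail sits in the “new” subdirectory (the function name count_new_mail is made up):

```shell
#!/bin/sh
# count_new_mail MAILDIR: print the number of messages waiting in MAILDIR/new.
count_new_mail() {
    # One regular file per message; tr strips the padding some wc versions add.
    find "$1/new" -type f | wc -l | tr -d ' '
}

# Demo on a throwaway Maildir with two fake messages.
demo="${TMPDIR:-/tmp}/maildir-demo.$$"
mkdir -p "$demo/cur" "$demo/new" "$demo/tmp"
: > "$demo/new/msg1"
: > "$demo/new/msg2"
count_new_mail "$demo"
rm -rf "$demo"
```

On a real mailbox this would be called as count_new_mail "$HOME/.mail/inbox".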

    Here is my configuration of Procmail, $HOME/.procmailrc:
    VERBOSE=off
    DEFAULT=$HOME/.mail/inbox/
    MAILDIR=$HOME/.mail/
    LOGFILE=$HOME/.mail/procmail.log

    Don’t forget the trailing slash in “$HOME/.mail/inbox/”: without it Procmail delivers in Mbox format, and with it, in Maildir format.

    Now we can kick off our test. Keep in mind that none of Fetchmail, Getmail4 and Procmail needs root privileges; just run them under your own account:
    $ fetchmail -v
    $ getmail
    I have to admit my prediction was totally wrong. I expected Fetchmail to do better, as it is used by a great many people and written in efficient C, while Getmail4 has a smaller user base and is written in Python, which may use more resources in many cases. But the results show that, with the current configuration, neither of them works for me: Fetchmail failed to fetch around 1/10 of my attachments, getting only 0.x KB of a 5MB+ email, and Getmail4 got stuck when fetching mails larger than 5MB.

    What the hell! But I am not stopping here, because there is another way: Getmail4 has some MDA functions built in, so it can deliver messages directly to a Maildir or Mbox for the user. This is a good moment to say that I like Gmail’s excellent spam filtering, which saves me the time of setting up a spam filter with Procmail or Maildrop, so a simple delivery is enough. I changed the [destination] section of $HOME/.getmail/getmailrc to:
    [destination]
    type = Maildir
    path = ~/.mail/inbox/

    and ran it again:
    $ getmail
    This time it worked, and all my emails were retrieved successfully after a long wait (just leave it running and move on to other things).
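
One practical detail for this setup: as far as I know, Getmail4 does not create the Maildir itself and expects the “cur”, “new” and “tmp” subdirectories to exist already. A minimal sketch for creating them (the function name make_maildir is hypothetical):

```shell
#!/bin/sh
# make_maildir DIR: create DIR with the three subdirectories a Maildir needs.
make_maildir() {
    for sub in cur new tmp; do
        mkdir -p "$1/$sub" || return 1
    done
    chmod 0700 "$1"    # mail is private, keep other users out
}
```

For the path used in the [destination] section, the call would be make_maildir "$HOME/.mail/inbox".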

    After choosing a suitable receiving MTA (mine has an MDA built in), I still need a sending MTA. There are several popular choices: msmtp and esmtp on the light-weight side, and exim4, postfix and qmail on the powerful side. The last three run as root daemons and are designed as full replacements for the traditional sendmail. Usually we don’t need such big things for daily use, but they are well worth considering if you would like to run a *real* MTA that exchanges emails with other servers.
    Both msmtp and esmtp are designed to work as agents that forward local email to a real MTA server over SMTP. Currently msmtp is the more popular one, although its feature list is shorter than esmtp’s. After a closer look I found that esmtp is no longer maintained, so I chose msmtp. Here is my configuration, $HOME/.msmtprc:
    defaults
    tls on
    tls_starttls on
    tls_certcheck on
    tls_trust_file /etc/ssl/certs/ca-certificates.crt
    logfile ~/.mail/msmtp.log

    account default
    host smtp.gmail.com
    port 587
    from user@gmail.com
    auth on
    user user
    password passwd

    Your username and password are in this file, so follow the earlier instruction and change its mode to 0600. “tls”, “tls_starttls” and “tls_certcheck” tell msmtp to use STARTTLS for encryption and to validate the server’s certificate.
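
Before wiring msmtp into Mutt, it may be worth checking it on its own. The sketch below only builds a minimal test message; the function name build_test_message and the addresses are placeholders, and the msmtp call is left in a comment so that nothing is sent by accident:

```shell
#!/bin/sh
# build_test_message FROM TO: print a minimal email (headers, blank line, body).
build_test_message() {
    printf 'From: %s\n' "$1"
    printf 'To: %s\n' "$2"
    printf 'Subject: msmtp test\n'
    printf '\n'
    printf 'If you can read this, msmtp is working.\n'
}

# Show the message; to really send it, pipe it into msmtp instead:
#   build_test_message user@gmail.com someone@example.com | msmtp -t
# (-t makes msmtp take the recipients from the message headers)
build_test_message user@gmail.com someone@example.com
```

If the message does not arrive, the log file named in the configuration (~/.mail/msmtp.log) is the first place to look.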

    Finally, I come to the key part – Mutt. Here are some essential lines from my $HOME/.muttrc:
    ignore *
    unignore From Subject Lines
    hdr_order From Subject Lines

    set index_format="%[%b-%d] %?X?%X& ? %-2e %-18.18L [%4c] %s"
    set status_on_top=yes

    set editor="vim -c 'norm O'"

    set sendmail="/usr/bin/msmtp"
    set sendmail_wait = 5

    set mbox_type=Maildir
    set folder="~/.mail"
    set mask="!^\\.[^.]"
    set mbox="+inbox"
    set record="+inbox"
    set postponed="+inbox"
    set spoolfile="~/.mail/inbox/"
    set trash="~/.mail/trash/"
    set maildir_trash=no

    set quit=yes
    set move=no
    set beep_new=yes
    set check_new=yes
    set recall=no
    set resolve=yes
    set allow_8bit
    set charset="utf-8"
    set rfc2047_parameters=yes

    set include=yes
    set indent_str="> "
    set mime_forward
    set mime_forward_rest
    set fast_reply
    unset metoo
    unset reply_self
    set reply_regexp="^(re([\[0-9\]+])*|aw|回复)(:[ \t]|:)"
    set quote_regexp="^( {0,4}-?[>|:]| {0,4}[a-z0-9]+[>|]+)+"

    set from='Name Last '
    set use_from
    set envelope_from=yes
    set realname='First Last'

    bind index gg first-entry
    bind index G last-entry
    bind index \cf next-page
    bind index \cb previous-page
    bind index ,g group-reply
    bind pager j next-line
    bind pager k previous-line
    bind pager previous-line
    bind pager next-line
    bind pager gg top
    bind pager G bottom

    color hdrdefault black default
    color quoted red default
    color signature brightblack default
    color indicator brightwhite red
    color attachment black default
    color error red default
    color message blue default
    color search brightwhite magenta
    color status brightyellow blue
    color tree red default
    color normal blue default
    color tilde green default
    color bold brightyellow default
    color markers red default

    Thanks to Roy L Zuo (roylzuo at gmail dot com) for his great help! There are many more lines in my Mutt configuration, and the colors are chosen for a white background.

    In conclusion, here is my choice:
    Mutt + Getmail4 + Msmtp

    Chinese Linux users often come across mp3 files with GBK/BIG5-encoded ID3 tags, and changing them to UTF-8 one by one is very time-consuming.

    Before using the following solution, please make sure that you won’t use media players on Windows that cannot handle UTF-8-encoded ID3 tags correctly, or MP3 players that don’t support them.

    First, install the python-mutagen package. For Debian/Ubuntu, use:
    $ sudo aptitude install python-mutagen
    For Fedora and others, probably:
    # yum install python-mutagen

    Second, go to the directory containing the files that need to be converted.
    For files with GBK tags:
    find . -iname "*.mp3" -execdir mid3iconv -e GBK {} \;
    For files with BIG5 tags:
    find . -iname "*.mp3" -execdir mid3iconv -e BIG5 {} \;
    A nice thing is that the program checks whether the encoding we selected is suitable, so when we convert GBK-encoded files, the BIG5 ones won’t be changed. But please don’t use the GB18030 option, because it causes problems when the files aren’t really GB18030-encoded.
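
Since the conversion rewrites the tags in place, a preview can be reassuring. As far as I know, mid3iconv has a dry-run switch (-p) that prints the converted tags without writing them; please check mid3iconv --help on your version. A sketch with a made-up function name, preview_tags:

```shell
#!/bin/sh
# preview_tags ENCODING [DIR]: show what mid3iconv would do for every mp3 below DIR
# (default: current directory) without modifying any file.
preview_tags() {
    find "${2:-.}" -iname '*.mp3' | while IFS= read -r file; do
        mid3iconv -p -e "$1" "$file"    # -p: dry run, print only
    done
}
```

Running preview_tags GBK in the music directory shows the result first; if it looks right, the find command above does the real conversion.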

    If you need to edit tag text yourself, try easytag. :P

    Thanks to the great work done by the team, we can now release this desktop course to help spread Ubuntu in Chinese.

    Here is the project homepage:
    http://people.ubuntu.com/~happyaron/udc-cn/

    We have released both HTML and PDF versions. The DocBook source is also available.

    Yesterday we held a meeting about the resignation and nomination of the Ubuntu China LoCo Team contact, and talked a lot about our LoCo team’s future activities.

    Meeting minutes:
    1. What is an Ubuntu LoCo contact? [happyaron]
    2. Voted for a new LoCo contact; the new contact is Eleanor Chen. (10 for, 0 against, 4 not voting)
    3. Discussed the FullCircle China team’s work.
    4. Decided to participate in the Ubuntu Global Jam, and proposed opening a classroom for people who are interested in becoming a MOTU.
    5. All participants agreed to organize more community activities, but reached no conclusion on the best time to start an Ubuntu user group in Beijing.

    Here is the meeting log (Chinese):
    ubuntu-cn-meeting.log

    Convert flac to mp3

    2010/07/06

    Sometimes we have to use the mp3 format because our portable devices commonly don’t support flac/ogg, so I need to convert a .flac file to mp3. It’s very simple:
    First of all, install the flac and lame packages from your distribution’s repository.
    Then use the following commands to actually convert the file.
    1. flac -d filename.flac
    This will output a filename.wav in the same directory.
    2. lame filename.wav
    Now a fresh filename.mp3 is lying in the same directory.
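
For a whole directory, the two steps can be combined in a loop. This is only a sketch: the function names are made up, and it relies on flac -d -c writing the decoded audio to standard output and on lame accepting “-” as its input file, so no intermediate .wav is kept:

```shell
#!/bin/sh
# mp3_name FILE.flac: print the matching .mp3 file name.
mp3_name() {
    printf '%s\n' "${1%.flac}.mp3"
}

# flac_dir_to_mp3 [DIR]: convert every .flac file in DIR (default: current directory).
flac_dir_to_mp3() {
    for f in "${1:-.}"/*.flac; do
        [ -e "$f" ] || continue                  # no .flac files: the pattern stays unexpanded
        flac -d -c "$f" | lame - "$(mp3_name "$f")"
    done
}
```

Calling flac_dir_to_mp3 with no argument converts the files in the current directory and leaves the .flac originals untouched.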

    Enjoy!

    Ubuntu China LoCo Community is going to host a party at 2:00 PM on May 8th at the Traktirr Russian Restaurant, Beijing, to celebrate the release of Ubuntu 10.04 LTS. Everybody, from newbie to hacker, is warmly welcome. We will invite some people to give talks about Ubuntu, but what truly matters is that everybody can take part in the party! We sincerely hope all of you have fun. Remember that a cake, some free Ubuntu 10.04 LTS CDs and other souvenirs are waiting for you!

    After a period of work, the Ubuntu Desktop Course has been translated into Simplified Chinese. The content has already been adapted to 9.10, and most of it is ready for 10.04. We are happy to announce this to the public and hope the course helps more Chinese speakers enjoy and share Free Software.

    HTML view:
    http://people.ubuntu.com/~happyaron/udc-cn/
    PDF generation still has some problems.

    This work is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0.

    It is a proven fact that Ubuntu, the Linux for Human Beings, is a great GNU/Linux distribution, one that enables more and more people all over the world to enjoy free software and to share their knowledge and joy.

    As an Ubuntu user, I must say all the work done by the community and Canonical is awesome; but as a contributor from a non-English-speaking country, I would be extremely happy to see us launch localized Live CDs, in other words language-specific editions for users with different languages and preferences.

    Different languages always come with different cultures, and these lead to different user preferences. Many people in this world don’t know enough English to use a computer that is not localized. Such a user, both in the past and at present, has to download and install many things to complete language support after installing Ubuntu from our Live CD. Most of these users also share some common software needs, so installing this “language preferred” software is another required task before the system is usable. Is that annoying? Yes, and users would be much happier to find an operating system designed with them in mind.

    We have spent a lot of manpower on improving the installation process, including language support, and a GNU/Linux distribution always ships not only a system but also a set of selected applications; still, I think things are not yet perfect for us. Microsoft and Apple ship their operating systems in different language editions, and as a non-native English speaker I’d say it is worth it. Users prefer a fully localized environment in every corner they can see, from the very beginning. But for Ubuntu we can only add translations for the software used during installation. The live session is an exciting feature, but I always hear somebody ask “why is it all in English?” or “is there a fully translated Ubuntu available?” I’ve explained our current situation time after time, and these people always come back with “Ubuntu is great, but with a fully translated edition things would be even better.” The way to solve this problem is to have language-specific editions.

    So teams and individuals have appeared who make their own distributions based on Ubuntu, which we regard as Ubuntu derivatives. On the positive side, these derivatives help spread our distribution, but there is a real negative side too. It is not just a matter of user choice, like choosing between Fedora and Ubuntu, but something that affects how we build our community. These derivatives usually ship not only language packs but also small tweaks for specific user groups (unlike Mint, which makes bigger changes). For many reasons, they tend to have breakages and bugs that never existed in the official Live CD. Users who are about to turn to Ubuntu, but can hardly accept starting from a global-edition Live CD with minimal support for their language, have to choose a provider they can trust. But who can vouch for the quality of these derivatives? Perhaps nobody. Derivatives provided by non-profit organizations are in a better state than those from profit-driven teams. I know of some editions with changes that open security holes or ship ads (e.g. a hard-coded Firefox home page pointing to a site full of ads), and of course some of them refuse to open their changes. Yes, users are able to remove those unwanted changes, but why would anyone try a derivative if they wanted to deal with such issues? Up to this point we might still say it doesn’t matter much. However, most of these derivatives’ authors don’t provide support even though their changes cause problems, and some of them even push the support work onto the local community deliberately. Apart from general questions, these users keep asking about problems caused by the derivative’s changes. Answering them is an annoying and overwhelming job; even just telling them “use the official one” is an awkward thing that few people like to do. This damages our community’s reputation: those users may think Ubuntu and our community are unfriendly, because most of them don’t know the real situation.

    Making official localized Live CDs can also lead to a new stage of Live CD usage. A Live CD can be used as a demo, a rescue system, or even a temporary working environment; the live session is a feature that many users like very much. As mentioned before, a non-English-speaking user finds only very limited support in the current Live CD. We have to admit it can hardly be used for anything other than running an installation. Even when it is used as a demo, people always ask about the almost entirely English environment. As I said earlier in this piece, users prefer to see every corner they can reach localized. For these users, full localization is critical to getting more out of the Live CD. For languages that need an input method to enter characters, for instance the CJK languages (Chinese, Japanese and Korean), the Live CD is even more limited without a full-featured input method. It is really hard to enter these complex scripts: although we have ibus with general m17n support by default, you can only type characters one by one, which looks ridiculous given how far input methods have come. When you cannot type a sentence, how can you even search the web for some articles from the live session?

    Beyond the points above, a localized Live CD saves users the time of downloading and installing language support and perhaps other common software. For example, basic Chinese language support requires around 100MiB of downloads, and that counts only the language packs and the input method, without any other common software such as StarDict. With a localized Live CD, users can install a usable environment when they have no fast Internet connection, or even no connection at all; this is obviously welcome to the many users who have wanted it for a long time. With a fully localized environment, we can simplify the user’s configuration process and make the system almost ready to use once installed.

    Making localized Live CDs doesn’t need changes to most of our infrastructure; it is just a matter of the default selection of software on the CD. It will cause some more work for the CD image team, for translation exports and for our ISO building facilities, but I think it is worth it. The choice of default package sets and some of the QA work can be done by the LoCo teams.

    We can’t provide Live CDs for all languages, especially at the very beginning, but it is worthwhile to start by trying a few languages that need special care and have a large number of potential users. We can accumulate experience and improve the process as we go. Windows and Macs have language-specific editions, so why can’t we?

    Providing official localized editions can be a big step forward in spreading Ubuntu and free software to the world. The process of producing them would be another exercise in cooperation between the development community and the local communities. Ubuntu is Linux for Human Beings, and I think such a move is really true to that idea and will benefit a lot of users throughout the world.

    When Ubuntu 9.10 was released, PPPoE connections via NetworkManager were impossible because of a bug in it, so I switched to the traditional but workable way – pppoeconf. Now the problem seems to be solved in the NetworkManager team PPA, so I plan to switch back.
    But along the way I ran into some other problems. First, NetworkManager could not handle the connections automatically; second, I could not edit system-wide connections.

    Here are the correct steps:

    First, add “NetworkManager daily trunk builds for ubuntu” PPA:

    deb http://ppa.launchpad.net/network-manager/trunk/ubuntu karmic main
    deb-src http://ppa.launchpad.net/network-manager/trunk/ubuntu karmic main

    Second, comment out the line “exec pppd call dsl-provider” in /etc/ppp/pppoe_on_boot; that is to say, disable the “pppoe on boot” setting previously configured by pppoeconf.

    Third, rename /etc/network/interfaces to a backup file. NetworkManager only handles connections that are not declared in interfaces. If you haven’t done any tuning in that file you could delete it, but making a backup before any change is a good habit :)

    Fourth, edit /usr/share/polkit-1/actions/org.freedesktop.network-manager-settings.system.policy, find the line containing “System policy prevents modification of system settings”, and change the “auth_admin_keep” below it to “yes”. This enables you to edit a system-wide connection. If you think this harms your security, revert the change once your connection is set up correctly.

    Fifth, reboot your system, because these settings won’t take effect even after running “sudo service network-manager restart” and “sudo service networking restart”.

    Now it is working on my system, cheers!
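
Steps two and three can also be done from a shell. A sketch, with helper names I made up; both helpers take the file as an argument, so they can be tried on copies before being run as root on the real files:

```shell
#!/bin/sh
# comment_out_pppd FILE: prefix the "exec pppd call dsl-provider" line with "#".
# The untouched original is kept as FILE.orig; writing back into FILE keeps its permissions.
comment_out_pppd() {
    cp "$1" "$1.orig" &&
        sed 's/^[[:space:]]*exec pppd call dsl-provider/#&/' "$1.orig" > "$1"
}

# backup_and_remove FILE: move FILE aside as FILE.bak so NetworkManager no longer sees it.
backup_and_remove() {
    mv "$1" "$1.bak"
}
```

In a root shell the calls would be comment_out_pppd /etc/ppp/pppoe_on_boot and backup_and_remove /etc/network/interfaces.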