VM disk performance on btrfs partition

Recently I read Alexandre Rosenfeld’s blog post about the poor performance of btrfs when a VM’s disk image is placed on it, so I gave it a try and got almost the same result.

I asked someone to look into the problem and now have an answer. Quoting Bastian Blank:

This is a result of the filesystem design, no bug. For decent performance don’t use O_SYNC on files. For qemu use cache=writeback in the disk definition.
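
As a minimal sketch of that advice, the snippet below builds a -drive argument with cache=writeback and prints the resulting command line instead of running it. The image path, memory size and the qemu-system-x86_64 binary name are assumptions for illustration; your qemu-kvm binary may be named differently.

```shell
#!/bin/sh
# drive_option IMAGE: build a -drive argument that lets the host page cache
# absorb guest writes (cache=writeback) instead of syncing each one.
drive_option() {
    printf 'file=%s,cache=writeback' "$1"
}

# Hypothetical image path; the command is only printed, not executed.
echo qemu-system-x86_64 -enable-kvm -m 1024 -drive "$(drive_option /var/lib/vm/disk.img)"
```

Keep in mind that write-back caching trades safety for speed: data the guest believes is written may still be in host memory if the host crashes.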

Sending files from initramfs busybox

I have to say busybox is so cool!

Today I installed Debian Squeeze from the daily netinst ISO image in an i686 qemu-kvm virtual machine, using ext4 for /boot and btrfs for /. The installation itself went as expected.

After the installation I rebooted the virtual machine. GRUB 2 loaded correctly, but right after that I was dropped to a busybox ash prompt with this error:

FATAL: Error inserting btrfs (/lib/modules/2.6.32-5-686/kernel/fs/btrfs/btrfs.ko): Unknown symbol in module, or unknown parameter (see dmesg)

So I went to #debian and #debian-devel to ask for help. lindi- asked me for the dmesg output and told me to “just configure networking in initramfs and use busybox netcat”.

The actual procedure is:
1. First, save the dmesg output to a file:
(busybox) dmesg > dmesg.txt

2. Run ipconfig in busybox to configure the network (I was running qemu-kvm as superuser, so there was no need to deal with user-mode networking problems):
(busybox) ipconfig eth0

3. Run netcat on the host machine to listen on a port, e.g. 3333, and save whatever arrives (traditional netcat wants "nc -l -p 3333" instead):
$ nc -l 3333 > dmesg.txt

4. Send the file:
(busybox) cat dmesg.txt | nc 192.168.100.1 3333

And I finally got dmesg.txt onto my host machine. Well, I still haven’t got the virtual machine working. :-(

Update 2010-1-4 :
This bug has been reported as Debian Bug #608538.
Quoting Joey Hess:

I hope this can be dealt with, it seems to be the only remaining issue in getting Debian to support btrfs root filesystems.

It appears that the btrfs module needs the crc32 module, but crc32 isn’t loaded automatically.

Workaround for file name problems when unzipping archives with CJK encodings

Unzip 5.x has an option -O to specify the encoding of file names in a ZIP archive, but when 6.0 arrived with Unicode support, that option disappeared. CJK users need to take special care with the support and conversion of legacy encodings while they are switching to UTF-8.

Here is my workaround for this problem. Install the p7zip and convmv packages on your system first, then:
$ env LC_ALL=C 7z x file.zip
$ convmv -f gbk -t utf8 --notest *

File names extracted by unzip cannot be converted to the correct ones whatever you do with them, but those extracted by 7z can be converted by convmv.

Going further, we can automate this in a script:
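#!/bin/sh
# Extract with 7z, list the extracted names deepest-first (the second sed reverses the lines), then convert each name from GBK to UTF-8.
LANG=C /usr/bin/7z x -y "$1" | sed -n 's/^Extracting  *//p' | sed '1!G;h;$!d' |
    xargs -r -d '\n' convmv -f gbk -t utf8 --notest >/dev/null 2>&1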

Save it as unzip.sh, then try:
$ sh unzip.sh file.zip
This acts like unzip, but additionally converts the file name encoding from GBK to UTF-8. Moreover, convmv can detect whether a file name is already UTF-8 encoded and will skip it.

If your file names use another encoding, replace “gbk” with the appropriate name.
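
If you are not sure which encoding an archive used, one way to guess is to decode the raw bytes of a file name under a few candidates and see which one reads correctly. The helper below is only a sketch: try_encodings is a name I made up, it relies on iconv knowing the encodings you pass, and the sample bytes are the GBK encoding of 中文 written as octal escapes.

```shell
#!/bin/sh
# try_encodings BYTES ENC...: decode BYTES (a printf format string with octal
# escapes) under each candidate encoding and show the result as UTF-8.
try_encodings() {
    bytes=$1; shift
    for enc in "$@"; do
        if decoded=$(printf "$bytes" | iconv -f "$enc" -t utf-8 2>/dev/null); then
            printf '%s: %s\n' "$enc" "$decoded"
        else
            printf '%s: (not valid in this encoding)\n' "$enc"
        fi
    done
}

# 0xD6 0xD0 0xCE 0xC4 in octal: "中文" as encoded in GBK.
try_encodings '\326\320\316\304' gbk utf-8
```

The encoding whose line shows readable text is the one to pass to convmv with -f.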

Proposed PPA and key management enhancement for Ubuntu Tweak

Ubuntu Tweak is a magical tool that makes it easier for users to configure Ubuntu, and it has a growing number of users. I propose some enhancements to provide better PPA and key security.
We need a GPG key pair to sign and verify text so that the key hash list can be trusted. I prefer SHA-256 as the hash for key file fingerprints, because some users no longer consider MD5 or SHA-1 reliable. We are only verifying a few key files, so the performance cost is bearable.
Now I will describe what actually happens when a user installs or upgrades Ubuntu Tweak.
We need to prompt the user to import a GPG public key into their keyring the first time they start Ubuntu Tweak and go to the PPA tuning section, or whenever the key in their keyring has been deleted or changed. Every time, the application should check the current user’s keyring for the key’s fingerprint; the key ID to look for is the one shipped with our program (do not worry about someone changing this value, we have a procedure to verify it).
Then we tell the user that we need to update our application data online, including the PPAs and their key definitions. UTCOM needs to provide a LATEST version file as well as the current version of the data. When UT checks for updates, it compares its own version with LATEST and determines whether it needs to update the data. This can be done once or twice a week (and of course on the first run as well).
The data pack should contain the following content:
1. The public key fingerprint mentioned above, used to verify the key once the data pack has been extracted.
2. sources.list.d entries
3. PPA keys
4. PPA key fingerprints (a hash, SHA-256 perhaps)

When the download finishes, the application first verifies the data pack against its signature (this can be achieved with another text file that contains the tar file’s hash, signed with GPG). If everything checks out, it extracts the pack, reads the GPG key fingerprint from it and compares that with the key installed on the system (the one we just used to verify the tar pack). When that check passes, we can trust the data and go on to check the hashes of the other key files.

Every time Ubuntu Tweak adds a PPA, it should check it against the PPA list it has downloaded and verified, so we can be sure the program won’t add a PPA that we haven’t checked.
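
The download check described above could be sketched roughly like this. It is not Ubuntu Tweak code: the function names and file names (data.tar.gz, data.tar.gz.sha256, data.tar.gz.sha256.asc) are made up for illustration, and it assumes the project's public key is already in the user's keyring.

```shell
#!/bin/sh
# check_signature SUMFILE SIGFILE: confirm SUMFILE carries a valid detached
# signature from a key that is already in the user's keyring.
check_signature() {
    gpg --verify "$2" "$1" >/dev/null 2>&1
}

# check_hash SUMFILE: confirm every file listed in SUMFILE ("<sha256>  <name>"
# lines, relative to SUMFILE's directory) still matches its recorded hash.
check_hash() {
    ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1")" >/dev/null 2>&1 )
}

# verify_pack SUMFILE SIGFILE: signature first, then hashes; only a pack that
# passes both should be extracted.
verify_pack() {
    check_signature "$1" "$2" && check_hash "$1"
}

# Usage with hypothetical file names:
#   verify_pack data.tar.gz.sha256 data.tar.gz.sha256.asc && tar -xzf data.tar.gz
```

Note that gpg --verify accepts any key it finds in the keyring, so comparing the signer's fingerprint with the shipped one remains a separate step.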

Differences among several kernel signals

Here is the final summary.

Today I was puzzled by the help text of the ‘timeout’ command. As its name suggests, it runs a COMMAND and kills it if it is still running after a specified period of time. Because it terminates processes, I ran into signals. I searched around and am writing a summary here of the differences among the HUP, INT, KILL, TERM and USR1 signals.
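
For reference, here is a small sketch of how timeout is typically used, assuming the GNU coreutils version. run_limited is just a wrapper name I picked; -s selects the signal that is sent when the time is up.

```shell
#!/bin/sh
# run_limited SECS CMD...: run CMD and send it TERM if it outlives SECS seconds.
run_limited() {
    secs=$1; shift
    timeout -s TERM "$secs" "$@"
}

run_limited 5 true && echo "fast command finished on its own"
run_limited 1 sleep 5 || echo "slow command was stopped, exit status $?"
```

GNU timeout documents exit status 124 for a command that timed out, so the second line should report that value on such systems.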
Every time we shut down a Linux box, it shows “Sending all processes the TERM signal” and then “Sending all processes the KILL signal”. The TERM signal is definitely more graceful than KILL, but the best thing to do now is probably to look up the manual page.
In the manual page kill(1), we get something like this (I’ve left out things not closely connected to our topic, same below):
Name    Num    Action    Description
-------------------------------------------
HUP     1      exit
INT     2      exit
KILL    9      exit      cannot be blocked
TERM    15     exit
USR1           exit
That is not much information. Anyway, we learn that they are all related to process exit, and that the KILL (9) signal cannot be blocked; in other words, it cannot be caught by a process, so the process has no chance to act on it.

Then I searched Google and found some hints in the Apache httpd documentation. In the Stopping and Restarting section, we find the following (all signals are sent to the parent process here):
TERM – stop now: The parent immediately attempts to kill off all of its children. It may take several seconds to finish killing them off. Then the parent itself exits. Any requests in progress are terminated, and no further requests are served.
USR1 – graceful restart: The parent “advises” the children to exit after their current request (or to exit immediately if they’re not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off, the parent replaces it with a child from the new generation of the configuration, which begins serving new requests immediately.
HUP – restart: The parent kills off its children as with TERM, but the parent doesn’t exit. It re-reads its configuration files and re-opens any log files. Then it spawns a new set of children and continues serving hits.

But this still doesn’t answer our question. Ah, the last line of the kill(1) manual page points to another page named ‘signal’. Okay, let’s look it up; here is what we get:
Signal     Value       Action    Comment
---------------------------------------------------------------
SIGHUP     1           Term      Hangup detected on controlling terminal or death of controlling process
SIGINT     2           Term      Interrupt from keyboard
SIGKILL    9           Term      Kill signal
SIGTERM    15          Term      Termination signal
SIGUSR1    30,10,16    Term      User-defined signal 1
SIGUSR2    31,12,17    Term      User-defined signal 2
Hoo! More information is displayed!

Finally, we can have a short summary:

TERM – Terminate – This is the signal you send if you want to end a process. It allows the process to clean up nicely, unlike the -9 option (KILL), which just ends everything.
INT – Interrupt – This is the signal sent when you press Ctrl-C at the terminal. For instance, if a process hangs (say, in an NFS environment), you can interrupt it this way.
KILL – exactly what it says. It kills a process without allowing it to clean up (end its threads, stop its child processes, and so on). It just stops it and can leave files or other processes in an inconsistent state.
HUP – Hang UP – When you send this signal to, say, the inetd process, it tells inetd to reread its configuration file because changes have been made that need to be picked up now. The process doesn’t actually stop; as stated, it just rereads its config.
USR1 – This is a user-defined signal. For instance, on our Tru64 systems, we use this signal to tell, say, the binary log daemon to save its current log file, archive it, and then start a new one. I imagine it could be used for many other things, hence user-defined.
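
The difference between signals a process can catch and KILL can be seen with a small experiment. This is only a sketch: demo_signal is a name I made up, and the child is a shell that traps HUP, TERM and USR1 and then loops forever.

```shell
#!/bin/sh
# demo_signal SIG: start a child shell that traps HUP, TERM and USR1, send it
# SIG, then report the child's last output line and its exit status.
demo_signal() {
    sig=$1
    out=$(mktemp)
    sh -c '
        for s in HUP TERM USR1; do
            trap "echo caught $s; exit 0" "$s"
        done
        echo ready
        while :; do sleep 1; done
    ' >"$out" &
    pid=$!
    # Wait until the child has installed its handlers before signalling it.
    until grep -q ready "$out"; do sleep 1; done
    kill -s "$sig" "$pid"
    status=0
    wait "$pid" || status=$?
    echo "$sig: last line '$(tail -n 1 "$out")', exit status $status"
    rm -f "$out"
}

demo_signal TERM   # trapped: the child runs its handler and exits cleanly
demo_signal KILL   # cannot be trapped: the child never gets to say anything
```

With TERM the child's last line should be its own "caught TERM" message, while with KILL the last thing it wrote is still "ready" and the shell reports a non-zero status for it.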

Reference:

1. Manual pages kill(1), signal(7).
2. Apache HTTP Server Documentation Version 2.2 – Stopping and Restarting

This work by Aron Xu is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported.