## (Re)mastering a custom Ubuntu auto-install ISO

Recently, I had to install GNU/Linux on a dozen or so machines. I didn’t want to install manually, mainly because I was too lazy, but also because the AC in the data centre is quite strong and I didn’t want to catch a cold… So I looked for some lightweight way of automatically installing an Ubuntu or so. Fortunately, I don’t seem to be the first person to be looking for a solution, although, retrospectively, I think the tooling is still poor.

I would describe my requirements as being relatively simple. I want to turn on one of the to-be-provisioned machines, wait, and then be able to log in via SSH. Ideally, most of the software that I want to run would already be installed. I’m fine with software the distribution ships. The installation must not require the Internet and should just work™, i.e. it should wipe the disk and not require anything special from the network, over which I have only little control.

I looked at tools like Foreman, Cobbler, and Ubuntu’s MAAS. But I decided against them because they don’t necessarily feel lightweight. Actually, Cobbler doesn’t seem to work well when run on Ubuntu. It also fails (at least for me) when being behind an evil corporate proxy. Same for MAAS. Foreman seems to be more of a machine management framework rather than a hit and run style of tool.

So I went for an automated install using the official CD-ROMs. This is sub-optimal as I need to be physically present at the machines and I would have preferred a non-touch solution. Fortunately, the method can be upgraded to deliver the installation medium via TFTP/PXE. But most of the documents describing the process insist on Bind, which I dislike. Also, producing an ISO is less error-prone, so making that work first should be easier; so I thought.

### Building an ISO

The first step is to mount the ISO and copy everything into a working directory. You could probably use something like isomaster, too.

    mkdir iso.vanilla
    sudo mount -oloop ubuntu.iso ./iso.vanilla
    mkdir iso.new
    # "iso.vanilla/." copies hidden files too, without also matching ".."
    sudo cp -a ./iso.vanilla/. iso.new/

After you have made changes to your image, you probably want to generate a new ISO image that you can burn to CD later.

    sudo mkisofs -J -l -b isolinux/isolinux.bin -no-emul-boot \
        -boot-load-size 4 -boot-info-table -z -iso-level 4 \
        -c isolinux/isolinux.cat -joliet-long \
        -o /tmp/ubuntu-16.04-myowninstall-amd64.iso iso.new

You’d expect that image to work if you now dd it onto a pendrive, but of course it does not… At least it didn’t for me. After trying many USB creators, I eventually found that you need to call isohybrid.

    sudo isohybrid /tmp/ubuntu-16.04-myowninstall-amd64.iso

Now you can test whether it boots with qemu:

    # Create the disk image under the name that the qemu call below uses
    qemu-img create -f qcow2 /tmp/ubuntu-nonet.qcow2 10G
    qemu-system-x86_64 -m 1G -cdrom /tmp/ubuntu-16.04-myowninstall-amd64.iso -hda /tmp/ubuntu-nonet.qcow2

If you want to test whether a USB image would boot, try with -usb -usbdevice disk:/tmp/ubuntu-16.04-myowninstall-amd64.iso. If it doesn’t, then you might want to check whether you have assigned enough memory to the virtual machine. I needed to give -m 1G, because the default didn’t work and failed with a mysterious error.

It should also be possible to create a pendrive with FAT32 and to boot it on EFI machines. But my success was limited…

### Making Changes

Now what changes do you want to make to the image to get an automated installation?
First of all you want to get rid of the language selection. Rumor has it that

    echo en | tee isolinux/lang

is sufficient, but that did not work for me. Replacing the timeout values in the files in the isolinux directory with something strictly positive worked much better for me. So edit isolinux/isolinux.cfg.
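The timeout change can also be scripted. The following is a minimal sketch, assuming that the extracted image lives in a directory such as iso.new and that the config files contain lines of the form "timeout 0". The function name and the value 10 are made up for illustration; isolinux counts the timeout in tenths of a second, so 10 means roughly one second.

```shell
#!/bin/sh
# Rewrite every "timeout 0" line in the isolinux config files to a
# positive value so that the boot menu proceeds on its own.
# Arguments: $1 = root of the extracted ISO, $2 = new timeout value.
set_isolinux_timeout() {
    iso_dir="$1"
    timeout="$2"
    for cfg in "$iso_dir"/isolinux/*.cfg; do
        # Skip the unexpanded glob if no config file exists
        [ -f "$cfg" ] || continue
        # Only touch lines that consist of "timeout 0" (GNU sed, in place)
        sed -i "/^[[:space:]]*timeout[[:space:]]\{1,\}0[[:space:]]*\$/s/0[[:space:]]*\$/$timeout/" "$cfg"
    done
}
```

Calling `set_isolinux_timeout iso.new 10` (with sudo if the copied tree belongs to root) would then patch all matching files at once. Lines with a non-zero timeout are left alone.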

If the image boots now, you don’t want the installer to ask you questions. Unfortunately, there doesn’t seem to be a “fire and forget” mode which tries to install as aggressively as possible. But there are at least two mechanisms: kickstart and preseed. Ubuntu comes with a kickstart compatibility layer (kickseed).

Because I didn’t know whether I’d stick with Ubuntu, I opted for kickstart which would, at least theoretically, allow me to use Fedora later. I installed system-config-kickstart which provides a GUI for creating a kickstart file. You can then place the file in, e.g. /preseed/ks-custom.cfg next to the other preseed files. To make the installer load that file, reference it in the kernel command line in isolinux/txt.cfg, e.g.

    default install
    label install
      menu label ^Install Custom Ubuntu Server
      kernel /install/vmlinuz
      append file=/cdrom/preseed/ubuntu-server.seed vga=788 initrd=/install/initrd.gz ks=cdrom:/preseed/ks-custom.cfg DEBCONF_DEBUG=5 cdrom-detect/try-usb=false usb_storage.blacklist=yes --

Ignore the last three options for now and remember them later when we talk about issues installing from a pen drive.
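For orientation, a kickstart file of the kind referenced by ks= might look roughly like the one written by the following sketch. It is not the file used for this post: every value (locale, time zone, user name, password, partition layout) is a placeholder, the function name is made up, and whether kickseed honours each directive on a given release is an assumption that needs testing.

```shell
#!/bin/sh
# Write a bare-bones kickstart file to the path given as $1.
# All values below are illustrative placeholders, not the author's setup.
write_kickstart() {
    cat > "$1" <<'EOF'
lang en_US
keyboard us
timezone Etc/UTC
rootpw --disabled
user deploy --fullname "Deploy User" --password changeme
text
install
cdrom
reboot
bootloader --location=mbr
zerombr yes
clearpart --all --initlabel
part / --fstype ext4 --size 1 --grow
part swap --recommended
skipx
EOF
}
```

Something like `write_kickstart iso.new/preseed/ks-custom.cfg` would put the file where the boot entry above expects it. The clearpart line is what asks the installer to wipe the disk.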

When you boot now, you’d expect it to “just work”. But if you are me then you’ll run into the installer asking you questions. Let’s discuss these.

### Multiple Network Interfaces

When you have multiple NICs, the installer apparently asks you which interface to use. That is, of course, not desirable when wanting to install without interruption. The documentation suggests using

    d-i netcfg/choose_interface select auto

That, however, seemed to crash the installer when I configured QEMU to use four NICs… I guess it’s this bug which, at least on my end, had been caused by my accidentally putting “eth0” instead of “auto”. It’s weird, because it worked fine with the single NIC setup. The problem, it seems, is that eth0 does not exist! It’s 2016 and we have “predictable device names” now. Except that we still have /dev/sda for the first hard disk. I wonder whether there is a name for the first NIC. Anyway, if you do want to have the eth0 scheme back, it seems to be possible by setting biosdevname=0 as a kernel parameter when booting.

You can test with multiple NICs and QEMU like this:

    sudo qemu-system-x86_64 -m 1G -boot menu=on \
        -hda /tmp/ubuntu-nonet.qcow2 -runas $USER \
        -usb -usbdevice disk:/tmp/ubuntu-16.04-myowninstall-amd64.iso \
        -netdev user,id=network0 -device e1000,netdev=network0 \
        -netdev user,id=network1 -device e1000,netdev=network1 \
        -netdev user,id=network2 -device e1000,netdev=network2 \
        -netdev user,id=network3 -device e1000,netdev=network3 \
        -cdrom /tmp/ubuntu-16.04-myowninstall-amd64.iso

### No Internet Access

When testing this with the real servers, I realised that my qemu testbed was still too ideal. The real machines can resolve names, but cannot connect to the Internet. I couldn’t build that scenario with qemu, but the following gets close:

    sudo qemu-system-x86_64 -m 1G -boot menu=on \
        -hda /tmp/ubuntu-nonet.qcow2 -runas $USER \
        -usb -usbdevice disk:/tmp/ubuntu-16.04-myowninstall-amd64.iso \
        -netdev user,id=network0,restrict=y -device e1000,netdev=network0 \
        -netdev user,id=network1,restrict=y -device e1000,netdev=network1 \
        -netdev user,id=network2,restrict=y -device e1000,netdev=network2 \
        -netdev user,id=network3,restrict=y -device e1000,netdev=network3 \
        -cdrom /tmp/ubuntu-16.04-myowninstall-amd64.iso
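Both QEMU invocations repeat the same -netdev/-device pair once per NIC. As a small convenience, the pairs can be generated in a loop. This is only a sketch; the helper name nic_args is made up, and it assumes the same user-mode networking and e1000 devices as in the commands shown here.

```shell
#!/bin/sh
# Print the -netdev/-device argument pairs for a number of user-mode NICs.
# Arguments: $1 = number of NICs, $2 = optional extra netdev option,
# e.g. "restrict=y" to cut the guest off from the outside world.
nic_args() {
    count="$1"
    extra="${2:+,$2}"   # prepend a comma only if an option was given
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s ' "-netdev" "user,id=network$i$extra" \
                     "-device" "e1000,netdev=network$i"
        i=$((i + 1))
    done
}
```

It would be used unquoted so that the shell splits the words again, e.g. `sudo qemu-system-x86_64 -m 1G $(nic_args 4 restrict=y) -cdrom /tmp/ubuntu-16.04-myowninstall-amd64.iso`.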

The restricted setup, however, fails.

The qemu options seem to stop the built-in DHCP server from handing out a default gateway. The installer seems to expect one, though, and thus stalls and waits for user input. According to the documentation, a netcfg/get_gateway value of "none" could be used to make it proceed. It’s not clear to me whether it’s a special none type, the string literal “none”, or the empty string. Another uncertainty is how to actually make it work from within the kickstart file, because this debconf syntax is for preseeding, not kickstarting. I tried several things,

    preseed netcfg/get_gateway none
    preseed netcfg/get_gateway string
    preseed netcfg/get_gateway string 1.2.3.4
    preseed netcfg/get_gateway string none
    preseed netcfg/no_default_route boolean true

The latter two seemed to work better. You may wonder how I found that magic configuration variable. I searched for the string being displayed when it stalled and found an anonymous pastebin which carries all the configurable items.

After getting over the gateway, it complained about missing nameservers. By putting

    preseed netcfg/get_nameservers string 8.8.8.8

I could make it proceed automatically.

### Overwriting existing partitions

When playing around you eventually get to the point where you need to retry, because something just doesn’t work. Then you change your kickseed file and try again. On the same machine you’ve just left half-installed with existing partitions and all. For a weird reason the installer mounts the partition(s), but cannot unmount them.

The documentation suggests that a line like

    preseed partman/unmount_active boolean true

would be sufficient, but not so for me. And it seems to be an issue since 2014 at least. The workarounds in the bug do not work. Other sources suggested using partman/early_command string umount -l /media || true, partman/filter_mounted boolean false, or partman/unmount_active seen true. Because it’s not entirely clear to me who the “owner”, in terms of preseed, is, I’ve also experimented with setting, e.g. preseed --owner partman-base partman/unmount_active boolean true. It started to work when I set preseed partman/unmount_active DISKS /dev/sda and preseed --owner partman-base partman/unmount_active DISKS /dev/sda. I didn’t really believe my success and reordered the statements a bit to better understand what I was doing. I then removed the newly added statements and expected it to not work. However, it did. So I was confused. But I had neither the time nor the energy to follow what really was going on.

I think part of the problem is also that it sometimes tries to mount the pendrive itself! Sometimes I’ve noticed how it actually installed the system onto the pendrive *sigh*. So I tried hard to make it not mount USB drives. The statements that seem to work for me are the above mentioned boot parameters (i.e. cdrom-detect/try-usb=false usb_storage.blacklist=yes) in combination with:

    preseed partman/unmount_active boolean true
    preseed --owner partman-base partman/unmount_active boolean true
    preseed partman/unmount_active seen true
    preseed --owner partman-base partman/unmount_active seen true

    #preseed partman/unmount_active DISKS /dev/sda
    #preseed --owner partman-base partman/unmount_active DISKS /dev/sda

    preseed partman/early_command string "umount -l /media || true"
    preseed --owner partman-base partman/early_command string "umount -l /media ||$

How I found that, you may ask? Enter the joy of debugging.

### Debugging debconf

When booting with DEBCONF_DEBUG=5, you can see a lot of information in /var/log/syslog. You can see what items are queried and what it thinks the answer is. It looks somewhat like this:

You can query the values yourself with the debconf-get tool, e.g.

    # debconf-get partman/unmount_active
    true

The file /var/lib/cdebconf/questions.dat seems to hold all the possible items. In the templates.dat you can see the types and the defaults. That, however, did not really enlighten me, but only wasted my time. Without knowing much about debconf, I’ve noticed that you seem to be able to not only store true and false, but also flags like “seen”. By looking at the screenshot above I’ve noticed that it forcefully sets partman/unmount_active seen false. According to the documentation mentioned above, some code really wants this flag to be reset. So that way was not going to be successful. I noticed that the installer somehow sets the DISKS attribute of partman/unmount_active, so I tried to put the disk in question (/dev/sda) and it seemed to work.

### Shipping More Software

I eventually wanted to install some packages along with the system, but not through the Internet. I thought that putting some more .debs in the ISO would be as easy as copying the file into a directory. But it’s not just that easy. You also need to create the index structure Debian requires. The following worked well enough for me:

    cd iso.new
    cd pool/extras
    apt-get download squid-deb-proxy-client
    cd ../..
    sudo apt-ftparchive packages ./pool/extras/ | sudo tee dists/stable/extras/binary-i386/Packages

I was surprised by the i386 suffix. Although I can get over the additional apt-ftparchive, I wish it wouldn’t be necessary. Another source of annoyance is the dependencies.
I couldn’t find a way to conveniently download all the dependencies of a given package. These packages can then be installed with the %packages directive:

    %packages
    @ ubuntu-server
    ubuntu-minimal
    openssh-server
    curl
    wget
    squid-deb-proxy-client
    avahi-daemon
    avahi-autoipd
    telnet
    nano
    #build-essential
    #htop

Or via a post-install script:

    %post

    apt-get install -y squid-deb-proxy-client
    apt-get update
    apt-get install -y htop
    apt-get install -y glusterfs-client glusterfs-server
    apt-get install -y screen
    apt-get install -y qemu-kvm libvirt-bin

Unfortunately, I can’t run squid-deb-proxy-client in the installer itself. Not only because I don’t know how to properly install the udeb, but also because it requires the dbus daemon to be run inside the to-be-installed system which proves to be difficult. I tried the following without success:

    preseed anna/choose_modules string squid-deb-proxy-client-udeb

    preseed preseed/early_command string apt-install /cdrom/pool/extras/squid-deb-proxy-client_0.8.14_all.deb

    %pre
    anna-install /cdrom/pool/extras/squid-deb-proxy-client-udeb_0.8.14_all.udeb

If you happen to know how to make it work, I’d be glad to know about it.

### Final Thoughts

Having my machines installed automatically cost me much more time than installing them manually. I expected to have tangible results much quicker than I actually did. However, now I can re-install any machine within a few minutes which may eventually amortise the investment.

I’m still surprised by the fact that there is no “install it, dammit!” option for people who don’t really care about the details and just want to get something up and running.

Unfortunately, it seems to be non-trivial to just save the diff of the vanilla and the new ISO. The next Ubuntu release will then require me to redo the modifications. Next time, however, I will probably not use the kickseed compatibility layer and stick to the pure preseed method.
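A partial way to keep track of the modifications is a recursive diff of the two working directories. The sketch below only captures changes to text files such as the isolinux configuration and the kickstart file; added or changed binaries (e.g. the extra .debs) merely show up as “Binary files … differ”, so it does not solve the problem in general. The function name and the patch file name are made up.

```shell
#!/bin/sh
# Record the text-level differences between the untouched and the
# modified ISO tree in a patch file.
# Arguments: $1 = vanilla tree, $2 = modified tree, $3 = patch file.
save_iso_diff() {
    vanilla="$1"
    new="$2"
    patch="$3"
    # diff exits with 1 when the trees differ; only a status above 1
    # signals a real error.
    diff -ruN "$vanilla" "$new" > "$patch" || [ "$?" -eq 1 ]
}
```

For example, `save_iso_diff iso.vanilla iso.new /tmp/myowninstall.patch` would write the differences to a file that can be reviewed when the next release comes around.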
## WideOpenId – woid.cryptobitch.de

Uh, I meant to blog about this a while ago, but somehow, it got lost… Anyway, inspired by http://openid.aliz.es and intrigued by OpenID, I set out to find an implementation that comes with an acceptable level of required effort to set up and run. While the idea of federated authentication sounds nice, the concept gets a bit flawed if everybody uses Google or Stackexchange as their identity provider. Also, you might not really want to provide your very own OpenID for good reasons. Pretty much as with email, which is why you could make use of mailinator, yopmail, or others.

There is a list of server software on the OpenID page, but none of them really looked like low effort. I wouldn’t want to install Django or any other web framework. But I’d go with a bad Python solution before even looking at PHP. There is an “official” OpenID example server which is not WSGI aware and thus requires more effort than I am willing to invest. Anyway, I took an existing OpenID server and adapted it such that anyone could log in. Always.

When developing and deploying, I noticed that mod_wsgi’s support for virtualenv is really bad. For example, the PYTHONPATH cannot be inside Apache’s VirtualHosts declaration and you thus need a custom WSGI file which hard codes the Python version. It appears that there is also no helper on the Python level to “load” a virtual env. Weird.

Anyway, you can now enjoy OpenID by providing http://woid.cryptobitch.de/your-id-here as your identity provider. The service will happily tell anyone that any ID is valid. So you can log in as any name you want. A bit like mailinator for OpenID.

To test whether the OpenID provider actually works, you can download the example consumer and start it.

## Bahn Bonus Points Sammeln

The Bahn currently has a Web-based game for you to win some of their loyalty points. It’s not a very exciting game, but you get up to 500 points which is half a free ride across the country.
(You get the other half when signing up for their program.) In order to get these 500 points you need to play for an hour or so. Or you observe the Web traffic your browser generates and look closely. You’ll see that the Flash applet fetches a token from the server and sends your result, along with the token and some hash, to the server.

How to get the correct hash you ask? Worry not, you will get the correct hash from the server if you don’t send the correct one. You can resend your request with the hash the server sent you and your POST will be accepted. Neat. I don’t know why they send the “correct_hash”, but it’s obviously a bad idea.

PS: It seems that Kazam has trouble recording my mouse pointer position correctly.

## Reverse sshuttle tunnel to connect to separate networks

I had to solve the split horizon DNS problem in order to find my way out to the Internet. The complementary problem is how to access the internal network from the Internet. The scenario being, for example, your home network being protected by a very angry firewall that you don’t necessarily control. However, it’d be quite handy to be able to SSH into your machines at home, use the printer, or connect to the internal messaging system. However, everything is pretty much firewalled such that no incoming connections are possible.

Fortunately, outgoing connections to an SSH server are possible. With the RemoteForward option of OpenSSH we can create a reverse tunnel to connect to the separate network. All it requires is an SSH server that you can connect to from both sides, i.e.
the internet and the separate network, and some configuration, maybe like this on the machine within the network:

    ssh -o 'RemoteForward=localhost:23 localhost:22' root@remotehost

and this for the internet machine:

    Host dialin
        User toor
        HostName my.server
        Port 23

It then looks almost like this:

    +---------------------------------------+
    |Internet                               |
    +---------------------------------------+
    |   +-----------+                       |
    |   |My machine +------------+          |
    |   +-----------+            |          |
    |                 +----------v--+       |
    |                 |             |       |
    |                 | SSH Server  |       |
    |                 |             |       |
    |                 +--------+-+--+       |
    |                          | ^          |
    +------------------------- | | ---------+
    |XXXXXXXXXXXX Firewall XXX | | XXXXXXXXX|
    +------------------------- | | ---------+
    | ACME.corp 10/8           | |          |
    +------------------------- | | ---------+
    |                          | |          |
    |            +-------------|-+-----+    |
    |  XMPP  <-+ |             v       |    |
    |          | |                     |    |
    |  Print <-+-+  ssh -R             |    |
    |          | |  via corkscrew      |    |
    |  VCS   <-+ +---------------------+    |
    |            |     My machine      |    |
    |            +---------------------+    |
    +---------------------------------------+

“But…” I hear you say. What about the firewall? How would we connect in the first place? Sure, we can use corkscrew, as we’ve learned. That will then look a bit more convoluted, maybe like this:

    ssh -o ProxyCommand="corkscrew proxy.acme.corp 80 ssh.my.server 443" -o 'RemoteForward=localhost:23 localhost:22' root@lolcathost

What? You don’t have corkscrew installed? Gnah, it’s dangerous to go alone, take this:

    cd
    wget --continue http://www.agroman.net/corkscrew/corkscrew-2.0.tar.gz
    tar xvf corkscrew*.tar*
    cd corkscrew*
    # $HOME instead of ~, because the shell does not expand ~ after "--prefix="
    ./configure --prefix="$HOME/corkscrew"; make; make install
    echo -e 'y\n'|ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
    (echo -n 'command="read",no-X11-forwarding,no-agent-forwarding '; cat ~/.ssh/id_rsa.pub ;echo;echo EOF)

As a bonus, you get an SSH public key which you can add on the server side, i.e. cat >> ~root/.ssh/authorized_keys <<EOF. Have you noticed?
When logging on with that key, only the read command will be executed. That’s already quite helpful.

But how do you then connect? Via the SSH server, of course. But it’s a bit of a hassle to first connect there and then somehow port forward via SSH and all. Also, in order to resolve internal names, you’d have to first SSH into the separate machine to issue DNS queries. That’s all painful and not fun. How about an automatic pseudo VPN that allows you to use the internal nameserver and transparently connects you to your internal network?

Again, sshuttle to the rescue. With the same patches applied to /etc/NetworkManager/dnsmasq.d/corp-tld, namely

    # resolves names both, .corp and .acme
    server=/acme.corp/10.2.3.4
    server=/corp.acme/10.3.4.5

you can make use of that lovely patch for dns hosts. In the following example, we have a few nameservers defined, just in case: 10.2.3.4, 10.3.4.5, 10.4.5.6, and 10.5.6.7. It also excludes some networks that you may not want to have transparently routed. A few of them are actually standard local networks and should probably never be routed. Finally, the internal network is defined. In the example, the networks are 10.1.2.3/8, 123.1.2.3/8, and 321.456.0.0/16.

    sshuttle --dns-hosts 10.2.3.4,10.3.4.5,10.4.5.6,10.5.6.7 -vvr dialin 10.1.2.3/8 123.1.2.3/8 321.456.0.0/16 \
        --exclude 10.0.2.1/24 \
        --exclude 10.183.252.224/24 \
        --exclude 127.0.1.1/8 \
        --exclude 224.0.0.1/8 \
        --exclude 232.0.0.1/8 \
        --exclude 233.252.0.0/14 \
        --exclude 234.0.0.0/8

This setup allows you to simply execute that command and enjoy all of your networks. Including name resolution.

## Installing OpenSuSE 13.1 on a Lenovo Ideapad S10-3t

I tried to install the most recent OpenSuSE image I received when I attended the OpenSuSE Conference. We were given pendrives with a live image so I was interested in how smooth the OpenSuSE installation was, compared to installing Fedora.
The test machine is a three to four year old Lenovo Ideapad S10-3t, which I received from Intel a while ago. It’s certainly not the most powerful machine, but it’s got some dual core CPU, a gigabyte of RAM, and a widescreen touch display.

The initial boot took a while. Apparently it changed something on the pendrive itself to expand to its full size, or so. The installation was a bit painful and, at the end of the day, not successful.

The first error I received was about my username being wrong. It told me that it must only contain letters, digits, and other things. It did not tell me what was actually wrong; and I doubt it could, because my username was very legit. I clicked away the dialogue and tried again. Then it worked…

When I was asked about my partitioning scheme I was moderately confused. The window didn’t present any “next” button. I clicked the only three available buttons to no avail until it occurred to me that the machine has a wide screen so the vertical space was not sufficient to display everything. And yeah, after moving the window up, I could proceed.

While I was positively surprised to see that it offered full disk encryption, I wasn’t too impressed with the buttons. They were very tiny at the bottom of the screen, barely clickable.

Anyway, I found my way to proceed, but when attempting to install, YaST received “system error code -1014” and failed to partition the disk. The disk could be at fault, but I have reasons to believe it was not the disk’s fault: Apparently something ate all the memory so that I couldn’t even start a terminal. I guess GNOME’s system requirements are higher than I expected.

## Split DNS Resolution

For the beginning of the year, I couldn’t make resolutions. The DNS server that the DHCP server gave me only resolves names from the local domain, i.e. acme.corp. Every connection to the outside world needs to go through a corporate HTTP proxy which then does the name resolution itself.
But that only works as long as the HTTP proxy is happy, i.e. with the destination port. It wouldn’t allow me to CONNECT to any other port than 80 (HTTP) or 443 (HTTPS). The proxy is thus almost useless for me. No IRC, no XMPP, no IMAP(s), no SSH, etc.

Fortunately, I have an SSH server running on port 443 and using the HTTP proxy to CONNECT to that machine works easily, i.e. using corkscrew with the following in ~/.ssh/config:

    Host myserver443
        User remote-user-name
        HostName ssh443.example.com
        ProxyCommand corkscrew proxy.acme.corp 8080 %h %p
        Port 443

And with that SSH connection, I could easily tunnel TCP packets using the DynamicForward switch. That would give a SOCKS proxy and I only needed to configure my programs or use tsocks. But as I need a destination IP address in order to assemble TCP packets, I need to have DNS working, first. While a SOCKS proxy could do it, the one provided by OpenSSH cannot (correct me, if I am wrong). Obviously, I need to somehow get onto the Internet in order to resolve names, as I don’t have any local nameserver that would do that for me. So I need to tunnel. Somehow.

Most of the problem is solved by using sshuttle, which is half a VPN, half a tunnelling solution. It recognises your local machine sending packets (using iptables), does its magic to transport these to a remote host under your control (using a small python program to get the packets from iptables), and sends the packets from that remote host (using a small daemon on the server side). It also collects and forwards the answers. Your local machine doesn’t really realise that it is not really connecting itself.

As the name implies it uses SSH as a transport for the packets and it works very well, not only for TCP, but also for UDP packets you send to the nameserver of your choice. So external name resolution is done, as well as sending TCP packets to any host. You may now think that the quest is solved.
But as sshuttle intercepts *all* queries to the (local) nameserver, you don’t use that (local nameserver) anymore and internal name resolution thus breaks (because the external nameserver cannot resolve printing.acme.corp). That’s almost what I wanted. Except that I also want to resolve the local domain names…

To clarify my setup, marvel at this awesome diagram of the scenario. You can see my machine being inside the corporate network with the proxy being the only way out. sshuttle intercepts every packet sent to the outside world, including DNS traffic. The local nameserver is not used as it cannot resolve external names. Local names, such as printing.acme.corp, can thus not be resolved.

    +-----------------------------------------+
    |                ACME.corp                |
    |-----------------------------------------|
    |                                         |
    |                                         |
    | +----------------+        +-----------+ |
    | |My machine      |        | DNS Server| |
    | |----------------|        +-----------+ |
    | |                |                      |
    | |sshuttle        |        +-----------+ |
    | |       corkscrew+------->| HTTP Proxy| |
    | +----------------+        +-----+-----+ |
    |                                 |       |
    +---------------------------------|-------+
                                      |
    +-----------------------------------------+
    |             Internet            |       |
    |-----------------------------------------|
    |                                 v       |
    |        +----------+        +----------+ |
    |        |DNS Server|<-------+SSH Server| |
    |        +----------+        +----------+ |
    |                             +  +  +  +  |
    |                             |  |  |  |  |
    |                             v  v  v  v  |
    +-----------------------------------------+

To solve that problem I need to selectively ask either the internal or the external nameserver and force sshuttle to not block traffic to the internal one. Fortunately, there is a patch for sshuttle to specify the IP address of the (external) nameserver. It lets traffic designated for your local nameserver pass and only intercepts packets for your external nameserver. Awesome.

But how to make the system select the nameserver to be used? Just entering two nameservers in /etc/resolv.conf doesn’t work, of course. One solution to that problem is dnsmasq, which, fortunately, NetworkManager is running anyway.
A single line added to the configuration in /etc/NetworkManager/dnsmasq.d/corp-tld makes it aware of a nameserver dedicated to a domain:

    server=/acme.corp/10.1.1.2

That setup, i.e. using a public DNS server as the main nameserver, making dnsmasq resolve local domain names, and making sshuttle intercept only the requests to the public nameserver, solves my problem and enables me to work again.

    ~/sshuttle/sshuttle --dns-hosts 8.8.8.8 -vvr myserver443 0/0 \
        --exclude 10.0.2.15/8 \
        --exclude 127.0.1.1/8 \
        --exclude 224.0.0.1/8 \
        --exclude 232.0.0.1/8 \
        --exclude 233.252.0.0/14 \
        --exclude 234.0.0.0/8

## Applying international Bahn travel tricks to save money for tickets

Suppose you are sick of Tanzverbot and you want to go from Karlsruhe to Hamburg. As a proper German you’d think of the Bahn first, although Germany started to allow long distance travel by bus, which is cheap and surprisingly comfortable. My favourite bus search engine is busliniensuche.de.

Anyway, you opted for the Bahn and you search for a connection; the result is a one way trip for 40 Euro. Not too bad. But maybe we can do better. If we travel from Switzerland, we can save a whopping 0.05 Euro! Amazing, right? Basel SBB is the first station after the German border and it allows for international fares to be applied. Interestingly, special offers exist which apparently make the same trip, and a considerable chunk on top, cheaper.

But we can do better. Instead of travelling from Switzerland to Germany, we can travel from Germany to Denmark. To determine the first station after the German border, use the Netzplan for the IC routes and then check the local map, i.e. Schleswig Holstein. You will find Padborg as the first non German station. If you travel from Karlsruhe to Padborg, you save 17.5%.

Sometimes you can save by taking a Global ticket, crossing two borders. This is, however, not the case for us.

In case you were wondering whether it’s the very same train and route all the time: Yes it is.
Feel free to look up the CNL 472.

I hope you can use these tips to book a cheaper trip. Do you know any ways to “optimise” your Bahn ticket?

## Finding Maloney

Every so often I feel the need to replace the music coming out of my speakers with an audio drama. I used to listen to Maloney which is a detective story with, well, weird plots. The station used to provide MP3 files for download but since they revamped their website that is gone as the new one only provides flash streaming.

As far as I know, there is only one proper library to access media via Adobe HDS. There are two attempts and a PHP script.

There is, however, a little trick making things easier. The website exposes an HTML5 player if it thinks you’re a moron. Fortunately, it’s easy to make other people think that. The easiest thing to do is to have an iPad User-Agent header. The website will play the media not via Adobe HDS (and flash) but rather via a similar, probably Apple HTTP Live Streaming, method. And that uses a regular m3u playlist with loads of tiny AAC fragments.

The address of that playlist is easily guessable and I coded up a small utility here. It will print the ways to play the latest Maloney episode. You can then choose to either use HDS or the probably more efficient AAC version.

    $ python ~/vcs/findmaloney/maloney.py
    mplayer -playlist http://srfaodorigin-vh.akamaihd.net/i/world/maloney/04df3324-4096-4dd5-b7c3-6f9b904e3f91.,q10,q20,.mp4.csmil/master.m3u8

    livestreamer "hds://http://srfaodorigin-vh.akamaihd.net/z/world/maloney/04df3324-4096-4dd5-b7c3-6f9b904e3f91.,q10,q20,.mp4.csmil/manifest.f4m" best



enjoy!
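To illustrate why the address is “easily guessable”: both printed commands differ only in the i/ versus z/ path component and the manifest name, and share the media ID. The following sketch rebuilds them from an ID. It is not the utility mentioned above; the function name is made up, the host and path layout are copied from the sample output and may have changed since, and how to obtain the ID of the latest episode is not covered.

```shell
#!/bin/sh
# Print the two playback commands for a given media ID, following the
# URL pattern visible in the sample output of maloney.py.
# Argument: $1 = media ID (a UUID in the sample output).
maloney_urls() {
    id="$1"
    base="http://srfaodorigin-vh.akamaihd.net"
    path="world/maloney/$id.,q10,q20,.mp4.csmil"
    # HTTP Live Streaming playlist (the AAC variant)
    echo "mplayer -playlist $base/i/$path/master.m3u8"
    # Adobe HDS manifest
    echo "livestreamer \"hds://$base/z/$path/manifest.f4m\" best"
}
```

Running `maloney_urls 04df3324-4096-4dd5-b7c3-6f9b904e3f91` reproduces the two commands shown in the sample output.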

## Scale Text to the maximum of a page with LaTeX

Being confronted with having to produce a simple poster that holds just a few letters but prints them as big as possible, I found myself needing to scale text (or a letter) on a page.

At first, I found \scalebox, which unfortunately takes a scaling factor, and not two dimensions. Instead of trying to do math, I found \resizebox which does take dimensions (width and height).

You could think that simply scaling up to the \textwidth is enough, but it’s not as you can see from the following “l” which was typeset using this code:

\documentclass[
landscape,
a6paper,
]{scrartcl}
\usepackage[pdftex]{graphicx}
\usepackage{palatino}
\begin{document}
\resizebox{\textwidth}{!}{l}%
\end{document}

And here’s the result:

So the character doesn’t scale well in the sense that if it is too narrow, it would grow too tall. Unfortunately, it doesn’t automatically keep the aspect ratio and it doesn’t take such an argument as \includegraphics does. Fortunately, you can still make it keep the aspect ratio by globally setting the appropriate flag! So the following will work as expected:

\documentclass[landscape]{minimal}
\usepackage[showframe,a4paper]{geometry}
\usepackage{graphicx}
\setkeys{Gin}{keepaspectratio}

\newcommand{\vstretch}[1]{\vspace*{\stretch{#1}}}
\usepackage{palatino}
\begin{document}
\resizebox{\textwidth}{\textheight}{l}%
\end{document}
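
What keeping the aspect ratio boils down to is a single scale factor: the smaller of the two ratios between the available space and the natural size of the box. The following Python lines are only an illustration of that arithmetic, not of how graphicx implements it; the function name `fit_scale` and the numbers are made up.

```python
def fit_scale(width, height, max_width, max_height):
    """Largest uniform scale factor at which the box still fits the page."""
    if width <= 0 or height <= 0:
        raise ValueError('box dimensions must be positive')
    return min(max_width / width, max_height / height)


if __name__ == '__main__':
    # A narrow "l", 2 units wide and 10 units tall, on a 140 x 100 text area.
    factor = fit_scale(2, 10, 140, 100)
    print(factor, 2 * factor, 10 * factor)
```

Scaling to the width alone would use the factor 70 here and make the letter 700 units tall, which is exactly the overflow seen in the first attempt.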

One last thing is multiline and centered output. The awesome people over at TeX Stack Exchange have a solution:

\documentclass[landscape]{minimal}
\usepackage[showframe,a6paper]{geometry}
\usepackage{varwidth}
\usepackage{graphicx}
\setkeys{Gin}{keepaspectratio}

\newcommand{\vstretch}[1]{\vspace*{\stretch{#1}}}
\usepackage{palatino}
\begin{document}
\topskip0pt
% This seems to fully work
\vstretch{1}
\centering\noindent\resizebox*\textwidth\textheight{\begin{varwidth}{\textwidth}%
\centering%
foooooooooooooooo

\centering
bar%
\end{varwidth}}

\vstretch{1}

\pagebreak
% Trying the other method with the table
\vstretch{1}
\centering\noindent\resizebox*\textwidth\textheight{\begin{varwidth}{\textwidth}%
\begin{tabular}{@{}c@{}}
foooooooooooooooo\\

bar
\end{tabular}%
\end{varwidth}}
\vstretch{1}

\end{document}

And the rendered result:

## Converting Mailman archives (mboxes) to maildir

I wanted to search discussions on mailing lists and view conversations. I didn’t want to use some web interface because that wouldn’t allow me to search quickly and offline. So making my mail client aware of these emails seemed to be the way to go. Fortunately, the GNOME mailing lists are archived as mbox files, so you can download the entire traffic in a standardised format.

But how to properly get this into your email clients then? I think Thunderbird can import mbox natively. But I wanted to access it from other clients, too, so I needed to make my server aware of these emails. Of course, I configured my mailserver to use maildir, so some conversion was needed.

I will present my experiences dealing with this problem. If you want to do similar things, or even only want to import the mbox directly, this post might be for you.

### The archives

First, we need to get all the archives. As I had to deal with a couple of mailing lists and more than a couple of months, I couldn’t be arsed to click every single mbox file manually.

The following script scrapes the mailman page. It makes use of the interesting Splinter library, basically a wrapper around selenium and other browsers for Python.

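#!/usr/bin/env python

import getpass
from subprocess import list2cmdline
import sys

import splinter


def login(b, user):
    # Assumption: this part of the listing got lost. The field names are
    # the ones Mailman's private archive login form uses.
    b.fill('username', user)
    b.fill('password', getpass.getpass())
    b.find_by_name('submit').click()


def main(url, user):
    b = splinter.Browser()

    try:
        #url = 'https://mail.gnome.org/mailman/private/board-list/'
        b.visit(url)
        login(b, user)

        # Assumption: the lines collecting the archive links and the
        # session cookie got lost as well; matching on '.txt' is a guess.
        links = [a['href'] for a in b.find_by_tag('a')
                 if a['href'] and '.txt' in a['href']]
        cookies = '; '.join('{0}={1}'.format(name, value)
                            for name, value in b.cookies.all().items())
        b.quit()

        for link in links:
            cmd = ['wget', '--header=Cookie: ' + cookies, link]
            print(list2cmdline(cmd))
            # pipe that to "parallel -j 8"

    except:
        b.quit()
        raise


if __name__ == '__main__':
    site = sys.argv[1]
    user = sys.argv[2]

    if site.startswith('http'):
        url = site
    else:
        url = 'https://mail.gnome.org/mailman/private/{0}'.format(site)

    main(url, user)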



I use splinter because neither handling cookies nor parsing the web page is fun. So I just use whatever is most convenient for me; I wanted to get things done, after all. The script will print a line for each link it found, nicely prefixed with wget and its necessary arguments for the authorization cookie. You can pipe that to sh, but if you want to download many months, you want to do it in parallel. And fortunately, there is an app for that: parallel!
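
If parallel is not at hand, the same fan-out can be approximated in Python. The following is a minimal sketch and not what I used: the name `run_parallel` is made up, and it assumes that each input line is one complete command, such as the wget lines the script prints.

```python
import shlex
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parallel(command_lines, jobs=8):
    """Run each command line, at most `jobs` at a time.

    Blank lines are skipped. The exit codes are returned in the same
    order as the commands that were run.
    """
    commands = [shlex.split(line) for line in command_lines if line.strip()]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(subprocess.call, commands))


if __name__ == '__main__':
    # Read the generated commands from stdin, like "parallel -j 8" would.
    codes = run_parallel(sys.stdin, jobs=8)
    sys.exit(max(codes) if codes else 0)
```

Threads are enough here because each worker only waits for its wget process to finish.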

### Conversion to maildir

After having received the mboxes, it turned out to be a good idea to convert to maildir nonetheless, if only to extract the properly formatted mails and remove duplicates.

I came across mb2md-3.20.pl from 2004 quite soon, but it is broken. It cannot parse my mboxes properly. It creates broken mails with headers lingering around, as it seems to be unable to detect the beginning of a new mail reliably. It took me a good while to find the problem though. So again, be advised, do not use mb2md 3.20.

As I use mutt myself, I found this blog article promising. It uses mutt to create an mbox out of a maildir. I wanted it the other way round, so after some trial and error, I figured that the following would do what I wanted:

mutt -f mymbox -e 'set mbox_type=maildir; set confirmcreate=no; set delete=no; push "T.*;s/tmp/mymuttmaildir"'


where “mymbox” is your source file and “/tmp/mymuttmaildir” the target directory.

This is a bit lame, right? We want to have parameters, because we want to do some batch processing on many archive mboxes.

The problem is, though, that the parameters are very deep inside the quotes. So just doing something like

mutt -f $source -e 'set mbox_type=maildir; set confirmcreate=no; set delete=no; push "T.*;s$target"'


wouldn’t work, because the $target would be taken literally due to the single quotes. And I couldn’t find a way to make it work, so I decided to make it work with the language that I like the most: Python. So an hour or so later I came up with the following which works (kinda):

import os
import subprocess

source = os.environ['source']
destination = os.environ['destination']
conf = 'set mbox_type=maildir; set confirmcreate=no; set delete=no; push "T.*;s{0}"'.format(destination)
cmd = ['mutt', '-f', source, '-e', conf]
subprocess.call(cmd)

But well, I shouldn’t become productive just yet by doing real work. Mutt apparently expects a terminal. It would just prompt me with “No recipients were specified.”. So alright, this unfortunately wasn’t what I wanted. If you don’t need batch processing though, you might very well go with mutt doing your mbox to maildir conversion (or vice versa).

Damnit, another two hours or more wasted on that. I was at the point of just doing the conversion myself. Shouldn’t be too hard after all, right? While researching I found that Python’s stdlib has some email related functions *yay*. Some dude on the web wrote something close to what I needed.
I beefed it up a very little bit and landed with the following:

#!/usr/bin/env python
# http://www.hackvalue.nl/en/article/109/migrating%20from%20mbox%20to%20maildir
# Python 2 only: mailbox.UnixMailbox and email.Errors do not exist in Python 3.

import datetime
import email
import email.Errors
import mailbox
import os
import sys
import time


def msgfactory(fp):
    try:
        return email.message_from_file(fp)
    except email.Errors.MessageParseError:
        # Don't return None since that will
        # stop the mailbox iterator
        return ''

dirname = sys.argv[1]
inbox = sys.argv[2]
fp = open(inbox, 'rb')
mbox = mailbox.UnixMailbox(fp, msgfactory)

try:
    storedir = os.mkdir(dirname, 0750)
    os.mkdir(dirname + "/new", 0750)
    os.mkdir(dirname + "/cur", 0750)
except:
    pass

count = 0
for mail in mbox:
    count+=1
    #hammertime = time.time() # mail.get('Date', time.time())
    # Only the first six fields (year to second) of parsedate() are usable
    hammertime = datetime.datetime(*email.utils.parsedate(mail.get('Date',''))[:6]).strftime('%s')
    hostname = 'mb2mdpy'
    filename = dirname + "/cur/%s%d.%s:2,S" % (hammertime, count, hostname)
    mail_file = open(filename, 'w+')
    mail_file.write(mail.as_string())
    mail_file.close()

print "Processed {0} mails".format(count)

And it seemed to work well! It recovered many more emails than the Perl script (hehe) but the generated maildir wouldn’t work with my IMAP server. I was confused. The mutt maildirs worked like a charm and I couldn’t see any difference to mine. I scped the files into my .maildir/ on my server, which takes quite a while because scp isn’t all too quick when it comes to many small files. Anyway, it wouldn’t work, for some reason which is way beyond me.

Eventually I straced the IMAP server and figured that it was desperately looking for a tmp/ folder. Funnily enough, it didn’t need that for other maildirs to work. Anyway, lesson learnt: if your dovecot doesn’t play well with your maildir and you have no clue how to make it log more verbosely, check whether you need a tmp/ folder.

But I didn’t know that, so I investigated a bit more and I found another Perl script which converted the emails fine, too.
For some reason it put my mails in “new/” and not in “cur/”, which the other tools did so far. Also, it would leave the messages as unread which I don’t like. Fortunately, one (more or less) only needs to rename the files in a maildir to end in S for “seen”. While this sounds like a simple

for f in maildir/cur/*; do mv "${f}" "${f}:2,S"; done

it’s not so easy anymore when you have to move the directory as well. But that’s easily worked around by shuffling the directories around. Another, more annoying problem with that is “Argument list too long” when you are dealing with a lot of files. So a solution must involve “find” and might look something like this:

find ${CUR} -type f -print0 | xargs -i -0 mv '{}' '{}':2,S
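
All three maildir details from above (the tmp/ folder, delivery into cur/ and the S flag) can also be left to the mailbox module of Python's standard library. The following is a minimal sketch, assuming Python 3; it is not one of the scripts used above, and the function name `mbox_to_maildir` is made up.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy all messages of an mbox into a maildir, marked as seen.

    Returns the number of messages copied.
    """
    source = mailbox.mbox(mbox_path, create=False)
    # create=True makes the directory including tmp/, new/ and cur/
    target = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in source:
            converted = mailbox.MaildirMessage(message)
            converted.set_subdir('cur')  # file it under cur/, not new/
            converted.add_flag('S')      # file name ends in ":2,S" (seen)
            target.add(converted)
            count += 1
    finally:
        target.close()
        source.close()
    return count


if __name__ == '__main__':
    import sys
    print('Processed {0} mails'.format(mbox_to_maildir(sys.argv[1], sys.argv[2])))
```

This does not deduplicate anything, so the problem described in the next section remains.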

### Duplicates

There was, however, a very annoying issue left: duplicates. I haven’t investigated where the duplicates came from, but it didn’t matter to me as I didn’t want duplicates even if the downloaded mbox archive contained them. And in my case, I’m quite confident that the mboxes are messed up. So I wanted to get rid of duplicates anyway and decided to use a hash function on the file content to determine whether two files are the same or not. I used sha1sum like this:

$ find maildir/.board-list/ -type f -print0 | xargs -0 sha1sum | head
c6967e7572319f3d37fb035d5a4a16d56f680c59  maildir/.board-list/cur/1342797208.000031.mbox:2,
2ea005ec0e7676093e2f488c9f8e5388582ee7fb  maildir/.board-list/cur/1342797281.000242.mbox:2,
a4dc289a8e3ebdc6717d8b1aeb88959cb2959ece  maildir/.board-list/cur/1342797215.000265.mbox:2,
39bf0ebd3fd8f5658af2857f3c11b727e54e790a  maildir/.board-list/cur/1342797210.000296.mbox:2,
eea1965032cf95e47eba37561f66de97b9f99592  maildir/.board-list/cur/1342797281.000114.mbox:2,

and if there were two files with the same hash, I would delete one of them. Probably like so:

#!/usr/bin/env python
import os
import sys

hashes = []
for line in sys.stdin.readlines():
    hash, fname = line.split()
    if hash in hashes:
        os.unlink(fname)
    else:
        hashes.append(hash)

But it turns out that the following snippet works, too:

find /tmp/maildir/ -type f -print0 | xargs -0 sha1sum | sort | uniq -d -w 40 | awk '{print $2}' | xargs rm


So it’ll check the files for the same contents via a sha1sum. In order to make uniq detect equal lines, we need to give it sorted input. Hence the sort. We cannot, however, check the whole lines for equality as the filename will show up in the line and it will of course be different. So we only compare the first 40 characters of each line, which is the length of the hex representation of the hash. If we found such a duplicate hash, we cut off the hash, take the filename, which is the remainder of the line, and delete the file. Note that uniq -d prints only one line per group of equal hashes, so a mail that exists three times needs a second run.
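
The same idea without the pipeline, and without the need for a second run, might look like the following in Python. Again only a sketch: the name `remove_duplicates` is made up, and it keeps whichever copy it happens to see first.

```python
import hashlib
import os


def remove_duplicates(root):
    """Delete every file below `root` whose content was already seen.

    Returns the list of paths that were removed.
    """
    seen = set()
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as handle:
                digest = hashlib.sha1(handle.read()).hexdigest()
            if digest in seen:
                os.unlink(path)
                removed.append(path)
            else:
                seen.add(digest)
    return removed


if __name__ == '__main__':
    import sys
    for path in remove_duplicates(sys.argv[1]):
        print('removed', path)
```

Using a set instead of a list keeps the lookup cheap even with tens of thousands of mails.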

Phew. What a trip so far. Let’s put it all together:

### The final thing


LIST=board-list

DESTBASE=/tmp/perfectmdir

LISTBASE=${DESTBASE}/.${LIST}

CUR=${LISTBASE}/cur
NEW=${LISTBASE}/new
TMP=${LISTBASE}/tmp

mkdir -p ${CUR}
mkdir -p ${NEW}
mkdir -p ${TMP}

for f in /tmp/${LIST}/*; do /tmp/perfect_maildir.pl ${LISTBASE} < ${f} ; done

mv ${CUR} ${CUR}.tmp
mv ${NEW} ${CUR}
mv ${CUR}.tmp ${NEW}

find ${CUR} -type f -print0 | xargs -i -0 mv '{}' '{}':2,S
find ${CUR} -type f -print0 | xargs -0 sha1sum | sort | uniq -d -w 40 | awk '{print $2}' | xargs rm


And that’s handling email in 2012…