Archive for the ‘hacking’ Category

WideOpenId – woid.cryptobitch.de

Friday, December 5th, 2014

Uh, I meant to blog about this a while ago, but somehow, it got lost… Anyway, inspired by http://openid.aliz.es and intrigued by OpenID, I set out to find an implementation that requires an acceptable level of effort to set up and run.

While the idea of federated authentication sounds nice, the concept gets a bit flawed if everybody uses Google or Stackexchange as their identity provider. Also, you might have good reasons not to present your very own OpenID everywhere. Pretty much as with email, which is why you could make use of mailinator, yopmail, or others.

There is a list of server software on the OpenID page, but none of them really looked like low effort. I wouldn’t want to install Django or any other web framework. But I’d go with a bad Python solution before even looking at PHP.

There is an “official” OpenID example server, but it is not WSGI aware and thus requires more effort than I am willing to invest. Anyway, I took an existing OpenID server and adapted it such that anyone can log in. Always. While developing and deploying, I noticed that mod_wsgi‘s support for virtualenv is really bad. For example, the PYTHONPATH cannot be set inside Apache’s VirtualHost declaration, so you need a custom WSGI file which hard-codes the Python version. There also appears to be no helper on the Python level to “load” a virtualenv. Weird.
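For reference, my WSGI file ended up looking roughly like the following sketch. The paths, the Python version, and the module name are placeholders for illustration, not the actual woid deployment:

#!/usr/bin/env python
# sketch of a mod_wsgi entry point; all paths below are assumptions
import site
import sys

# mod_wsgi won't pick up a PYTHONPATH from the VirtualHost, so the
# virtualenv's site-packages directory (Python version included) is hard-coded:
site.addsitedir('/srv/woid/env/lib/python2.7/site-packages')
sys.path.insert(0, '/srv/woid')

# the WSGI callable that Apache/mod_wsgi will serve (the module name is made up here)
from openidserver import application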

woid server in action

Anyway, you can now enjoy OpenID by providing http://woid.cryptobitch.de/your-id-here as your identity provider. The service will happily tell anyone that any ID is valid. So you can log in under any name you want. A bit like mailinator for OpenID.

To test whether the OpenID provider actually works, you can download the example consumer and start it.

Bahn Bonus Points Saemmeln

Monday, November 17th, 2014


The Bahn currently has a Web-based game for you to win some of their loyalty points. It’s not a very exciting game, but you get up to 500 points which is half a free ride across the country. (You get the other half when signing up for their program.)

In order to get these 500 points you need to play for an hour or so. Or you observe the Web traffic your browser generates and look closely. You’ll see that the Flash applet fetches a token from the server and sends your result, along with the token and some hash, to the server. How to get the correct hash, you ask? Worry not: if you don’t send the correct one, the server will send it to you. You can then resend your request with the hash the server gave you, and your POST will be accepted. Neat.

I don’t know why they send the “correct_hash”, but it’s obviously a bad idea.
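For the curious, the whole replay trick boils down to something like the following sketch. The endpoint, the field names, and the exact flow are assumptions for illustration, not the real API:

# sketch only; URL and parameter names are made up
import requests

BASE = 'https://example.bahn.de/game'          # hypothetical endpoint

token = requests.get(BASE + '/token').text     # the Flash applet fetches a token
payload = {'token': token, 'points': 500, 'hash': 'bogus'}

# first POST with a wrong hash: the server replies with the correct one
reply = requests.post(BASE + '/result', data=payload).json()
payload['hash'] = reply['correct_hash']

# resend with the hash the server just told us; this POST gets accepted
print(requests.post(BASE + '/result', data=payload).status_code)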

PS: It seems that Kazam has troubles recording my mouse pointer position correctly.

Reverse sshuttle tunnel to connect to separate networks

Tuesday, September 2nd, 2014

I had to solve the split-horizon DNS problem in order to find my way out to the Internet. The complementary problem is how to access the internal network from the Internet. The scenario is, for example, your home network being protected by a very angry firewall that you don’t necessarily control. However, it’d be quite handy to be able to SSH into your machines at home, use the printer, or connect to the internal messaging system.

However, everything is pretty much firewalled such that no incoming connections are possible. Fortunately, outgoing connections to an SSH server are possible. With the RemoteForward option of OpenSSH we can create a reverse tunnel to connect to the separate network. All it requires is an SSH server that you can connect to from both sides, i.e. the Internet and the separate network, and some configuration, maybe like this on the machine within the network: ssh -o 'RemoteForward=localhost:23 localhost:22' root@remotehost and this for the Internet machine:

Host dialin
    User toor
    HostName my.server
    Port 23
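
With that stanza in place, a plain ssh dialin from the Internet machine drops you onto the machine inside the network, because port 23 on my.server is the entry point of the reverse tunnel:

ssh dialin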

It then looks almost like this:

      
+---------------------------------------+                       
|Internet                               |                       
+---------------------------------------+
|  +-----------+                        |                       
|  |My machine | +------------+         |                       
|  +-----------+              |         |                       
|                             |         |                       
|                  +----------v--+      |                       
|                  |             |      |                       
|                  | SSH Server  |      |                       
|                  |             |      |                       
|                  +----------+--+      |                       
|                         ^   |         |                       
+------------------------ |   | --------+                       
                          |   |                                 
+------------------------ |   | --------+                       
|XXXXXXXX   Firewall  XX  |   | XXXXXXXX|                       
+------------------------ |   | --------+                       
                          |   |                                 
+------------------------ |   | --------+
| ACME.corp  10/8         |   |         |                       
+------------------------ |   | --------+
|                         |   |         |                       
|               +---------+---|------+  |                       
|   XMPP  <-+   |             |      |  |                       
|           |   |             |      |  |                       
|           |   |             v      |  |                       
|   Print <----------+ ssh -R        |  |                       
|           |   |      via corkscrew |  |                       
|           |   |                    |  |                       
|   VCS   <-+   +--------------------+  |                       
|               |  My machine        |  |                       
|               +--------------------+  |                       
|                                       |                       
+---------------------------------------+                       

“But…” I hear you say. What about the firewall? How would we connect in the first place? Sure, we can use corkscrew, as we’ve learned. That will then look a bit more convoluted, maybe like this:


ssh -o ProxyCommand="corkscrew proxy.acme.corp 80 ssh.my.server 443" -o 'RemoteForward=localhost:23 localhost:22' root@lolcathost

What? You don’t have corkscrew installed? Gnah, it’s dangerous to go alone, take this:

cd
wget --continue http://www.agroman.net/corkscrew/corkscrew-2.0.tar.gz
tar xvf corkscrew*.tar*
cd corkscrew*
./configure --prefix=~/corkscrew; make; make install

echo -e  'y\n'|ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa

(echo -n 'command="read",no-X11-forwarding,no-agent-forwarding '; cat ~/.ssh/id_rsa.pub ;echo;echo EOF)

As a bonus, you get an SSH public key which you can add on the server side, i.e. cat >> ~root/.ssh/authorized_keys <<EOF. Have you noticed? When logging in with that key, only the read command will be executed.
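On the server side that means running the cat command and pasting the output of the snippet above (it conveniently already ends in EOF), roughly like this:

# paste the generated line between the heredoc markers; key material elided here
cat >> ~root/.ssh/authorized_keys <<EOF
command="read",no-X11-forwarding,no-agent-forwarding ssh-rsa AAAA...your-key... you@yourhost
EOF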

That’s already quite helpful. But how do you then connect? Via the SSH server, of course. But it’s a bit of a hassle to first connect there and then somehow port forward via SSH and all. Also, in order to resolve internal names, you’d have to first SSH into the separate machine to issue DNS queries. That’s all painful and not fun. How about an automatic pseudo VPN that allows you to use the internal nameserver and transparently connects you to your internal network?

Again, sshuttle to the rescue. With the same kind of addition to /etc/NetworkManager/dnsmasq.d/corp-tld as before, namely

# resolve both acme.corp and corp.acme via the internal nameservers
server=/acme.corp/10.2.3.4
server=/corp.acme/10.3.4.5

you can make use of that lovely patch for dns hosts. In the following example, we have a few nameservers defined, just in case: 10.2.3.4, 10.3.4.5, 10.4.5.6, and 10.5.6.7. It also excludes some networks that you may not want to have transparently routed. A few of them are actually standard local networks and should probably never be routed. Finally, the internal network is defined. In the example, the networks are 10.1.2.3/8, 123.1.2.3/8, and 321.456.0.0/16.


sshuttle --dns-hosts 10.2.3.4,10.3.4.5,10.4.5.6,10.5.6.7 -vvr dialin 10.1.2.3/8 123.1.2.3/8 321.456.0.0/16 \
--exclude 10.0.2.1/24 \
--exclude 10.183.252.224/24 \
--exclude 127.0.1.1/8 \
--exclude 224.0.0.1/8 \
--exclude 232.0.0.1/8 \
--exclude 233.252.0.0/14 \
--exclude 234.0.0.0/8

This setup allows you to simply execute that command and enjoy all of your networks. Including name resolution.

Installing OpenSuSE 13.1 on a Lenovo Ideapad S10-3t

Monday, June 9th, 2014

I tried to install the most recent OpenSuSE image I received when I attended the OpenSuSE Conference. We were given pendrives with a live image, so I was interested in how smooth the OpenSuSE installation is, compared to installing Fedora. The test machine is a three to four year old Lenovo Ideapad S10-3t, which I received from Intel a while ago. It’s certainly not the most powerful machine, but it’s got some dual-core CPU, a gigabyte of RAM, and a widescreen touch display.

The initial boot took a while. Apparently it changed something on the pendrive itself to expand to its full size, or so. The installation was a bit painful and, at the end of the day, not successful. The first error I received was about my username being wrong. It told me that it must only contain letters, digits, and other things. It did not tell me what was actually wrong; and I doubt it could, because my username was very legit. I clicked away the dialogue and tried again. Then it worked…

When I was asked about my partitioning scheme I was moderately confused. The window didn’t present any “next” button. I clicked the only three available buttons to no avail until it occurred to me that the machine has a wide screen, so the vertical space was not sufficient to display everything. And yeah, after moving the window up, I could proceed.

While I was positively surprised to see that it offered full disk encryption, I wasn’t too impressed with the buttons. They were very tiny on the bottom of the screen, barely clickable.

Anyway, I found my way to proceed, but when attempting to install, YaST received “system error code -1014” and failed to partition the disk. The disk could be at fault, but I have reasons to believe it was not the disk’s fault:

Apparently something ate all the memory so that I couldn’t even start a terminal. I guess GNOME’s system requirements are higher than I expected.

Split DNS Resolution

Thursday, April 17th, 2014

For the beginning of the year, I couldn’t make resolutions. The DNS server that the DHCP server gave me only resolves names from the local domain, i.e. acme.corp. Every connection to the outside world needs to go through a corporate HTTP proxy which then does the name resolution itself.

But that only works as long as the HTTP proxy is happy, i.e. with the destination port. It wouldn’t allow me to CONNECT to any other port than 80 (HTTP) or 443 (HTTPS). The proxy is thus almost useless for me. No IRC, no XMPP, no IMAP(s), no SSH, etc.

Fortunately, I have an SSH server running on port 443 and using the HTTP proxy to CONNECT to that machine works easily, i.e. using corkscrew with the following in ~/.ssh/config:

Host myserver443
  User remote-user-name
  HostName ssh443.example.com
  ProxyCommand corkscrew proxy.acme.corp 8080 %h %p
  Port 443

And with that SSH connection, I could easily tunnel TCP packets using the DynamicForward switch. That would give me a SOCKS proxy, and I would only need to configure my programs to use it, or use tsocks. But as I need a destination IP address in order to assemble TCP packets, I need to have DNS working first. While a SOCKS proxy could do the name resolution, the one provided by OpenSSH cannot (correct me if I am wrong). Obviously, I need to somehow get onto the Internet in order to resolve names, as I don’t have any local nameserver that would do that for me. So I need to tunnel. Somehow.
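Just for reference, the SOCKS proxy bit would be a single additional line in the ~/.ssh/config stanza from above; the port number is an arbitrary choice:

Host myserver443
  User remote-user-name
  HostName ssh443.example.com
  ProxyCommand corkscrew proxy.acme.corp 8080 %h %p
  Port 443
  DynamicForward 1080

Programs could then be pointed at the SOCKS proxy on localhost:1080, e.g. via tsocks. But that still leaves the DNS problem.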

Most of the problem is solved by using sshuttle, which is half a VPN, half a tunnelling solution. It recognises your local machine sending packets (using iptables), does its magic to transport these to a remote host under your control (using a small python program to get the packets from iptables), and sends the packets from that remote host (using a small daemon on the server side). It also collects and forwards the answers. Your local machine doesn’t really realise that it is not really connecting itself.

As the name implies it uses SSH as a transport for the packets and it works very well, not only for TCP, but also for UDP packets you send to the nameserver of your choice. So external name resolution is done, as well as sending TCP packets to any host. You may now think that the quest is solved. But as sshuttle intercepts *all* queries to the (local) nameserver, you don’t use that (local nameserver) anymore and internal name resolution thus breaks (because the external nameserver cannot resolve printing.acme.corp). That’s almost what I wanted. Except that I also want to resolve the local domain names…

To clarify my setup, marvel at this awesome diagram of the scenario. You can see my machine being inside the corporate network with the proxy being the only way out. sshuttle intercepts every packet sent to the outside world, including DNS traffic. The local nameserver is not used as it cannot resolve external names. Local names, such as printing.acme.corp, can thus not be resolved.



  +-----------------------------------------+
  | ACME.corp                               |
  |-----------------------------------------|
  |                                         |
  |                                         |
  | +----------------+        +-----------+ |
  | |My machine      |        | DNS Server| |
  | |----------------|        +-----------+ |
  | |                |                      |
  | |sshuttle        |        +-----------+ |
  | |       corkscrew+------->| HTTP Proxy| |
  | +----------------+        +-----+-----+ |
  |                                 |       |
  +---------------------------------|-------+
                                    |
  +-----------------------------------------+
  | Internet                        |       |
  |-----------------------------------------|
  |                                 v       |
  |       +----------+        +----------+  |
  |       |DNS Server|<-------+SSH Server|  |
  |       +----------+        +----------+  |
  |                            +  +  +  +   |
  |                            |  |  |  |   |
  |                            v  v  v  v   |
  +-----------------------------------------+

To solve that problem I need to selectively ask either the internal or the external nameserver and force sshuttle to not intercept traffic to the internal one. Fortunately, there is a patch for sshuttle to specify the IP address of the (external) nameserver. It lets traffic destined for your local nameserver pass and only intercepts packets for your external nameserver. Awesome.

But how do you make the system select the nameserver to be used? Just entering two nameservers in /etc/resolv.conf doesn’t work, of course. One solution to that problem is dnsmasq, which, fortunately, NetworkManager runs anyway. A single line added to the configuration in /etc/NetworkManager/dnsmasq.d/corp-tld makes it aware of a nameserver dedicated to a domain:

server=/acme.corp/10.1.1.2

With that setup, the public DNS server is used as the main nameserver, dnsmasq resolves the local domain names, and sshuttle intercepts only the requests to the public nameserver. That solves my problem and enables me to work again.

~/sshuttle/sshuttle --dns-hosts 8.8.8.8 -vvr myserver443 0/0 \
	--exclude 10.0.2.15/8 \
	--exclude 127.0.1.1/8 \
	--exclude 224.0.0.1/8 \
	--exclude 232.0.0.1/8 \
	--exclude 233.252.0.0/14 \
	--exclude 234.0.0.0/8

Applying international Bahn travel tricks to save money for tickets

Thursday, November 21st, 2013

Suppose you are sick of Tanzverbot and you want to go from Karlsruhe to Hamburg. As a proper German you’d think of the Bahn first, although Germany started to allow long distance travel by bus, which is cheap and surprisingly comfortable. My favourite bus search engine is busliniensuche.de.

Anyway, you opted for the Bahn and searched for a connection; the result is a one-way trip for 40 Euro. Not too bad:
Karlsruhe to Hamburg: 40 Euro

But maybe we can do better. If we travel from Switzerland, we can save a whopping 0.05 Euro!
Basel SBB to Hamburg: 39.95 Euro
Amazing, right? Basel SBB is the first station after the German border and it allows for international fares to be applied. Interestingly, special offers exist which apparently make the same trip, plus a considerable chunk on top, cheaper.

But we can do better. Instead of travelling from Switzerland to Germany, we can travel from Germany to Denmark. To determine the first station after the German border, use the Netzplan for the IC routes and then check the local map, i.e. Schleswig-Holstein. You will find Padborg as the first non-German station. If you travel from Karlsruhe to Padborg, you save 17.5%:
Karlsruhe to Padborg: 33 Euro

Sometimes you can save by taking a Global ticket, crossing two borders. This is, however, not the case for us:
Basel SBB to Padborg: 49 Euro

In case you were wondering whether it’s the very same train and route all the time: Yes it is. Feel free to look up the CNL 472.
The CNL 472

I hope you can use these tips to book a cheaper trip.
Do you know any ways to “optimise” your Bahn ticket?

Finding Maloney

Wednesday, July 3rd, 2013

Every so often I feel the need to replace the music coming out of my speakers with an audio drama. I used to listen to Maloney, which is a detective story with, well, weird plots. The station used to provide MP3 files for download, but since they revamped their website, that is gone: the new site only provides Flash streaming.

As far as I know, there is no proper library to access media via Adobe HDS; there are two attempts and a PHP script.

There is, however, a little trick making things easier. The website exposes an HTML5 player if it thinks you’re a moron. Fortunately, it’s easy to make other people think that. The easiest thing to do is to send an iPad User-Agent header. The website will then play the media not via Adobe HDS (and Flash) but rather via a similar method, probably Apple HTTP Live Streaming. And that uses a regular m3u playlist with loads of tiny AAC fragments :-)
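If you want to poke at it yourself, faking the User-Agent is a one-liner; the URL below is only a stand-in for the actual episode page:

curl -A 'Mozilla/5.0 (iPad; CPU OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53' 'http://www.example.com/maloney-episode-page'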

The address of that playlist is easily guessable and I coded up a small utility here. It will print the ways to play the latest Maloney episode. You can then choose to either use HDS or the probably more efficient AAC version.

$ python ~/vcs/findmaloney/maloney.py 
mplayer -playlist http://srfaodorigin-vh.akamaihd.net/i/world/maloney/04df3324-4096-4dd5-b7c3-6f9b904e3f91.,q10,q20,.mp4.csmil/master.m3u8

livestreamer "hds://http://srfaodorigin-vh.akamaihd.net/z/world/maloney/04df3324-4096-4dd5-b7c3-6f9b904e3f91.,q10,q20,.mp4.csmil/manifest.f4m" best

enjoy!

Scale Text to the maximum of a page with LaTeX

Friday, April 19th, 2013

Being confronted with having to produce a simple poster that holds just a few letters but prints them as big as possible, I found myself needing to scale text (or a letter) to fill a page.

At first, I found \scalebox, which unfortunately takes a scaling factor, and not two dimensions. Instead of trying to do math, I found \resizebox which does take dimensions (width and height).

You might think that simply scaling up to the \textwidth is enough, but it’s not, as you can see from the following “l”, which was typeset using this code:

\documentclass[
landscape,
a6paper,
]{scrartcl}
\usepackage[pdftex]{graphicx}
\usepackage{palatino}
\begin{document}
\resizebox{\textwidth}{!}{l}%
\end{document}

And here’s the result:

"l" doesn't scale on A6 landscape paper

So the character doesn’t scale well in the sense that if it is too narrow, it grows too tall. Unfortunately, \resizebox doesn’t automatically keep the aspect ratio, and it doesn’t take such an argument as \includegraphics does. Fortunately, you can still make it keep the aspect ratio by globally setting the appropriate flag! So the following will work as expected:

\documentclass[landscape]{minimal}
\usepackage[showframe,a4paper]{geometry}
\usepackage{graphicx}
\setkeys{Gin}{keepaspectratio}

\newcommand{\vstretch}[1]{\vspace*{\stretch{#1}}}
\usepackage{palatino}
\begin{document}
\resizebox{\textwidth}{\textheight}{l}%
\end{document}

One last thing is multiline and centered output. The awesome people over at the TeX Stack Exchange have a solution:

\documentclass[landscape]{minimal}
\usepackage[showframe,a6paper]{geometry}
\usepackage{varwidth}
\usepackage{graphicx}
\setkeys{Gin}{keepaspectratio}

\newcommand{\vstretch}[1]{\vspace*{\stretch{#1}}}
\usepackage{palatino}
\begin{document}
\topskip0pt
% This seems to fully work
\vstretch{1}
\centering\noindent\resizebox*\textwidth\textheight{\begin{varwidth}{\textwidth}%
\centering%
foooooooooooooooo

\centering
bar%
\end{varwidth}}

\vstretch{1}

\pagebreak
% Trying to other method with the table
\vstretch{1}
\centering\noindent\resizebox*\textwidth\textheight{\begin{varwidth}{\textwidth}%
\begin{tabular}{@{}c@{}}
foooooooooooooooo\\

bar
\end{tabular}%
\end{varwidth}}
\vstretch{1}

\end{document}

And the rendered result:

Converting Mailman archives (mboxes) to maildir

Tuesday, November 13th, 2012

I wanted to search discussions on mailing lists and view conversations. I didn’t want to use some web interface because that wouldn’t allow me to search quickly and offline. So making my mail client aware of these emails seemed to be the way to go. Fortunately, the GNOME mailing lists are archived as mbox, so you can download the entire traffic in a standardised format.

But how do you properly get this into your email client? I think Thunderbird can import mbox natively, but I wanted to access the mail from other clients, too, so I needed to make my server aware of these emails. Of course, I configured my mail server to use maildir, so some conversion was needed.

I will present my experiences dealing with this problem. If you want to do similar things, or even only want to import the mbox directly, this post might be for you.

The archives

First, we need to get all the archives. As I had to deal with a couple of mailing lists and more than a couple of months, I couldn’t be arsed to click every single mbox file manually.

The following script scrapes the mailman page. It makes use of the interesting Splinter library, basically a wrapper around selenium and other browsers for Python.

#!/usr/bin/env python

import getpass
from subprocess import Popen, list2cmdline
import sys

import splinter

def fill_password(b, username=None, password=None):
    if not username:
        username = getpass.getpass('username: ')
    if not password:
        password = getpass.getpass('password: ')
        
    b.fill('username', username)
    b.fill('password', password)
    b.find_by_name('submit').click()


def main(url, username=None):
    b = splinter.Browser()
    
    try:
        #url = 'https://mail.gnome.org/mailman/private/board-list/'
        b.visit(url)
        
        if 'Password' in b.html:
            fill_password(b, username=username)


        links = [l['href'] for l in b.find_link_by_partial_text('Text')]

        cookie = b.driver.get_cookies()[0]
        cookie_name = cookie['name']
        cookie_value = cookie['value']
        cookie_str = "Cookie: {name}={value}".format(name=cookie_name, value=cookie_value)
        wget_cookie_arg = '--header={0}'.format(cookie_str)
        #print  wget_cookie_arg
        
        b.quit()

        
        for link in links:
            #print link
            cmd = ['wget', wget_cookie_arg, link]
            print list2cmdline(cmd)
            # pipe that to "parallel -j 8"

    except:
        b.quit()
        raise  # re-raise so errors don't get swallowed silently


if __name__ == '__main__':
    site = sys.argv[1]
    user = sys.argv[2]
    
    if site.startswith('http'):
        url=site
    else:
        url = 'https://mail.gnome.org/mailman/private/{0}'.format(site)
    
    main(username=user, url=url)

        

You can download the thing, too.

I use splinter because neither handling cookies nor parsing the web page is fun. So I just use whatever is most convenient for me; I wanted to get things done, after all. The script will print a line for each link it found, nicely prefixed with wget and the arguments necessary for the authorisation cookie. You can pipe that to sh, but if you want to download many months’ worth of archives, you want to do it in parallel. And fortunately, there is an app for that!
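In other words, something along these lines; the script name is whatever you saved it as, and the list name and username are placeholders:

python mailman-scraper.py board-list your-gnome-username | parallel -j 8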

Conversion to maildir

After having received the mboxes, it turned out to be a good idea to convert them to maildir nonetheless; if only to extract properly formatted mails and to remove duplicates.

I came across mb2md-3.20.pl from 2004 quite soon, but it is broken. It cannot parse my mboxes properly. It will create broken mails with headers lingering around, as it seems to be unable to reliably detect the beginning of new mails. It took me a good while to find the problem, though. So again, be advised: do not use mb2md 3.20.

As I use mutt myself, I found this blog article promising. It uses mutt to create an mbox out of a maildir. I wanted it the other way round, so after a bit of trial and error, I figured that the following would do what I wanted:

mutt -f mymbox -e 'set mbox_type=maildir; set confirmcreate=no; set delete=no; push "T.*;s/tmp/mymuttmaildir"'

where “mymbox” is your source file and “/tmp/mymuttmaildir” the target directory.

This is a bit lame right? We want to have parameters, because we want to do some batch processing on many archive mboxes.

The problem is, though, that the parameters are very deep inside the quotes. So just doing something like

mutt -f $source -e 'set mbox_type=maildir; set confirmcreate=no; set delete=no; push "T.*;s$target"'

wouldn’t work, because the $target would be interpreted as a raw string due to the single quotes. And I couldn’t find a way to make it work, so I decided to do it with the language that I like the most: Python. An hour or so later I came up with the following, which works (kinda):

import os
import subprocess
source = os.environ['source']
destination = os.environ['destination']

conf = 'set mbox_type=maildir; set confirmcreate=no; set delete=no; push "T.*;s{0}"'.format(destination)

cmd = ['mutt', '-f', source, '-e', conf]
subprocess.call(cmd)

But well, I wasn’t supposed to get productive and do real work just yet. Mutt apparently expects a terminal. It would just prompt me with “No recipients were specified.”.

So alright, this unfortunately wasn’t what I wanted. If you don’t need batch processing, though, you might very well go with mutt to do your mbox to maildir conversion (or vice versa).

Damnit, another two hours or more wasted on that. I was at the point of just doing the conversion myself. Shouldn’t be too hard, after all, right? While researching, I found that Python’s stdlib has some email-related functions *yay*. Some dude on the web wrote something close to what I needed. I beefed it up a very little bit and ended up with the following:

#!/usr/bin/env python

# http://www.hackvalue.nl/en/article/109/migrating%20from%20mbox%20to%20maildir

import datetime
import email
import email.Errors
import email.utils
import mailbox
import os
import sys
import time


def msgfactory(fp):
    try:
        return email.message_from_file(fp)
    except email.Errors.MessageParseError:
        # Don't return None since that will
        # stop the mailbox iterator
        return ''
dirname = sys.argv[1]
inbox = sys.argv[2]
fp = open(inbox, 'rb')
mbox = mailbox.UnixMailbox(fp, msgfactory)


try:
        storedir = os.mkdir(dirname, 0750)
        os.mkdir(dirname + "/new", 0750)
        os.mkdir(dirname + "/cur", 0750)
except:
        pass

count = 0
for mail in mbox:
        count+=1
        #hammertime = time.time() # mail.get('Date', time.time())
        # build the maildir file name from the Date header, as a Unix timestamp
        hammertime = datetime.datetime(*email.utils.parsedate(mail.get('Date',''))[:6]).strftime('%s')
        hostname = 'mb2mdpy'
        filename = dirname + "/cur/%s%d.%s:2,S" % (hammertime, count, hostname)
        mail_file = open(filename, 'w+')
        mail_file.write(mail.as_string())


print "Processed {0} mails".format(count)

And it seemed to work well! It recovered many more emails than the Perl script (hehe), but the generated maildir wouldn’t work with my IMAP server. I was confused. The mutt maildirs worked like a charm and I couldn’t see any difference to mine.

I scp’d the files onto my .maildir/ on my server, which took quite a while because scp isn’t all too quick when it comes to many small files. Anyway, the maildir still wouldn’t work, for some reason that was way beyond me. Eventually I straced the IMAP server and figured that it was desperately looking for a tmp/ folder. Funnily enough, it didn’t need that for other maildirs to work. Anyway, lesson learnt: if your dovecot doesn’t play well with your maildir and you have no clue how to make it log more verbosely, check whether you need a tmp/ folder.
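The fix is then as boring as creating that folder, e.g. for the list at hand (the path obviously depends on your setup):

mkdir ~/.maildir/.board-list/tmp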

But I didn’t know that yet, so I investigated a bit more and found another Perl script which converted the emails fine, too. For some reason it put my mails in new/ and not in cur/, which the other tools did so far. Also, it would leave the messages marked as unread, which I don’t like.

Fortunately, one (more or less) only needs to rename the files in a maildir to end in S for “seen”. While this sounds like a simple

for f in maildir/cur/*; do mv ${f} ${f}:2,S; done

it’s not so easy anymore when you have to move the directory as well. But that’s easily worked around by shuffling the directories around.
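In other words, something like this, which is exactly what the full script at the end does:

mv maildir/cur maildir/cur.tmp
mv maildir/new maildir/cur
mv maildir/cur.tmp maildir/new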

Another, more annoying problem with that is “Argument list too long” when you are dealing with a lot of files. So a solution must involve “find” and might look something like this: find ${CUR} -type f -print0 | xargs -i -0 mv '{}' '{}':2,S

Duplicates

There was, however, a very annoying issue left: duplicates. I haven’t investigated where the duplicates came from, but it didn’t matter to me, as I didn’t want duplicates even if the downloaded mbox archive contained them. And in my case, I’m quite confident that the mboxes are messed up. So I wanted to get rid of duplicates anyway and decided to use a hash function on the file content to determine whether two files are the same or not. I used sha1sum like this:

$ find maildir/.board-list/ -type f -print0 | xargs -0 sha1sum   | head
c6967e7572319f3d37fb035d5a4a16d56f680c59  maildir/.board-list/cur/1342797208.000031.mbox:2,
2ea005ec0e7676093e2f488c9f8e5388582ee7fb  maildir/.board-list/cur/1342797281.000242.mbox:2,
a4dc289a8e3ebdc6717d8b1aeb88959cb2959ece  maildir/.board-list/cur/1342797215.000265.mbox:2,
39bf0ebd3fd8f5658af2857f3c11b727e54e790a  maildir/.board-list/cur/1342797210.000296.mbox:2,
eea1965032cf95e47eba37561f66de97b9f99592  maildir/.board-list/cur/1342797281.000114.mbox:2,

and if there were two files with the same hash, I would delete one of them. Probably like so:

#!/usr/bin/env python
import os
import sys

# read "hash  filename" lines from sha1sum on stdin;
# keep the first file for each hash and delete subsequent duplicates
seen = set()
for line in sys.stdin.readlines():
    hash, fname = line.split()
    if hash in seen:
        os.unlink(fname)
    else:
        seen.add(hash)

But it turns out that the following snippet works, too:

find /tmp/maildir/ -type f -print0 | xargs -0 sha1sum | sort | uniq -d -w 40 | awk '{print $2}' | xargs rm

So it’ll check the files for the same contents via a sha1sum. In order to make uniq detect equal lines, we need to give it sorted input; hence the sort. We cannot, however, check the whole lines for equality, as the filename shows up in the line and will of course differ. So we only compare the first 40 characters, which is exactly the hex representation of the hash. If we find such a duplicate hash, we take the filename, which is the remainder of the line, and delete the file.

Phew. What a trip so far. Let’s put it all together:

The final thing


LIST=board-list

umask 077

DESTBASE=/tmp/perfectmdir

LISTBASE=${DESTBASE}/.${LIST}

CUR=${LISTBASE}/cur
NEW=${LISTBASE}/new
TMP=${LISTBASE}/tmp

mkdir -p ${CUR}
mkdir -p ${NEW}
mkdir -p ${TMP}

for f in  /tmp/${LIST}/*; do /tmp/perfect_maildir.pl ${LISTBASE} < ${f} ; done
mv ${CUR} ${CUR}.tmp
mv ${NEW} ${CUR}
mv ${CUR}.tmp ${NEW}
find ${CUR} -type f -print0 | xargs -i -0 mv '{}'  '{}':2,S
find ${CUR} -type f -print0 | xargs -0 sha1sum | sort | uniq -d -w 40 | awk '{print $2}' | xargs rm

And that’s handling email in 2012…

Loopback mounting a huge gzipped file

Wednesday, October 10th, 2012

This is basically a note to myself for future reference which I hope is interesting to others.

I just had to loopback-mount a gzipped image file. I didn’t want to unpack the file, however, because I am very short on disk space right now. Also, I didn’t care too much about processing power. I searched quite a bit until I found “avfs”.

AVFS is a system, which enables all programs to look inside archived or compressed files, or access remote files without recompiling the programs or changing the kernel.

At the moment it supports floppies, tar and gzip files, zip, bzip2, ar and rar files, ftp sessions, http, webdav, rsh/rcp, ssh/scp

muelli@xbox:/tmp$ avfsd -o allow_root ~/.avfs
muelli@xbox:/tmp$ cd  ~/.avfs/home/muelli/qemu
muelli@xbox:~/.avfs/home/muelli/qemu$ sudo losetup /dev/loop1 XP-4G.ntfs.dd.gz#
muelli@xbox:~/.avfs/home/muelli/qemu$ sudo mount /dev/loop1 -oro,noatime /home/muelli/empty/

Note that the filename I’m accessing is suffixed with a hash sign (#).
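For completeness, tearing everything down again should be the usual routine: unmount the image, detach the loop device, and unmount the AVFS overlay.

muelli@xbox:~$ sudo umount /home/muelli/empty
muelli@xbox:~$ sudo losetup -d /dev/loop1
muelli@xbox:~$ fusermount -u ~/.avfs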