GNOME – Page 4 – muellis blog

Installing Fedora 20 on an Exo PC Slate

For our booth at FOSDEM we had some hardware to show off the latest and greatest GNOME. I brought the tablet I got from the Desktop Summit. In order to prepare it I installed Fedora 20 which comes with a nice and shiny installer. I found a few issues and glitches and will present then them in the course of this post.

It worked well enough, but it has a few glitches. One of them is that GNOME apparently does not detect that it is running on hardware which does not have a keyboard. So it was a bit to enter a password for a wifi when there is no (soft) keyboard…

Some incoherences exist. One of them is that it shows the “Next” button on the bottom right. Which is what I’d expect. But sometimes it also asks the user to press a button on the upper left. I didn’t even remotely expect having to press a button on the “back” side of the screen in order to continue installation.

It was very nice though that it seems to offer installation along an existing operating system *and* full disk encryption. The Ubuntu installer can install you a fully encrypted system nicely, but only if you install Ubuntu on the whole disk. The Fedora installer seems to manage that nicely.

As it seems to be normal nowadays, installation starts even though you haven’t provided all the necessary information yet. That is very convenient and have a much fast installation experience.

Another coherence issue is on the user dialogue. I can actually guess what the thinking was when designing these menus. You have this “overview” screen as seen above and then you “dive” into the sub menus. I expected a more linearly following set of menus. Why would I need return to the overview menu at all? I claim that it is much easier to just continue, not to go forth and back… Anyway, a real bug is visible: Mnemonics are not formatted properly.

The user dialogue, while being at it, seemed to have forced me to enter a strong password. I just wanted to install a system for a demo machine. Probably not the usual usecase, but annoying enough if it doesn’t work smoothly. I think I found out later that I needed to double press the “Next” button (labelled “done” and being placed in the area of the screen where I’d expect “back” buttons to be).

Turns out, that the same thing happened with the root password, which really annoyed me. Especially as the soft keyboard doesn’t really allow for convenient input of complicated characters.

But then I discovered something. On the very bottom there something weirdly coloured. It was a notification for the current menu. Why on earth complain about the password I’ve entered on the very bottom when the widget is on the very top? That was surprising and confusing. Plus, the warning itself was not very visible due to the onscreen keyboard obstructing the view. So I guess it’d be smarter to not have the warnings there.

Anyway, I was pleasantly surprised how smooth the installation experience was. It could, of course, have been better, but all in all it was quite good. I finished in less than half and hour. Too bad that I didn’t know that neither Eye of GNOME nor Epiphany were installed by default.

Converting Mailman archives (mboxes) to maildir

I wanted to search discussions on mailing lists and view conversations. I didn’t want to use some webinterface because that wouldn’t allow me to search quickly and offline. So making my mail client aware of these emails seemed to be the way to go. Fortunately, the GNOME mailinglists are mbox archived. So you download the entire traffic in a standardised mbox.

But how to properly get this into your email clients then? I think Thunderbird can import mbox natively. But I wanted to access it from other clients, too, so I needed to make my server aware of these emails. Of course, I configured my mailserver to use maildir, so some conversion was needed.

I will present my experiences dealing with this problem. If you want to do similar things, or even only want to import the mbox directly, this post might be for you.

The archives

First, we need to get all the archives. As I had to deal with a couple of mailinglists and more than a couple of month, I couldn’t be arsed to click every single mbox file manually.

The following script scrapes the mailman page. It makes use of the interesting Splinter library, basically a wrapper around selenium and other browsers for Python.

#!/usr/bin/env python

import getpass
from subprocess import Popen, list2cmdline
import sys

import splinter

def fill_password(b, username=None, password=None):
    if not username:
        username = getpass.getpass('username: ')
    if not password:
        password = getpass.getpass('password: ')
        
    b.fill('username', username)
    b.fill('password', password)
    b.find_by_name('submit').click()


def main(url, username=None):
    b = splinter.Browser()
    
    try:
        #url = 'https://mail.gnome.org/mailman/private/board-list/'
        b.visit(url)
        
        if 'Password' in b.html:
            fill_password(b, username=username)


        links = [l['href'] for l in b.find_link_by_partial_text('Text')]

        cookie = b.driver.get_cookies()[0]
        cookie_name = cookie['name']
        cookie_value = cookie['value']
        cookie_str = "Cookie: {name}={value}".format(name=cookie_name, value=cookie_value)
        wget_cookie_arg = '--header={0}'.format(cookie_str)
        #print  wget_cookie_arg
        
        b.quit()

        
        for link in links:
            #print link
            cmd = ['wget', wget_cookie_arg, link]
            print list2cmdline(cmd)
            # pipe that to "parallel -j 8"

    except:
        b.quit()


if __name__ == '__main__':
    site = sys.argv[1]
    user = sys.argv[2]
    
    if site.startswith('http'):
        url=site
    else:
        url = 'https://mail.gnome.org/mailman/private/{0}'.format(site)
    
    main(username=user, url=url)

You can download the thing, too.

I use splinter because handling cookies is not fun as well as parsing the web page. So I just use whatever is most convenient for me, I wanted to get things done, after all. The script will print a line for each link it found, nicely prefixed with wget and its necessary arguments for the authorization cookie. You can pipe that to sh but if you want to download many month, you want to do it in parallel. And fortunately, there is an app for that!

Conversion to maildir

After having received the mboxes, it turned out to be a good idea nonetheless to convert to maildir; if only to extract properly formatted mails only and remove duplicates.

I came around mb2md-3.20.pl from 2004 quite soon, but it is broken. It cannot parse the mboxes I have properly. It will create broken mails with header lingering around as it seems to be unable to detect the beginning of new mails reliably. It took me a good while to find the problem though. So again, be advised, do not use mb2md 3.20.

As I use mutt myself I found this blog article promising. It uses mutt to create a mbox out of a maildir. I wanted it the other way round, so after a few trial and errors, I figured that the following would do what I wanted:

mutt -f mymbox -e 'set mbox_type=maildir; set confirmcreate=no; set delete=no; push "T.*;s/tmp/mymuttmaildir"'

where “mymbox” is your source file and “/tmp/mymuttmaildir” the target directory.

This is a bit lame right? We want to have parameters, because we want to do some batch processing on many archive mboxes.

The problem is, though, that the parameters are very deep inside the quotes. So just doing something like

mutt -f $source -e 'set mbox_type=maildir; set confirmcreate=no; set delete=no; push "T.*;s$target"'

wouldn’t work, because the $target would be interpreted as a raw string due to the single quotes. And I couldn’t find a way to make it work so I decided to make it work with the language that I like the most: Python. So an hour or so later I came up with the following which works (kinda):

import os
import subprocess
source = os.environ['source']
destination = os.environ['destination']

conf = 'set mbox_type=maildir; set confirmcreate=no; set delete=no; push "T.*;s{0}"'.format(destination)

cmd = ['mutt', '-f', source, '-e', conf]
subprocess.call(cmd)

But well, I shouldn’t become productive just yet by doing real work. Mutt apparently expects a terminal. It would just prompt me with “No recipients were specified.”.

So alright, this unfortunately wasn’t what I wanted. I you don’t need batch processing though, you might very well go with mutt doing your mbox to maildir conversion (or vice versa).

Damnit, another two hours or more wasted on that. I was at the point of just doing the conversion myself. Shouldn’t be too hard after all, right? While researching I found that Python’s stdlib has some email related functions *yay*. Some dude on the web wrote something close to what I needed. I beefed it up a very little bit and landed with the following:

#!/usr/bin/env python

# http://www.hackvalue.nl/en/article/109/migrating%20from%20mbox%20to%20maildir

import datetime
import email
import email.Errors
import mailbox
import os
import sys
import time


def msgfactory(fp):
    try:
        return email.message_from_file(fp)
    except email.Errors.MessageParseError:
        # Don't return None since that will
        # stop the mailbox iterator
        return ''
dirname = sys.argv[1]
inbox = sys.argv[2]
fp = open(inbox, 'rb')
mbox = mailbox.UnixMailbox(fp, msgfactory)


try:
        storedir = os.mkdir(dirname, 0750)
        os.mkdir(dirname + "/new", 0750)
        os.mkdir(dirname + "/cur", 0750)
except:
        pass

count = 0
for mail in mbox:
        count+=1
        #hammertime = time.time() # mail.get('Date', time.time())
        hammertime = datetime.datetime(*email.utils.parsedate(mail.get('Date',''))[:7]).strftime('%s')
        hostname = 'mb2mdpy'
        filename = dirname + "/cur/%s%d.%s:2,S" % (hammertime, count, hostname)
        mail_file = open(filename, 'w+')
        mail_file.write(mail.as_string())


print "Processed {0} mails".format(count)

And it seemed to work well! It recovered many more emails than the Perl script (hehe) but the generated maildir wouldn’t work with my IMAP server. I was confused. The mutt maildirs worked like charm and I couldn’t see any difference to mine.

I scped the file onto my .maildir/ on my server, which takes quite a while because scp isn’t all too quick when it comes to many small files. Anyway, it wouldn’t necessarily work for some reason which is way beyond me. Eventually I straced the IMAP server and figured that it was desperately looking for a tmp/ folder. Funnily enough, it didn’t need that for other maildirs to work. Anyway: Lesson learnt: If your dovecot doesn’t play well with your maildir and you have no clue how to make it log more verbosely, check whether you need a tmp/ folder.

But I didn’t know that so I investigated a bit more and I found another PERL script which converted the emails fine, too. For some reason it put my mails in “.new/” and not in “.cur/“, which the other tools did so far. Also, it would leave the messages as unread which I don’t like.

Fortunately, one (more or less) only needs to rename the files in a maildir to end in S for “seen”. While this sounds like a simple

for f in maildir/cur/*; do mv ${f} ${f}:2,S

it’s not so easy anymore when you have to move the directory as well. But that’s easily being worked around by shuffling the directories around.

Another, more annoying problem with that is “Argument list too long” when you are dealing with a lot of files. So a solution must involve “find” and might look something like this: find ${CUR} -type f -print0 | xargs -i -0 mv '{}' '{}':2,S

Duplicates

There was, however, a very annoying issue left: Duplicates. I haven’t investigated where the duplicates came from but it didn’t matter to me as I didn’t want duplicates even if the downloaded mbox archive contained them. And in my case, I’m quite confident that the mboxes are messed up. So I wanted to get rid of duplicates anyway and decided to use a hash function on the file content to determine whether two file are the same or not. I used sha1sum like this:

$ find maildir/.board-list/ -type f -print0 | xargs -0 sha1sum   | head
c6967e7572319f3d37fb035d5a4a16d56f680c59  maildir/.board-list/cur/1342797208.000031.mbox:2,
2ea005ec0e7676093e2f488c9f8e5388582ee7fb  maildir/.board-list/cur/1342797281.000242.mbox:2,
a4dc289a8e3ebdc6717d8b1aeb88959cb2959ece  maildir/.board-list/cur/1342797215.000265.mbox:2,
39bf0ebd3fd8f5658af2857f3c11b727e54e790a  maildir/.board-list/cur/1342797210.000296.mbox:2,
eea1965032cf95e47eba37561f66de97b9f99592  maildir/.board-list/cur/1342797281.000114.mbox:2,

and if there were two files with the same hash, I would delete one of them. Probably like so:

    #!/usr/bin/env python
    import os
    import sys


    hashes = []
    for line in sys.stdin.readlines():
        hash, fname = line.split()
        if hash in hashes:
            os.unlink(fname)
        else:
            hashes.append(hash)

But it turns out that the following snippet works, too:

find /tmp/maildir/ -type f -print0 | xargs -0 sha1sum | sort | uniq -d -w 40 | awk '{print $2}' | xargs rm

So it’ll check the files for the same contents via a sha1sum. In order to make uniq detect equal lines, we need to give it sorted input. Hence the sort. We cannot, however, check the whole lines for equality as the filename will show up in the line and it will of course be different. So we only compare the size of the hex representation of the hash, in this case 40 bytes. If we found such a duplicate hash, we cut off the hash, take the filename, which is the remainder of the line, and delete the file.

Phew. What a trip so far. Let’s put it all together:

The final thing


LIST=board-list

umask 077

DESTBASE=/tmp/perfectmdir

LISTBASE=${DESTBASE}/.${LIST}

CUR=${LISTBASE}/cur
NEW=${LISTBASE}/new
TMP=${LISTBASE}/tmp

mkdir -p ${CUR}
mkdir -p ${NEW}
mkdir -p ${TMP}

for f in  /tmp/${LIST}/*; do /tmp/perfect_maildir.pl ${LISTBASE} < ${f} ; done
mv ${CUR} ${CUR}.tmp
mv ${NEW} ${CUR}
mv ${CUR}.tmp ${NEW}
find ${CUR} -type f -print0 | xargs -i -0 mv '{}'  '{}':2,S
find ${CUR} -type f -print0 | xargs -0 sha1sum | sort | uniq -d -w 40 | awk '{print $2}' | xargs rm

And that’s handling email in 2012…

Interview for gnome.org

I was interviewed recently for GNOME.org and while you can read the interview over there, I felt like copying it over here. So enjoy the questions and the answers

Why is open source/free software important to you?

I believe that Free Software makes the world a better place. Also, as I am a bit of a computer security person, it is absolutely crucial to be able to see how the software in question works and be able to eventually fix issues (or have someone to fix them).

How/when/why did you become involved in GNOME?

I was using GNOME ever since and started to follow it more and more until I came greatly involved as a Summer of Code student.

Why did you run for the GNOME Foundation Board?

I am sticking around GNOME for about 5 years now and while I enjoy being in the community I do wanted to progress within GNOME and take new responsibilities.

What do you hope to accomplish during your term on the board?

I hope to push the revamp of the bylaws and enable people to work together more effectively.

Do you think GNOME is heading in the right direction? Why or why not?

I think GNOME is doing well so far, but it must not rest (decadence anyone?). We have very smart people in our community and we should enable them to get awesome stuff for GNOME and the Free Software world done.

Have you attended GUADEC in the past? If so, when/where?

My first GUADEC was the one in Birmingham in 2007.

What are you looking forward to most at GUADEC?

To see friends again and having nice discussions.

Any other thoughts on GUADEC and/or GNOME?

GNOME is a great community and GUADEC is a good place to get in touch. Sometimes though, it might not be comfortable for newcomers to chime in. I still remember myself being too shy to talk to some great GNOME people. So while I recommend to young GNOMErs to not be shy, maybe a “This is GNOME” introductory session for new GNOME people might be a good idea.

GNOME @ FOSDEM 2011

I am very excited about having attended this years FOSDEM. Unfortunately, times were a bit busy so I am a bit late reporting about it, but I still want to state a couple of things.

(I wonder how that image will look in 2012 😉 )

First of all, I am very happy that our GNOME booth went very well. Thanks to Frederic Peters and Frederic Crozat for manning to booth almost all the time. I tried to organise everything remotely and I’d say I partly succeeded. We got stickers, t-shirts and staff for the booth. We lacked presentation material and instructions for the booth though. But it still worked out quite well. For the next time, I’d try to be communicate more clearly who is doing what to prevent duplicate work and ensure that people know who is responsible for what.

Secondly, I’d like to thank Canonical for their generosity to sponsor a GNOME Event Box. After the orginal one went missing, Canocical put stuff like a PC, a projector, a monitor and lots of other stuff together for us to be able to show off GNOME-3. The old Box, however, turns out to be back again *yay*!

Sadly, we will not represent GNOME at upcoming CeBIT. But we will at LinuxTag. Latest.

Anyway, during FOSDEM, we got a lot of questions about GNOME 3 and Ubuntu, i.e. will it be easily possible to run GNOME 3 on Ubuntu. I hope we can make it possible to have a smooth transition from Unity to GNOME Shell. Interestingly enough, there isn’t a gnome-shell package in the official natty repositories yet 🙁

It was especially nice to see and talk to old GNOME farts. And I enjoyed socialising with all the other GNOME and non-GNOME people as well. Sadly, I didn’t like the GNOME Beer Event very much because it was very hot in the bar so I left very quickly.

So FOSDEM was a success for GNOME I’d say. Let’s hope that future events will work at least as well and that we’ll have a strong GNOME representation even after the GNOME 3 release.

Critical Review of Tesseract

For CA640 we were supposed to pick a paper from International Conference of Software Engineering 2009 (ICSE 2009) and critically review it.

I chose to review Tesseract: Interactive Visual Exploration of Socio-Technical Relationships in Software Development.

You can find the review in PDF here. Its abstract reads:

This critical review of a paper, which presents Tesseract and was handed in for the ICSE 2009, focusses on
strength and weaknesses of the idea behind Tesseract: Visualising and exploring freely available and loosly coupled fragments (mailing lists, bug tracker or commits) of Free Software development.
Tesseract is thus a powerful data miner as well as a GUI to browse the obtained data.

This critique evaluates the usefulness of Tesseract by questioning the fundamental motivation it was built on, the data which it analyses and its general applicability.

Existing gaps in the original research are filled by conducting interviews with relevant developers as well as providing information about the internal structure of a Free Software project.

Tesseract is a program that builds and visualises a social network based on freely available data from a software project such as mailing lists, bug tracker or commits to a software repository. This network can be interactively explored with the Tesseract tool. This tool shows how communication among developers relates to changes in the actual code. The authors used a project under the GNOME umbrella named Rhythmbox to show their data mining and the program in operation. GNOME is a Free/Libre Software Desktop used as default by many Linux distributions including the most popular ones, i.e. Ubuntu and Fedora. To assess Tesseracts usability and usefulness, the authors interviewed people not related to Rhythmbox asking whether Tesseract was usable and provided useful information.

The paper was particularly interesting for me because the authors analysed data from the GNOME project. As I am a member of that development community, I wanted to see how their approach can or cannot increase the quality of the project. Another focus was to help their attempt to improve GNOME by highlighting where they may have gaps in their knowledge of its internals.

During this critique, I will show that some assumptions were made that do not hold for Free/Libre and Open Source Software (FLOSS) in general and for GNOME in particular either because the authors simply did not have the internal knowledge or did not research carefully enough. Also I will show that the used data is not necessarily meaningful and I will attempt to complement the lacking data by presenting the results of interviews I conducted with actual GNOME developers. This will show how to further improve Tesseract by identifying new usage scenarios. Lastly, this text will question the general usefulness of Tesseract for the majority of Free Software projects.