December 2013 – Tristan's World

Application Bundles Revisited

This is a follow up post on the one I made earlier this week on Glade application bundles.

What I had hoped to be a simple bundling experience turned out to actually take me all week to complete, but hey, you get what you pay for. And I’m very happy with the outcome so it was well worth the effort.

The new test bundle can be found here, and I’m very proud to announce that we require exactly the 2.7 ABI form glibc.

We still require fontconfig/freetype/Xlib libraries to be present on the target host to run what is inside the bundle, and we still require libz and a moderately old version of libglib to actually unpack the bundle.

For the AppImageKit glib & zlib dependencies, I plan to make those static dependencies so that they will not even be required, but first I must do battle with CMake to make that happen.

How can I use this to bundle my GTK+ application ?

This is a question I’ve been expecting to hear (and have been hearing). As the last iteration was only a spike, it involved a very ‘LFS’ technique of manually preparing the bundle, but that has been fixed this week and now you can use the bundling mechanism with a collection of customized jhbuild scripts.

Here is an outline of what we have in our build/linux/ directory, which can serve as a good template / starting point for others to bundle their application:

README: A step by step instructions for generating the bundle
jhbuildrc: A custom jhbuildrc for building the bundle
AppRun: A script to run your GTK+ app from inside the bundled environment
PrepareAppDir.sh: A script to post-process the built stack including GTK+, it’s dependencies and your app
LibcWrapGenerator.vala: A program for generating libcwrap.h
modulesets/bundle.modules: A self-contained moduleset for building the stack
modulesets/patches/: A few downstream patches needed to build/run the bundle
triggers/: Some custom jhbuild post-installation triggers

People are welcome to copy this build/linux subdirectory into their project and try to build their own GTK+ based application bundles.

Different applications will have different requirements, you will certainly need to modify AppRun and bundle.modules.

Hand to hand combat with glibc

Most of my work this week has revolved around doing hand to hand combat with glibc. I do feel a bit resentful that gcc does not provide any -target-abi-version=glibc,2.7 sort of command line option, as this work would be much better suited to the actual compiler and linker, however with libcwrap.h and it’s generator, I’ve emerged victorious from this death match.

The technique employed to get the whole stack to depend on glibc 2.7 ABI involves using the .symver directive like so:

__asm__(".symver memcpy, memcpy@GLIBC_2.2.5");

The symbol memcpy() changed in libc’s 2.14 ABI to behave more like memmove(), however linking to the new symbol causes your application to require libc 2.14 ABI, which of course we don’t want. The above .symver directive ensures that you will link to the version of memcpy() which is provided by the glibc 2.2.5 ABI.

However the problem is more intense than just this, there are many symbols in libc with various versions and what we want is to be able to simply choose a target version. The LibcWrapGenerator.vala program takes care of this.

Running the generator on a given platform will call objdump on your system libc libraries and inspect which symbols exist, which of them can be safely accessed without any .symver directive, which symbols can be redirected to their most recent version before your arbitrary target ABI version and finally, of course, which ones simply are not available, for instance the fallocate symbol:

__asm__(".symver fallocate, fallocate@GLIBC_DONT_USE_THIS_VERSION_2.10");

The fallocate symbol was actually initially introduced in the 2.10 ABI, and simply cannot be used when targeting the glibc 2.7 ABI.

The nice thing about the above directive, is that it will cause an informative linker error when you try to link any program using fallocate() against glibc. Some of my patches work around these problems (for instance cairo uses longjump() in it’s test cases, so my patch simply disables these test cases when building for the bundle), in other cases this is totally transparent. For the case of fallocate above; libglib will try to use it if it’s available, but with the forced include of the generated “libcwrap.h” file, glib simply fails to find fallocate at configure time and happily functions without it.

This technique was borrowed from the now defunct autopackage project which used a bash script with awk to generate the header file in question, however this script stopped working for glibc runtimes >= 2.10. A newer port of that old script to vala was created by Jan Niklas Hasse in 2011 which was a good starting point, but still failed to generate the correct output. Today I put my (non-existent) vala skills to the test and now have a working version of this nifty header generator, salvaged from the autopackage wreckage.

More Testing !

I will of course appreciate any more testing of the new bundle, for any systems with the glibc 2.7 ABI or later. We did prove that the bundle works properly for a CentOS 6 system which is already a good start.

I would be interested however to know if other people can build the bundle following my README file in our build/linux directory, specifically I’m looking for cases where you have a very new glibc ABI, generate a new libcwrap.h, and let’s see if the bundling process still works, and also let’s see if we can run the bundled glade on very old systems, generated from very new systems.

That’s all, I think I’ve covered all the ground I intended to in this post… so please try this at home ! 🙂

And enjoy bundling your own GTK+ applications 🙂

Application Bundles for Glade

This week started out with a search for some mechanism for building portable bundles for 64bit GNU/Linux systems. I looked at various bundler implementations, 0install, chakra, Alexander Larsson’s glick and glick2 to name a few of the implementations I found. Finally I found this gem called AppImageKit by Simon Peter (I refer to my own fork of his repository because it contains some patches I needed, however there is a pull request, which I think is the custom when doing things on github.com).

Before I continue, I should make clear what were my motivations and why I chose AppImageKit over any other existing options.

Why Bundled Applications ?

Bundled applications is a concept prominent in OSX and has been considered and reconsidered a lot by the general GNOME/GNU/Linux communities, I refer to Alex’s blog for some ideas on the concepts of bundling applications, you may also want to read this article by Lennart Poettering on the subject of Sandboxed Applications for GNOME (if you haven’t already seen his talk).

So first let me explain the motivations behind my hacking:

I wanted to make bleeding edge versions of Glade available to a wider user base. While many serious developers will naturally have a copy of GTK+ master from git.gnome.org built locally (as naturally you target next generation software for the apps you create), we still have many users who are just not into creating a relocated build system and building everything themselves.
Pretty much the same as the above point, I don’t want users to have to lag 3 years behind our development just because they decided to run a stable operating system (this should be a no brainer, really).
Finally, I wanted a method of distributing portable binaries of proprietary applications, without the fuss of wallowing around in the muck of creating rpms and debian packages for every different system out there, I want to just create one bundle which will run pretty much everywhere with X11 and 64bit linux. Yes, time to throw eggs and tomatoes at me in public. Just because I’ve contributed lots and lots of my own time to work on free software for ‘free as in beer’, and have been a proponent of Free Software for my own idealistic reasons (I just believe that code developed in public is held to a higher standard of quality), doesn’t mean that all the software I write is going to be free software.

Why did I choose AppImageKit ?

If you’re already familiar with the scene of Application Bundles out there, then you’ve already probably guessed why I made my choice.

In my search for application bundling mechanisms out there, I found a few options (as I mention in the beginning of this post). I quickly realized that most of the projects I found were either targeting a specific OS/distribution (i.e. chakra) or at least required the user to install some kind of mechanism to run the bundle (i.e. 0install). While I really do respect and admire the work done by Alexander and the advocacy by Lennart, pushing for a new concept of packaging in GNU/Linux systems, my requirements were simply different (and perhaps once sandboxed applications for GNOME actually exist, I will be able to switch to that new mechanism).

My requirement is, again, to be able to download and run an application on your 64bit GNU/Linux platform, I don’t care if it’s Fedora, GNOME OS, Debian, Arch Linux, Ubuntu, Chakra, I just wanted it to run.

How does it work ?

After reading the first paragraphs of the AppImageKit documentation I was sold. This is a simple and nifty idea of simply creating a compressed ISO image with an ELF binary header, it uses libz to decompress itself and fuse to mount itself into a temporary directory, and then it just runs a script, letting you modify your environment and launch your App from within the unpacked bundle environment (actually the AppRunScript.sh is my own addition to AppImageKit). It also uses libglib for some obscure reason, so the best practice is to build the bundling mechanism with the oldest version of libglib possible (as this will become a base runtime requirement for the user).

So basically the requirements are:

C runtime libraries
libfuse, with a fuse capable kernel
libz
libglib (as old as possible)

After that, the system requirements depend entirely on what you put in the bundle.

Please test my Glade bundle !

After 2 days of hacking I was finally able to get a GTK+ application running (Glade) with the Adwaita theme fully functional, and self contained. This involved much ldd and strace in the bundle environment, ensuring that the runtime doesn’t touch any system libraries or auxiliary data, the discovery of this nifty chrpath tool, and much grepping through various sources and figuring out how to properly relocate these modules with environment variables. I had to strangle Pango, until it finally yielded, after I thoroughly deformed pango’s face with this downstream patch. I also needed to rediscover hicolor-icon-theme’s location on fdo, since everything breaks down without an index.theme file in the right place (one of the various things jhbuild forgets to do properly).

You might be interested to peek at the README which is a thorough breakdown on how to create portable bundles of Glade in the future, the application startup script is also of particular interest as it shows what environment variables you really need to relocate GTK+ apps.

Finally, I was able to produce this bundle.

In addition to the AppImageKit system requirements listed above, we require:

X11 libraries
fontconfig
freetype

(Technically these could also be included in the bundle, it would require that we install fonts as well, and I just wanted to avoid that telling myself that most GNU/Linux distributions have this already).

The above bundle should be able to ‘just run’, just download it and run it without any special privileges… PLEASE !

We will greatly appreciate any testing on various 64bit GNU/Linux platforms, after some testing we will start to advertise downloadable bundles of bleeding edge Glade for any users of stable GNU/Linux distributions.

And of course, Enjoy !

Smart & Fast Addressbooks

This post is a sort of summary of the work we’ve done on the Addressbook for Evolution Data Server this year at Openismus. As I demonstrated in my previous post, Evolution’s addressbook is now riddled with rich locale sensitive features so I won’t cover the sorting features, you can refer to the other post for that.

Understanding Phone Numbers

This is another really nice feature which I admit has been driving me up the wall for the last year. I’ll try to sum it up one last time here so hopefully I don’t have to ever concern myself too much with the topic again 😉

Before I get into it, I should start with describing the problem which we’ve addressed (I can’t say solved as I don’t believe there really is any ultimate solution for this problem).

So, this problem is particular to the implementation of a hand phone device, which implies the following conditions:

Users will enter any kind of data into the addressbook and call it a phone number. This most typically involves phone numbers entered as local numbers, sometimes it includes fully qualified international numbers for contacts who live abroad, and can also include text such as “ask jenny for her number next time”, as a rule, anything goes.
Addressbooks are typically a collection of vCards, this is a point of interest as using a standard format for contacts allows one to send vCards back and forth, which means that you cannot consider yourself in control of the data you store on your device. Instead a vCard can come from a synchronization of contacts from your laptop’s email client, or passed over bluetooth, the vCard can come from anywhere, containing any data it pleases and calling that a phone number.
When initiating an outbound call, the cellular modem firmware will send a string of numbers to the carrier, this will either succeed or fail. It’s tempting at this stage to consider and store the result of this operation, but unclear if the modem firmware will tell you the fully qualified number of the successful outbound call, and unreasonable to attempt a call just to determine such a thing.
When the firmware announces an incoming call, there will be a fully qualified E.164 phone number available (E.164 is a standard international phone number format).

Two obvious use cases now present a difficulty:

Let’s say the user starts entering a phone number somewhere in the UI, perhaps with the intent of initiating a phone call, or just with the intent of searching for a contact. We know that the user entered ‘just about anything’ as a phone number, we know that the vCard might have come from an external source (the same user might not have entered the phone number to begin with), and of course, we want to find the correct contact, and we want to find that contact right now.
Let’s imagine also, just for a moment, that your hand phone implementation might actually receive a phone call (not all of your users are so popular that they actually receive calls, but this is, after all, what your device is for ;-)). So now we have access to a fully qualified E.164 number for the incoming call, and an addressbook full of these vCards, which have, ya know, whatever the user decided to enter as a phone number. We of course want to find the single unique matching contact for that incoming phone number, and we want it even more righter and nower than in the other use case listed above (just in case, you know designers, maybe they want something crazy like displaying the full name of the contact who’s calling you instead of the phone number).

A common trick of the trade to handle these cases is to perform a simple heuristic which involves normalizing the number in your vCards (to strip out spaces and characters such as ‘-‘ and ‘.’) and then perform a simple suffix match against the incoming caller’s fully qualified phone number. This was admittedly a motivation behind our implementation of optimized suffix matching last year.

But let’s pretend that you’re not satisfied with this heuristic, you know you can do better, and you want to impress the crowd. Well now you can do just that ! assuming of course that you use Evolution’s addressbook in your device architecture (but of course you do, what else would you use ?).

Locale sensitive phone number interpretations

Again the International Components for Unicode (ICU) libraries come to the rescue, this time with the libphonenumber helper library on google code, which is now an optional dependency for Evolution Data Server (my painful experience compiling this heap of libphonenumber code begs me to make an unrelated comment: somebody school these guys in the ways of Automake, this CMake experience is just about the worst offence you can make to a downstream packager, or someone like me, who just needs to build it).

So what is a locale sensitive interpretation of a phone number ? Why do we need that ?

Some countries have different call prefixes for outgoing calls to leave the country. I believe this is ‘0’ in North America but it can vary. Even more confusing, take a country like Brazil with many competing cellular carriers, in the case of Brazil you have multiple valid calling prefixes to place a call outside of the country, each one with a different, competing rate.
Some locales, like in all locale sensitive affairs, have different preferences about how a phone number is composed, i.e. do we use a space or a dash or a decimal point as a separator ? do we interpret a trailing number preceded by a decimal as an extension code ? or as the final component of a local phone number ?
Some locales permit entirely different character sets to be used for numbers, and users might very well prefer to enter phone numbers in their native language script rather than entering standard numeric characters.

Using libphonenumber allows us to make the best possible guess of what the fully qualified E.164 number would be for an arbitrary string entered by the user in a given locale. It also provides us with useful information such as whether the phone number contained an extension code, whether the number had a leading 0 (special case for Italy) and whether the parsed phone number string had a country code. Actually libphonenumber will also make a guess at what the country code would be, but we’re not interested by that feature.

To leverage this feature in Evolution’s addressbook, one should make use of the following APIs:

The function e_phone_number_is_supported() can be called at runtime to determine if your installation of Evolution Data Server was compiled with support for phone numbers (as libphonenumber is only a soft dependency, this should be checked).
When creating a new addressbook, the ESourceBackendSummarySetup should be used to configure the E_CONTACT_TEL field into the addressbook ‘quick search’ configuration, and it should be configured with the special index E_BOOK_INDEX_PHONE so that interpreted phone numbers will be stored in SQLite for quick phone number matching.
Whether your addressbook is configured for quick phone number searches or not, so long as phone numbers are supported in your build then you have the capacity of leveraging the phone number matching semantics in the regular way with EBookQuery.

Three queries exist which can be used for phone number matching, at differing match strengths.

A typical query that you would use to search for, say equality at the national number strength level, would look like this:

EBookQuery *query = e_book_query_field_test (E_CONTACT_TEL,
                                             E_BOOK_QUERY_EQUALS_NATIONAL_PHONE_NUMBER,
                                             "user entered, or E.164 number");

With the above query one can use the regular APIs to fetch a contact list, filter a view, or filter a cursor.

Addressbooks on Crystal Meth

One thing that is really important of course in contacts databases, is speed, lots and lots of speed 🙂

While last year we’ve made considerable improvements, and already gained much speed for the addressbook, now contacts are inserted, deleted, modified and as specially fetched righter and nower than ever.

I should start by mentioning that in the last two weeks I’ve committed a complete rewrite of the old SQLite code which handles addressbooks, finally overcoming this bug, and turning what used to look like this (shudders) into something much more intelligible (sigh of relief). The net results in performance, can be observed here.

While I’d love to go over the vast improvements I’ve made in the code, and new features added, let’s just go through some highlights of the new benchmarks as I’ve been writing this post for a few hours already 😉

First some facts about the addressbook configuration, since that makes a great difference in the meaning of the benchmark output.

E_CONTACT_GIVEN_NAME – Forced fallback search routines (no index)
E_CONTACT_FAMILY_NAME – Prefix Index, Suffix Index
E_CONTACT_EMAIL – Prefix Index, Suffix Index
E_CONTACT_TEL – Prefix Index, Suffix Index

The red line in the benchmarks is what is now in EDS master (although marked as “Experimental” in the charts). The comparisons are based on addressbooks opened in Direct Read Access mode, i.e. books are opened with e_book_client_connect_direct().

And now some of the highlights:

contact-saving — Total time to save all contacts, with varying numbers of contacts to add.

In the above chart we’re basically seeing the time we save by using prepared statements for batch inserts, instead of generating the same query again and again.

Prefix searches on contact fields which have multiple values (stored in a separate table) such as email addresses and phone numbers, now have good performance for addressbooks containing over 200,000 contacts (I was unable to test 409,600 contacts, as the benchmarks themselves require a lot of memory to keep a set of vCards and EContacts in memory).

filter-by-long-given-name-prefix — Fallback routine performed on the prefix searches of the given name

Here we see no noticeable change in the speed of fetching contacts with the fallback routine (i.e. no data to compare in SQLite columns, vCard parsing involved).

This is interesting only because in my new code base, we no longer fetch all contacts into memory and compare the list one by one, but instead install a function into SQLite to perform the fallback routine over the course of a full table scan. This is much more friendly to your memory consumption, and comes at no decrease in the performance of queries which hit the fallback routine.

Resident memory usage over the course of the benchmark progress

Here we see the memory usage of the entire benchmark process including the evolution-addressbook-factory process over the course of the benchmarks (each dot from left to right is a measurement taken after each benchmark runs). Note that we have 200,000 contacts and vCards loaded into memory for the sake of the benchmarks to verify results for each speed test (hence the initial spikes at the beginning of the benchmark progress).

The noticeable drop in memory usage can be attributed to how the new backend is more friendly on system memory when performing fallback search routines.

Take a look at the full benchmarks including the csv file here. I should note that the ‘fetch-all-contacts.png’ benchmark indicated a bug where I was not detecting a special case (a wildcard query which should simply list all results), that has been fixed and the benchmark for it doesn’t apply with current Evolution Data Server master.

Anyway, it’s late and time for pizza 🙂

I hope you’ve all enjoyed the show !