Technical Blog of Richard Hughes

AppStream Logs, False Positives and You

Quite a few people have asked me how the AppStream distro metadata is actually generated for thier app. The actual extraction process isn’t trivial, and on Fedora we also do things like supply missing AppData files for some key apps, and replacing some upstream screenshots on others.

In order to make this more transparent, I’m going to be uploading the logs of each generation run. If you’ve got a few minutes I’d appreciate you finding your application there and checking for any warnings or errors. The directory names are actually Fedora package names, but usually it’s either 1:1 or fairly predictable.

If you’ve got a application that’s being blacklisted when it shouldn’t be, or a GUI application that’s in Fedora but not in that list then please send me email or grab me on IRC. The rules for inclusion are here. Thanks.

Announcing Appstream-Glib

For a few years now Appstream and AppData adoption has been growing. We’ve got client applications like GNOME Software consuming the XML files, and we’ve got several implementations of metadata generators for a few distros now. We’ve also got validation tools we’re encouraging upstream applications to use.

The upshot of this was the same code was being duplicated across 3 different projects of mine, all with different namespaces and slightly different defined names. Untangling this mess took a good chunk of last week, and I’ve factored out 2759 lines of code from gnome-software, 4241 lines from createrepo_as, and the slightly less impressive 178 lines from appdata-tools.

The new library has a simple homepage, and so far a single release. I’d encourage people to check this out and provide early comments, as as soon as gnome-software branches for 3-12 I’m going to switch it to using this. I’m also planning on switching createrepo_as and and appdata-tools for the next releases too so things like jhbuild modulesets need to be updated and tested by somebody.

Appstream-Glib 0.1.0 provides just enough API to make sense for a first release, but I’m going to be continuing to abstract out useful functionality from the other projects to share even more code. I’ve spent a few long nights profiling the XML parsing code, and I’m pleased to say the load time of gnome-software is 160ms faster with this new library, and createrepo_as completes the metadata generation 4 minutes faster. Comments, suggestions and patches very welcome. There’s a Fedora package linked from the package review bug if you’d rather test that. Thanks.

GNOME 3.12 on Fedora 20

I’ve finished building the packages for GNOME 3.11.90. I’ve done this as a Fedora 20 COPR. It’s probably a really good idea to test this in a VM rather than your production systems as it’s only had a small amount of testing.

If it breaks, you get to keep all 132 pieces. It’s probably also not a good idea to be asking fedora-devel or fedoraforums for help when using these packages. If you don’t know how to install a yum repo these packages are not for you.

Comments and suggestions, welcome. Thanks.

AppData status for January

So, it’s been a couple of months since my last post about AppData progress, so about time for one more. These are the stats for Fedora 21 in January (with the stats for Fedora 20 in November in brackets):

Applications in Fedora with long descriptions: 11% (up from 9%)
Applications in Fedora with screenshots: 9% (up from 7%)
Applications in GNOME with AppData: 53% (up from 50%)
Applications in KDE with AppData: 1% (unchanged)
Applications in XFCE with AppData: 0% (unchanged)

If you want to see what your application looks like, but don’t want to run gnome-software from Fedora rawhide or jhbuild, you can check the automatically-generated status page.

Some applications like 0ad and eog look great in the software center, but some like frogr and gbrainy just look sad. As always, full details about AppData here.

For artists, photographers and animators it’s often essential to be working with an accurately color calibrated screen. It’s also important to be able to print accurate colors being sure the hard copy matches what is shown on the display.

The OpenHardware ColorHug Colorimeter device provided an inexpensive way to calibrate some types of screen, and is now being used by over 2000 people. Due to limitations because of the low cost hardware, it does not work well on high-gamut or LED screen technologies which are now becoming more common.

ColorHug Spectro is a new device designed as an upgrade to the original ColorHug. This new device features a mini-spectroraph with UV switched illuminants. This means it can also take spot measurements of paper or ink which allows us to profile printers and ensure we have a complete story for color management on Linux.

I’m asking anyone perhaps interested in buying a device in about 9 months time to visit this page which details all the specifications so far. If you want to pre-order, just send us an email and we’ll add you to the list. If there isn’t at least 100 people interested, the project just isn’t economically viable for us as there are significant NRE costs for all the optics.

Please spread the word to anyone that might be interested. I’ve submitted a talk to LGM to talk about this too, which hopefully will be accepted.

Is PackageKit-hawkey now ready for primetime?

I’ve been using the hawkey backend on my Fedora 20 system for about 6 weeks now. In that time, I’ve found bugs in hawkey, librepo and even libsolv and I’d like to thank Michael, Tomas and Ales for all the help debugging and reviewing all the fixes. Of course, there were quite a few PackageKit bugs fixed too. So if you’re testing PackageKit-hawkey you really want to update to these packages:

Those updates are currently on their way to updates-testing, but will be in Fedora 20 in a few short days barring any last minute problems. I am now happy we can switch Fedora 21 to using hawkey by default, and reap the rewards of all the hard work put in by so many people over the last few months. I for one am really happy about the speed boost brought to all the applications using PackageKit.

On that note, happy Christmas everyone.

PackageKit on speed

I spent a few days last week optimising PackageKit. I first added a couple of huge 350ms+ optimisations when using Hawkey. Then I turned my attention to the daemon itself and after adding a lot of profiling hooks to packagekitd, I recoiled in horror the amount of time it took to do simple things that everyone assumed would be fast.

A lot of unused functionality that was hurting transaction start times was removed. Certain core string functions were made fractions of ms faster and transactions a few hundreds of ms quicker in a few places, etc. The final result is that everything feels rather much speedier. Time-critical features like command-not-found and search-as-you-type now actually feel useful.

$ time pkcon search name powertop &> /dev/null
real0m0.082s

If you want to try out the new hotness, install the Fedora 20 update, enable the new hawkey backend and make sure you give karma. There’s also no more Zif backend in PackageKit, as hawkey is now faster and more reliable for all operations.

Testing the hawkey backend in Fedora 20

The grand plan is that Fedora is replacing yum with dnf in Fedora 21/22. For a few technical reasons PackageKit isn’t going to be using the python DNF layer, but instead using the main two libraries that DNF is build upon directly, namely hawkey (which in turn uses libsolv) and librepo.

I’ve been working with the hawkey and librepo developers on-and-off for a few months now, and we’ve now got a “hawkey” backend in PackageKit which I’ve been stress-testing every day for the last week or so. Today I released PackageKit 0.8.13 with all the fixes in the hawkey backend that make it, well, actually work correctly.

If you’d like to test out the backend, the procedure is pretty simple. Either wait for PackageKit-0.8.13-1.fc20 to hit updates-testing or manually download all the packages. Make sure you’ve updated to 0.8.13-1, and then install the PackageKit-hawkey subpackage and then remove the PackageKit-yum subpackage. If you don’t know how to do this you probably should stick to the tried and tested yum backend for now :)

Reboot, and then pkcon backend-details should tell you that you’re indeed running with the hawkey backend. The first transaction will take a little time as all the metadata will be downloaded and built into a .solv file, but after that it should be fine. From there, test offline updates, gnome-software and all the new stuff and file bugs with a way to reproduce and a backtrace if anything fails (and grab me on IRC if you can). Known issues is that installing and removing groups is not implemented, but that should only affect the old gpk-application application.

And the most important question… Is hawkey faster than yum? I’ll have to let the early adopters be the judge of that. :)

Offline Updates Performance Notes

So, after my epic 20+ minute offline update of 245 packages, last night I decided to look at some profiling numbers. All my testing was done using git master PackageKit (for the new strace support) on an otherwise unmodified Fedora 20 of a snapshot from last week. For the strace I chose to update two packages, otherwise the strace -tt output went maaaasive. Some salient points:

yum opens and closes the rpmdb 6692 times (that’s about 6690 more than it needs to) –we’re investigating why
fdatasync and fsync are killing us:

duration(ms)	system call
805.749	fdatasync(17)
752.828	fsync(27)
658.659	fdatasync(9)
614.367	fdatasync(15)
598.182	fdatasync(33)
535.642	wait4(903, [{WIFEXITED(s) && WEXITSTATUS(s) =
423.247	wait4(911, [{WIFEXITED(s) && WEXITSTATUS(s) =
368.85	fsync(22)
309.556	stat(“/var/lib/yum/yumdb/g/gvfs-fuse-1.18.3-1.fc20-x86_64/checksum_type”
217.877	fdatasync(18)
179.002	close(23)

The full strace log is here (warning, huge) if you’re interested. I’ve got some other work to be doing today, but I’ll continue to work on this at the weekend.

Offline Updates in Fedora 20

In GNOME 3.10 we’re encouraging more people to use the offline-update functionality which we’ve been using in Fedora for a little while now. A couple of people have told me it’s really slow, but I hadn’t seen an offline update take more than a minute or so as I test updates all the time. To reproduce this, I spun up a seldom-used Fedora 20 alpha image and let GNOME download and prepare all the updates in the background. I then added some profiling code to the pk-offline-update binary, and rebooted. The offline update took almost 17 minutes to run.

So, what was it doing all that time, considering that we’ve already downloaded the packages and depsolved the transaction:

Transaction Phase	Time (s)
Start up PackageKit	0.3
Starting up yum	3
Depsolving	10
Signature Check	8
Test Commit	5
Install new packages	704
Remove old packages	168
Run post-install scripts	90

This is about an order of magnitude slower than what I expected. Some of my observations:

10 seconds to depsolve an already depsolved transaction
8 seconds to check a few hundred signatures
168 seconds just to delete a few thousand files
over 10 minutes to install a few hundred RPMs seems crazy
90 seconds to rebuild a few indexes seems like a huge amount of time

Some notable offenders:

Package	Time to install (s)
selinux-policy-targeted	122
kernel-devel	25
libreoffice-core	21
selinux-policy	17
hugin	12

Package	Time to cleanup (s)
gramps	11
wireshark-gnome	8
hugin	7
meld	6
control-center	5

Hopefully Fedora 21 will move to the hawkey backend, and we can get closer to raw librpm speed (which seems to be quite a speed boost) but even that is too slow. I’ll be looking into the individual packages this week, and trying to find what makes them so slow, and what we can do about them to speed things up.