AppData status for January

It’s been a couple of months since my last post about AppData progress, so it’s about time for another. These are the stats for Fedora 21 in January (with the stats for Fedora 20 in November in brackets):

Applications in Fedora with long descriptions: 11% (up from 9%)
Applications in Fedora with screenshots: 9% (up from 7%)
Applications in GNOME with AppData: 53% (up from 50%)
Applications in KDE with AppData: 1% (unchanged)
Applications in XFCE with AppData: 0% (unchanged)

If you want to see what your application looks like, but don’t want to run gnome-software from Fedora rawhide or jhbuild, you can check the automatically-generated status page.

Some applications like 0ad and eog look great in the software center, but some like frogr and gbrainy just look sad. As always, full details about AppData here.

For artists, photographers and animators it’s often essential to work with an accurately color-calibrated screen. It’s also important to be able to print accurate colors and be sure the hard copy matches what is shown on the display.

The OpenHardware ColorHug Colorimeter device provided an inexpensive way to calibrate some types of screen, and is now being used by over 2000 people. Because of the limitations of its low-cost hardware, it does not work well on the high-gamut or LED screen technologies which are now becoming more common.

ColorHug Spectro is a new device designed as an upgrade to the original ColorHug. This new device features a mini-spectrograph with UV switched illuminants. This means it can also take spot measurements of paper or ink which allows us to profile printers and ensure we have a complete story for color management on Linux.

I’m asking anyone perhaps interested in buying a device in about 9 months’ time to visit this page which details all the specifications so far. If you want to pre-order, just send us an email and we’ll add you to the list. If there aren’t at least 100 people interested, the project just isn’t economically viable for us as there are significant NRE costs for all the optics.

Please spread the word to anyone that might be interested. I’ve submitted a talk to LGM to talk about this too, which hopefully will be accepted.

Is PackageKit-hawkey now ready for primetime?

I’ve been using the hawkey backend on my Fedora 20 system for about 6 weeks now. In that time, I’ve found bugs in hawkey, librepo and even libsolv and I’d like to thank Michael, Tomas and Ales for all the help debugging and reviewing all the fixes. Of course, there were quite a few PackageKit bugs fixed too. So if you’re testing PackageKit-hawkey you really want to update to these packages:

Those updates are currently on their way to updates-testing, but will be in Fedora 20 in a few short days barring any last minute problems. I am now happy we can switch Fedora 21 to using hawkey by default, and reap the rewards of all the hard work put in by so many people over the last few months. I for one am really happy about the speed boost brought to all the applications using PackageKit.

On that note, happy Christmas everyone.

PackageKit on speed

I spent a few days last week optimising PackageKit. I first added a couple of huge 350ms+ optimisations when using Hawkey. Then I turned my attention to the daemon itself and, after adding a lot of profiling hooks to packagekitd, I recoiled in horror at the amount of time it took to do simple things that everyone assumed would be fast.

A lot of unused functionality that was hurting transaction start times was removed. Certain core string functions were made fractions of a ms faster, and transactions were made a few hundred ms quicker in a few places. The final result is that everything feels much speedier. Time-critical features like command-not-found and search-as-you-type now actually feel useful.

$ time pkcon search name powertop &> /dev/null
real    0m0.082s
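
A single `time` run can be noisy, so here is a minimal sketch of repeating the measurement and keeping the best wall-clock time. It is not part of PackageKit; the helper name `time_command` and the number of runs are assumptions made for illustration.

```python
import subprocess
import time


def time_command(argv, runs=5):
    """Return the best wall-clock time in seconds over several runs of argv."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Discard all output, like "&> /dev/null" in the shell example.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - start)
    return best
```

On a machine with PackageKit installed, `time_command(["pkcon", "search", "name", "powertop"])` would repeat the query shown above. The first run may include cache warm-up, which is why the sketch reports the best run and not the average.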

If you want to try out the new hotness, install the Fedora 20 update, enable the new hawkey backend and make sure you give karma. There’s also no more Zif backend in PackageKit, as hawkey is now faster and more reliable for all operations.

Testing the hawkey backend in Fedora 20

The grand plan is that Fedora is replacing yum with dnf in Fedora 21/22. For a few technical reasons PackageKit isn’t going to be using the python DNF layer, but instead using the main two libraries that DNF is built upon directly, namely hawkey (which in turn uses libsolv) and librepo.

I’ve been working with the hawkey and librepo developers on-and-off for a few months now, and we’ve now got a “hawkey” backend in PackageKit which I’ve been stress-testing every day for the last week or so. Today I released PackageKit 0.8.13 with all the fixes in the hawkey backend that make it, well, actually work correctly.

If you’d like to test out the backend, the procedure is pretty simple. Either wait for PackageKit-0.8.13-1.fc20 to hit updates-testing or manually download all the packages. Make sure you’ve updated to 0.8.13-1, and then install the PackageKit-hawkey subpackage and then remove the PackageKit-yum subpackage. If you don’t know how to do this you probably should stick to the tried and tested yum backend for now :)

Reboot, and then pkcon backend-details should tell you that you’re indeed running with the hawkey backend. The first transaction will take a little time as all the metadata will be downloaded and built into a .solv file, but after that it should be fine. From there, test offline updates, gnome-software and all the new stuff and file bugs with a way to reproduce and a backtrace if anything fails (and grab me on IRC if you can). One known issue is that installing and removing groups is not implemented, but that should only affect the old gpk-application application.

And the most important question… Is hawkey faster than yum? I’ll have to let the early adopters be the judge of that. :)

Offline Updates Performance Notes

So, after my epic 20+ minute offline update of 245 packages, last night I decided to look at some profiling numbers. All my testing was done using git master PackageKit (for the new strace support) on an otherwise unmodified Fedora 20 snapshot from last week. For the strace I chose to update only two packages, otherwise the strace -tt output went maaaasive. Some salient points:

  • yum opens and closes the rpmdb 6692 times (that’s about 6690 more than it needs to); we’re investigating why
  • fdatasync and fsync are killing us:

duration (ms)  system call
805.749        fdatasync(17)
752.828        fsync(27)
658.659        fdatasync(9)
614.367        fdatasync(15)
598.182        fdatasync(33)
535.642        wait4(903, [{WIFEXITED(s) && WEXITSTATUS(s) =
423.247        wait4(911, [{WIFEXITED(s) && WEXITSTATUS(s) =
368.85         fsync(22)
309.556        stat("/var/lib/yum/yumdb/g/gvfs-fuse-1.18.3-1.fc20-x86_64/checksum_type"
217.877        fdatasync(18)
179.002        close(23)
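
A ranking like the one above can be rebuilt from a raw log. The following is a rough sketch, not the script used for this post: it assumes the log was captured with strace’s -T option, which appends the time spent in each call as `<seconds>` at the end of the line. The function names and the sample lines are made up for illustration.

```python
import re

# strace -T appends the time spent in each call as "<seconds>" at line end.
DURATION_RE = re.compile(r"<(\d+\.\d+)>\s*$")
# The syscall name is the first identifier directly followed by "(".
NAME_RE = re.compile(r"([A-Za-z_]\w*)\(")


def parse_line(line):
    """Return (syscall, duration_ms, call_text), or None if there is no timing."""
    duration = DURATION_RE.search(line)
    name = NAME_RE.search(line)
    if duration is None or name is None:
        return None  # signals, "<unfinished ...>" and "resumed" lines are skipped
    call_text = line[name.start():duration.start()].strip()
    return name.group(1), float(duration.group(1)) * 1000.0, call_text


def slowest(lines, top=5):
    """Return the slowest timed calls, slowest first."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    return sorted(parsed, key=lambda p: p[1], reverse=True)[:top]


def total_ms_by_syscall(lines):
    """Sum the time spent per syscall name, in milliseconds."""
    totals = {}
    for parsed in map(parse_line, lines):
        if parsed is not None:
            name, ms, _ = parsed
            totals[name] = totals.get(name, 0.0) + ms
    return totals


# Illustrative input only: the durations echo the table above, but the
# timestamps and return values are invented for this example.
SAMPLE = """\
10:00:00.000000 fdatasync(17) = 0 <0.805749>
10:00:00.900000 fsync(27) = 0 <0.752828>
10:00:01.700000 fdatasync(9) = 0 <0.658659>
10:00:02.400000 close(23) = 0 <0.179002>
10:00:02.600000 --- SIGCHLD {si_signo=SIGCHLD} ---
""".splitlines()

for name, ms, call in slowest(SAMPLE, top=3):
    print(f"{ms:9.3f} ms  {call}")
```

Summing per syscall with `total_ms_by_syscall` is often more telling than the top-N list, since many medium-sized fdatasync calls can add up to more than the single worst one.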

The full strace log is here (warning, huge) if you’re interested. I’ve got some other work to be doing today, but I’ll continue to work on this at the weekend.

Offline Updates in Fedora 20

In GNOME 3.10 we’re encouraging more people to use the offline-update functionality which we’ve been using in Fedora for a little while now. A couple of people have told me it’s really slow, but I hadn’t seen an offline update take more than a minute or so as I test updates all the time. To reproduce this, I spun up a seldom-used Fedora 20 alpha image and let GNOME download and prepare all the updates in the background. I then added some profiling code to the pk-offline-update binary, and rebooted. The offline update took almost 17 minutes to run.

So, what was it doing all that time, considering that we’ve already downloaded the packages and depsolved the transaction:

Transaction Phase          Time (s)
Start up PackageKit        0.3
Starting up yum            3
Depsolving                 10
Signature Check            8
Test Commit                5
Install new packages       704
Remove old packages        168
Run post-install scripts   90

This is about an order of magnitude slower than what I expected. Some of my observations:

  • 10 seconds to depsolve an already depsolved transaction
  • 8 seconds to check a few hundred signatures
  • 168 seconds just to delete a few thousand files
  • over 10 minutes to install a few hundred RPMs seems crazy
  • 90 seconds to rebuild a few indexes seems like a huge amount of time
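
To put the phase timings above in proportion, here is a small sketch that adds them up and shows each phase’s share of the total. The numbers are copied from the table; the helper name `summarise` is an assumption for illustration.

```python
# Phase timings in seconds, copied from the table above.
PHASES = [
    ("Start up PackageKit", 0.3),
    ("Starting up yum", 3),
    ("Depsolving", 10),
    ("Signature Check", 8),
    ("Test Commit", 5),
    ("Install new packages", 704),
    ("Remove old packages", 168),
    ("Run post-install scripts", 90),
]


def summarise(phases):
    """Return (total_seconds, rows) with rows sorted by time, slowest first.

    Each row is (name, seconds, percent_of_total).
    """
    total = sum(seconds for _, seconds in phases)
    rows = [(name, seconds, 100.0 * seconds / total) for name, seconds in phases]
    rows.sort(key=lambda row: row[1], reverse=True)
    return total, rows


total, rows = summarise(PHASES)
print(f"total: {total:.1f}s ({total / 60:.1f} min)")
for name, seconds, share in rows:
    print(f"{name:<26}{seconds:>7.1f}s {share:5.1f}%")
```

The phases sum to roughly 988 seconds, in line with the “almost 17 minutes” measured, and installing new packages alone accounts for about 71% of it.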

Some notable offenders:

Package                  Time to install (s)
selinux-policy-targeted  122
kernel-devel             25
libreoffice-core         21
selinux-policy           17
hugin                    12

Package          Time to cleanup (s)
gramps           11
wireshark-gnome  8
hugin            7
meld             6
control-center   5

Hopefully Fedora 21 will move to the hawkey backend, and we can get closer to raw librpm speed (which seems to be quite a speed boost) but even that is too slow. I’ll be looking into the individual packages this week, and trying to find what makes them so slow, and what we can do about them to speed things up.

Upstream adoption of AppData so far

By popular request, here’s an update on the upstream adoption of AppData so far:

Applications in Fedora with long descriptions: 168 (9%)
Applications in Fedora with screenshots: 140 (7%)
Applications in GNOME with AppData: 60 (50%)
Applications in KDE with AppData: 1 (1%)
Applications in XFCE with AppData: 0 (0%)

You can look at this in a few ways:

  • We’ve made significant progress in the last year-or-so and many popular applications are already shipping the extra data.
  • There are a lot of situations where the upstream authors do not know what an AppData file is, don’t have time to add one, or simply do not care.
  • GNOME is clearly ahead of KDE and XFCE, probably because of the existing GNOME Goal and my nag emails to the desktop-devel mailing list. A little thing to bear in mind is that Apper (the KDE application installer) can also make use of the AppStream data, so this is a little disappointing for KDE users who probably don’t see any difference at the moment.
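
The counts and percentages quoted above also hint at how many applications are being counted in total. As a back-of-the-envelope sketch, assuming each percentage was rounded to the nearest whole percent (the post does not say how they were rounded), the range of totals consistent with each figure can be worked out like this:

```python
def implied_total_range(count, percent):
    """Range of totals for which count/total rounds to the given whole percent.

    Assumes the percentage was rounded to the nearest integer, so the true
    value lies within half a point either side of it.
    """
    if count <= 0 or percent < 1:
        raise ValueError("need a positive count and a percentage of at least 1")
    smallest = count / ((percent + 0.5) / 100.0)
    largest = count / ((percent - 0.5) / 100.0)
    return smallest, largest


# (label, count, percent) as quoted in the list above.
FIGURES = [
    ("Fedora, long descriptions", 168, 9),
    ("Fedora, screenshots", 140, 7),
    ("GNOME, AppData", 60, 50),
]

for label, count, percent in FIGURES:
    low, high = implied_total_range(count, percent)
    print(f"{label}: between {low:.0f} and {high:.0f} applications")
```

The two Fedora figures are only consistent with each other for a total of somewhere around 1,900 applications, and the GNOME figure implies about 120.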

So where do we go from here? Clearly KDE and XFCE have some catching up to do, and I need someone familiar with those communities to lead this effort. There is also a huge number of upstreams that need a little push in the right direction, and I’ve been trying to do that for the last couple of months. Without help, this would be a never-ending battle for me. A little reminder: In GNOME 3.12 we are penalising applications that don’t ship AppData by including them lower in the search results, and in GNOME 3.14 we’re not going to be showing them at all.

If you’re interested to see all the applications shown by default in Fedora 20, I’ve put together this page showing a quick overview. If you see anything there that shouldn’t be an application and needs blacklisting, just let me know. If you see an application you care about without a long description or screenshots, then please file a bug upstream pointing them at the AppData specification page. Thanks.

How to generate AppStream metadata for Fedora

I’m generating all the Fedora AppStream metadata by hand at the moment. Long term this is going to move to koji, but since we’re still tweaking the generator, adding features and fixing bugs it seems too early to fully integrate things. This is fine if you just care about the official Fedora sources, but a lot of people want to use applications from other less, ahem, free repos.

If you manage a repository and want to generate AppStream metadata yourself it’s really quite easy if you follow these instructions, although building the metadata can take a long time. Let’s assume you run a site called MegaRpms and you want to target Fedora 20.

First, check out the latest version of fedora-appstream and create somewhere we can store all the temporary files. You’ll want to do this on an SSD if possible.

$ mkdir megarpms
$ cd megarpms

Then create a project file with all the right settings for your repo. Let’s assume you have two separate trees, ‘megarpms’ and ‘megarpms-updates’.

$ cat project.conf
[AppstreamProject]
DistroTag=f20
RepoIds=megarpms,megarpms-updates
DistroName=megarpms-20
ScreenshotMirrorUrl=http://www.megarpms.org/screenshots/

The screenshot mirror URL is required if you want to be able to host screenshots for applications. If you don’t want to (or can’t afford the hosting costs) then you can comment this out and no screenshots will be generated.
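
Since the build step can take a long time, it may be worth sanity-checking the project file first. This is a minimal sketch using Python’s standard configparser, not code from fedora-appstream: which keys are treated as required, the `#` comment style and the `load_project` helper are all assumptions for illustration.

```python
import configparser

# Assumption for this sketch: everything except the screenshot URL is required.
REQUIRED_KEYS = ("DistroTag", "RepoIds", "DistroName")


def load_project(text):
    """Parse project.conf text and return its settings as a plain dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the mixed-case key names as written
    parser.read_string(text)
    section = parser["AppstreamProject"]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    return {
        "distro_tag": section["DistroTag"],
        "repo_ids": [r.strip() for r in section["RepoIds"].split(",") if r.strip()],
        "distro_name": section["DistroName"],
        # None when the line is commented out, meaning no screenshots.
        "screenshot_mirror_url": section.get("ScreenshotMirrorUrl"),
    }


# The example project file from above.
SAMPLE = """\
[AppstreamProject]
DistroTag=f20
RepoIds=megarpms,megarpms-updates
DistroName=megarpms-20
ScreenshotMirrorUrl=http://www.megarpms.org/screenshots/
"""

project = load_project(SAMPLE)
print(project["repo_ids"], project["screenshot_mirror_url"])
```

Checking that every entry in `repo_ids` matches an enabled repository before starting the download would catch a mistyped repo name early.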

Then we can actually download the packages we need to extract. Ensure that both megarpms and megarpms-updates are enabled in /etc/yum.repos.d/ and then start downloading:

$ sudo ../fedora-download-cache.py

This requires root as it uses and updates the system metadata to avoid duplicating the caches you’ve probably already got. After all the interesting packages are downloaded you can do:

$ ../fedora-build-all.py

Now, go and make a cup of tea and wait patiently if you have a lot of packages to process. After this is complete you can do:

$ ../fedora-compose.py

This spits out megarpms-20.xml.gz and megarpms-20-icons.tar.gz, and you now have two choices about what to do with these files. You can upload them with the rest of the metadata you ship (e.g. in the same directory as repomd.xml and primary.sqlite.bz2), which will work with Fedora 21 and higher.
For Fedora 20, you have to actually install these files, so you can do something like this in the megarpms-release.spec file:

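Source1: http://www.megarpms.org/temp/megarpms-20.xml.gz
Source2: http://www.megarpms.org/temp/megarpms-20-icons.tar.gz

# In the install section: ship the AppStream XML
mkdir -p %{buildroot}%{_datadir}/app-info/xmls
cp %{SOURCE1} %{buildroot}%{_datadir}/app-info/xmls
# Unpack the icons into their own directory, then return to the build dir
mkdir -p %{buildroot}%{_datadir}/app-info/icons/megarpms-20
cd %{buildroot}%{_datadir}/app-info/icons/megarpms-20
tar xvzf %{SOURCE2}
cd -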

This ensures that gnome-software can access both data files when starting up. If you have any other questions, concerns or patches, please get in touch. This is all very Fedora-specific (rpm files, Yum API, various hardcoded package names) but if you’re interested in using fedora-appstream on your distro and want to actually do the work I’d welcome patches to make it less Fedora-centric. SUSE generates the AppStream files in a completely different way.