Please welcome Star Labs to the LVFS

A few weeks ago Sean from Star Labs asked me to start the process of joining the LVFS. Star Labs is a smaller Linux-friendly OEM based just outside London, not far from where I grew up. We identified three types of firmware we could update: the system firmware, the EC firmware and the SSD firmware. The first should soon be possible with the recent pledge of capsule support from AMI, and we’ve got a binary for testing now. The EC firmware will need further work, although I can now get the IT8987 into (and out of) programming mode. The SSD firmware needed a fix to fwupd (to work around a controller quirk), but with the soon-to-be-released fwupd it can already be updated.

Sean also shipped me some loaner hardware that could be recovered using manufacturing tools if I broke it, which makes testing the ITE EC flashing possible. The IT89 chip is subtly different to the IT87 chip in other hardware like the Clevo reference designs, but this actually makes it easier to support as there are fewer legacy modes. This will be a blog post all of its own.

Having played with the hardware intermittently for a few weeks, I’ve got a pretty good feel for the “LabTop” and “Star Lite” models. There is a lot to like: the aluminium cases feel both solid and tactile (like an XPS 13) and the hardware feels really “premium”, unlike the Clevo reference designs. Star Labs doesn’t use the Clevo platform any more, which allows it to make some bolder system design choices. Some things I love: the LED IPS screen, USB-C charging, the trackpad and the keyboard. The custom keyboard design is a delight to use; I actually prefer it to my Lenovo P50 and XPS 13 for key placement and key travel. The touchpad is responsive, and the virtual buttons work well, unlike some touchpads from other Linux-friendly OEMs. The battery life seems superb, although I’ve not done enough discharge→charge→discharge cycles to measure it properly. The front-facing camera is at the top of the bezel where it belongs, which is something the XPS 13 has only just fixed in its latest models. Nobody needs to see up my nose.

There are a few things that could be improved with the hardware, in my humble opinion: the shiny bezel around the touchpad is somewhat distracting on an otherwise beautifully matte chassis. There is also only a microSD slot, when all my camera cards are full-sized. The RAM is soldered in and so can’t be upgraded in the future, and the case screws are not “captive” like those on the new Lenovos. It also doesn’t seem to have a Thunderbolt interface, which might matter if you want to use this thing docked with a bazillion things plugged in. Some of these are probably cost choices; the LabTop is significantly cheaper than the XPS 13 Developer Edition I keep comparing it against in my head.

I was also curious to try the customized Ubuntu install the vendor supplied with it. It just worked, in every way, and for those installing other operating systems like Fedora or Arch, all the different distros have been pre-tested with extra notes – a really nice touch. This is where Star Labs really shine: these guys really care about Linux and it shows. I’ve been impressed with the LabTop and I’ll be sad to return it when all the hardware is supported by fwupd and firmware releases are available on the LVFS.

So, if you’re using Star Drive hardware already, upgrade fwupd to the latest development version, enable the LVFS testing remote using fwupdmgr enable-remote lvfs-testing, and tell us how the process goes. For technical reasons you need to power the machine down and back up rather than just doing a “warm” reboot. In a few weeks we’ll do a proper fwupd release and push the firmware to stable.
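
For example, assuming a development snapshot of fwupd is already installed, the whole client-side flow is just these commands – the enable-remote one is from above, and refresh and update are the standard fwupdmgr verbs:

$ fwupdmgr enable-remote lvfs-testing
$ fwupdmgr refresh
$ fwupdmgr update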

Phoenix joins the LVFS

Just like AMI, Phoenix is a huge firmware vendor, providing the firmware for millions of machines. If you’re using a ThinkPad right now, you’re most probably running Phoenix code in your mainboard firmware. Phoenix have been working with Lenovo and their ODMs on LVFS support for a while, fixing all the niggles that were stopping the capsule from working with the loader used by Linux. Phoenix can help customers build deliverables for the LVFS that use UX capsule support to make flashing beautiful, although it’s up to the OEM whether that’s used or not.

It might seem slightly odd for me to be working with the firmware suppliers, rather than just OEMs, but I’m actually just doing both in parallel. From my point of view, both of the biggest firmware suppliers now understand the LVFS, and provide standards-compliant capsules by default. This should hopefully mean smaller Linux-specific OEMs like Tuxedo and Star Labs might be able to get signed UEFI capsules, rather than just getting a ROM file and an unsigned loader.

We’re still waiting for the last remaining huge OEM, but fingers crossed that should be any day now.

Firmware Attestation

When fwupd writes firmware to devices, it often writes it and then does a verify pass, reading the firmware back to check that it was written correctly. For some devices we can do one better, and read the firmware hash and compare it against a previously cached value, or match it against the version published by the LVFS. This means we can detect some unintentional corruption or malicious firmware running on devices, on the assumption that the bad firmware isn’t just faking the requested checksum. Still, better than nothing.
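
As a sketch of the idea (this is not the actual fwupd code, and the function name is made up), comparing a firmware blob read back from a device against a published checksum is as simple as this with GLib:

/* minimal sketch: "expected" would come from the LVFS metadata or a
 * previously cached value; this is not real fwupd API */
static gboolean
firmware_blob_matches (const guint8 *blob, gsize len, const gchar *expected)
{
	g_autofree gchar *hash = g_compute_checksum_for_data (G_CHECKSUM_SHA256, blob, len);
	return g_strcmp0 (hash, expected) == 0;
}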

Any processor better than the most basic PIC or Arduino (e.g. even a tiny $5 ARM core) is capable of public/private key firmware signing. This would use standard crypto with X.509 keys or GPG to ensure the device only runs signed firmware. This protects against both accidental bitflips and naughty behaviour, and is the unofficial industry-recommended practice for firmware updates. Older generations of the Logitech Unifying hardware were unsigned, which made the MouseJack hack almost trivial to deploy on an unmodified dongle. Newer Unifying hardware requires a firmware image signed by Logitech, which makes deploying unofficial or modified firmware almost impossible.

There is a snag with UEFI capsule updates, which is how you probably applied your last “BIOS” firmware update. Although the firmware capsule is signed by the OEM or ODM, we can’t reliably read the SPI EEPROM from userspace. It’s fair to say flashrom does work on some older hardware, but it also likes disabling keyboard controllers and making the machine reboot when probing hardware. What we can get is a hash of the firmware – or rather, a hash derived from the firmware image with other firmware-related things added for good measure. This is helpfully stored in the TPM chip, which most modern laptops have installed.

Although the SecureBoot process cares about the higher PCR values to check all manner of userspace, we only care about index zero of this register, the so-called PCR0. If you change your firmware, for any reason, the PCR0 will change. There is one PCR0 checksum (or a number slightly higher than one, for reasons) for all hardware of a given SKU. If you somehow turn off the requirement for the hardware signing key on your machine (e.g. due to a newly found security issue), or your firmware is flashed using a method other than UpdateCapsule (e.g. DediProg), then you can basically flash anything. This would be unlikely, but really bad.
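
If you’re curious what your own PCR0 is, hardware with a TPM 1.2-style sysfs interface exposes it as plain text – a rough sketch (TPM 2.0 devices need the tpm2-tools userspace instead):

/* print the PCR-00 line from the TPM 1.2 sysfs interface */
char line[256];
FILE *f = fopen ("/sys/class/tpm/tpm0/pcrs", "r");
while (f != NULL && fgets (line, sizeof (line), f) != NULL) {
	if (strncmp (line, "PCR-00:", 7) == 0)
		printf ("%s", line);
}
if (f != NULL)
	fclose (f);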

If we include the PCR0 in the vendor-supplied firmware.metainfo.xml file, or set it in the admin console of the LVFS, then we can verify that the firmware we’re running right now is the firmware the ODM or OEM uploaded. This means you can have the firmware 100% verified, where you’re sure that the firmware version uploaded by the vendor is running on your machine right now. This is good.
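
In the metainfo case it would look something like this – the element follows the AppStream checksum convention with a device target, and the hash value is obviously hypothetical:

<release version="3.0.2" date="2018-11-09">
  <checksum target="device" type="sha1">ce7dd93006be33bcce1a1965cb69634bd0a0fe35</checksum>
</release>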

As an incentive for vendors to support signing, there are now “green tick” and “red cross” icons for each firmware on the LVFS. All green ticks mean the firmware was uploaded to the LVFS by the OEM (or by an authorized ODM on behalf of the OEM), that the firmware is signed using strong encryption, and that we can do secure attestation for verification.

Obviously some protocols can’t do everything properly (e.g. ColorHug, where even symmetric crypto isn’t good), but that’s okay. It’s still more secure than flashing a random binary from an FTP site, which is what most people were doing before. Not upstream yet, and not quite finished, so comments welcome.

AMI joins the LVFS

American Megatrends Inc. may not be a company you’ve heard of, unless perhaps you like reading early-boot BIOS messages. AMI is the world’s largest BIOS firmware vendor, supplying firmware and tools to customers such as Asus, Clevo, Intel, AMD and many others. If you’ve seen a vendor using Aptio for firmware updates, that firmware came from AMI. AMI has been testing the LVFS, UpdateCapsule and fwupd for a few months and is now fully compatible. They are updating their whitepapers for customers, explaining the process of generating a capsule, using the ESRT, and producing deliverables for the LVFS.

This means “LVFS Support” becomes a first-class citizen alongside Windows Update for the motherboard manufacturers. It should trickle down to the resellers, so vendors using Clevo motherboards like Tuxedo get LVFS support almost for free, although reaching the smaller OEMs will take a bit more time.

Also, expect another large vendor announcement soon. It’s the one quite a few people have been waiting for.

Adding an optional install duration to LVFS firmware

We’ve just added an optional feature to fwupd and the LVFS that some people might find useful: The firmware update process can now tell the user how long in seconds the update is going to take.

This means that users can know that a dock update might take 5 minutes, and so start the update process before they go to lunch. A UEFI update might require multiple reboots and take 45 minutes to apply, so the user can choose to apply it at the end of the day rather than losing access to their computer for nearly an hour.

If you want to use this feature there are currently three ways to assign the duration to the update:

  • Changing the value on the LVFS admin console — the component update panel now has an extra input field to enter the duration in seconds
  • Adding a new install_duration attribute to the <release> element, for instance:
    <release version="3.0.2" date="2018-11-09" install_duration="120">
    
  • Adding a ‘quirk’ to fwupd, for instance:
    [DeviceInstanceId=USB\VID_1234&PID_5678]
    InstallDuration = 40
    
For updates requiring a reboot, the install duration should include the time to POST the system both before and after the update has run, but it can be approximate. Only users running very new versions of fwupd and gnome-software will be shown the install duration, and older versions will be unchanged as the new property will just be ignored. It’s therefore safe to include in all versions of firmware without adding a dependency on a specific fwupd version.

More fun with libxmlb

A few days ago I cut the 0.1.4 release of libxmlb, which is significant because it includes the last three features I needed in gnome-software to achieve the same search results as appstream-glib.

The first is something most users of database libraries will be familiar with: bound variables. The idea is that you prepare a query, which is parsed into opcodes, and then at a later time you assign one of the ? opcode values to an actual integer or string. This is much faster, as you do not have to re-parse the predicate, and it also means you avoid failing in incomprehensible ways if the user searches for nonsense like ]@attr. Borrowing from SQL, the syntax should be familiar:

g_autoptr(XbQuery) query = xb_query_new (silo, "components/component/id[text()=?]/..", &error);
xb_query_bind_str (query, 0, "gimp.desktop", &error);

The second feature makes the caller jump through some hoops, but hoops that make things faster: indexed queries. As might be apparent to some, libxmlb stores all the text in a big deduplicated string table after the tree structure is defined. That means if you do <component component="component">component</component> then we only store one string! When we actually set up an object to check a specific node for a predicate (for instance, text()='fubar') we actually do strcmp("fubar", "component") internally, which in most cases is very fast…

Unless you do it 10 million times…

Using indexed strings tells the XbMachine processing the predicate to first check if fubar exists in the string table, and if it doesn’t, the predicate can’t possibly match and is skipped. If it does exist, we know its integer position in the string table, and so when we compare the strings we can just compare two uint32_t’s, which is quite a lot faster, especially on ARM for some reason. In the case of fwupd, it is searching for a specific GUID when returning hardware results. Using an indexed query takes the per-device query time from 3.17ms to about 0.33ms – which, if you have a large number of connected updatable devices, makes a big difference to the user experience. As using indexed queries requires extra code and isn’t always a win, it is probably only useful in a handful of cases. In case you do need this feature, this is the code you would use:

xb_silo_query_build_index (silo, "component/id", NULL, &error); // the cdata
xb_silo_query_build_index (silo, "component", "type", &error); // the @type attr
g_autoptr(XbNode) n = xb_silo_query_first (silo, "component/id[text()=$'test.firmware']", &error);

The indexed string match is denoted by $'' rather than the normal pair of single quotes. If there is something more standard to denote this kind of thing, please let me know and I’ll switch to that instead.

The third feature is stemming, which means you can search for “gaming mouse” and still get results that mention games, game and Gaming. This is also how you can search for words like Kongreßstraße and match kongressstrasse. In an ideal world stemming would be computationally free, but when we are comparing millions of records each call to libstemmer sure adds up. Adding the stem() XPath operator took a few minutes, but making it usable took up a whole weekend.

The query we wanted to run would be of the form id[text()~=stem(?)], but the stem() would be called millions of times on the very same string for each comparison. To fix this, and to make other XPath operators faster, I added an opcode-rewriting optimisation pass to the XbMachine parser. This means if you call lower-case(text())==lower-case('GIMP.DESKTOP') we only call the UTF-8 strlower function N+1 times, rather than 2N times. For lower-case() the performance increase is slight, but for stem() it actually makes the feature usable in gnome-software. The opcode-rewriting optimisation pass is kinda dumb in how it works (“let’s try all combinations!”), but it works with all of the registered methods, and makes all existing queries faster almost for free.
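
Putting the pieces together, a search is then just a normal query – a sketch reusing the silo and error variables from the earlier examples:

g_autoptr(XbNode) n = xb_silo_query_first (silo, "components/component/id[text()~=stem('gaming')]", &error);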

One common question I’ve had is if libxmlb is supposed to obsolete appstream-glib, and the answer is “it depends”. If you’re creating or building AppStream metadata, or performing any AppStream-specific validation then stick to the appstream-glib or appstream-builder libraries. If you just want to read AppStream metadata you can use either, but if you can stomach a binary blob of rewritten metadata stored somewhere, libxmlb is going to be a couple of orders of magnitude faster and use a ton less memory.

If you’re thinking of using libxmlb in your project send me an email and I’m happy to add more documentation where required. At the moment libxmlb does everything I need for fwupd and gnome-software and so apart from bugfixes I think it’s basically “done”, which should make my manager somewhat happier. Comments welcome.

Using the LVFS to influence procurement decisions

The National Cyber Security Centre (part of GCHQ, the UK equivalent of the NSA) wrote a nice article on using the LVFS to influence procurement decisions. It’s probably also worth noting that the two biggest OEMs making consumer hardware now require all their ODMs to support firmware updates on the LVFS. More and more mega-corporations also have “supports the LVFS” as a requirement for procurement.

The LVFS is slowly and carefully moving to the Linux Foundation, so expect more outreach and announcements soon.

libxmlb now a dependency of fwupd and gnome-software

I’ve just released libxmlb 0.1.3, and merged the branches for fwupd and gnome-software so that it becomes a hard dependency of both projects. A few people have reviewed the libxmlb code, and Mario, Kalev and Robert reviewed the fwupd and gnome-software changes, so I’m pretty confident I’ve not broken anything too important — but more testing is very welcome. GNOME Software RSS usage is now about 50% of what ships in 3.30.x, and fwupd is down by 65%! If you want to ship the upcoming fwupd 1.2.0 or gnome-software 3.31.2 in your distro you’ll need to have libxmlb packaged, or be happy using a meson subproject to download libxmlb during the build of each dependent project.

Announcing the first release of libxmlb

Today I did the first 0.1.0 preview release of libxmlb. We’re at the “probably API stable, but no promises” stage. This is the library I introduced a couple of weeks ago, and since then I’ve been porting both fwupd and gnome-software to use it. The former is almost complete, and nearly ready to merge, but the latter is still work in progress with a fair bit of code to write. I did manage to launch gnome-software with libxmlb yesterday, and modulo a bit of brokenness it’s both faster to start (over 800ms faster from cold boot!) and uses an amazing 90Mb less RSS at runtime. I’m planning to merge the libxmlb branch into the unstable branch of fwupd in the next few weeks, so I need volunteers to package up the new hard dep for Debian, Ubuntu and Arch.

The tarball is in the usual place – it’s a simple Meson-built library that doesn’t do anything crazy. I’ve imported and built it already for Fedora, much thanks to Kalev for the super speedy package review.

I guess I should explain how applications are expected to use this library. At its core, there are just five different kinds of objects you need to care about:

  • XbSilo – a deduplicated string pool and a read-only node tree. This is typically kept alive for the application lifetime.
  • XbNode – a GObject-wrapped immutable node, available as a query result from the XbSilo.
  • XbBuilder – a “compiler” that builds the XbSilo from XbBuilderNodes or XbBuilderSources. This is typically created and destroyed at startup, or when the blob needs regenerating.
  • XbBuilderNode – a mutable node that can have a parent, children, attributes and a value.
  • XbBuilderSource – a source of data for XbBuilder, e.g. a .xml.gz file or just a raw XML string.

The way most applications will use libxmlb is to create a local XbBuilder instance, add some XbBuilderSources and then “ensure” a local cache file. The “ensure” process either mmap()s the binary blob, if all the file mtimes are the same, or compiles a new blob and saves it to file. You can also tell the XbSilo to watch all the sources that it was built with, so that if any of the files change at runtime the valid property gets set to FALSE and the application can call xb_builder_ensure() again at a convenient time.
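
In code that flow looks something like this minimal sketch – the paths are hypothetical, most error handling is elided, and you should check the libxmlb docs for the exact API of your version:

g_autoptr(GError) error = NULL;
g_autoptr(XbBuilder) builder = xb_builder_new ();
g_autoptr(XbBuilderSource) source = xb_builder_source_new ();
g_autoptr(GFile) file_xml = g_file_new_for_path ("/usr/share/app-info/xmls/fedora.xml.gz");
g_autoptr(GFile) file_blob = g_file_new_for_path ("/tmp/appstream.xmlb");
xb_builder_source_load_file (source, file_xml, XB_BUILDER_SOURCE_FLAG_NONE, NULL, &error);
xb_builder_import_source (builder, source);
/* mmaps the existing blob if the mtimes all match, otherwise recompiles */
g_autoptr(XbSilo) silo = xb_builder_ensure (builder, file_blob, XB_BUILDER_COMPILE_FLAG_NONE, NULL, &error);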

Once the XbBuilder has been compiled, an XbSilo pops out. With the XbSilo you can query using most common XPath statements – I actually ended up implementing a FORTH-style stack interpreter, so we can now do queries like /components/component/id[contains(upper-case(text()),'GIMP')] – I’ll say a bit more on that in a minute. Queries can limit the number of results (for speed), and are deduplicated in a sane way, so it’s really quite simple to achieve something that would otherwise be a lot of C code. It’s also possible to directly query an attribute or text value from a node, so the silo doesn’t have to be passed around either.
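
For example, grabbing a single node and reading its value might look like this sketch:

g_autoptr(XbNode) n = xb_silo_query_first (silo, "components/component/id[contains(upper-case(text()),'GIMP')]", &error);
if (n != NULL)
	g_print ("%s\n", xb_node_get_text (n));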

In the process of porting gnome-software I had to make libxmlb thread-safe, which required some internal reorganisation. We now have a non-exported XbMachine stack interpreter, and the XbSilo registers the XML-specific methods (like contains()) and operators (like ~=). These get passed some per-method user data, and also some per-query private data that is shared with the node tree – allowing things like [last()] and position()=3 to work. The function callbacks just get passed a query-specific stack, which means you can allow things like comparing "1" to 1.00f. This makes it easy to support more of XPath in the future, or to support something completely application-specific like gnome-software-search() without editing the library.

If anyone wants to do any API or code review I’d be super happy to answer any questions. Coverity and valgrind seem happy enough with all the self tests, but that’s no replacement for a human asking tricky questions. Thanks!

Speeding up AppStream: mmap’ing XML using libxmlb

AppStream and the related AppData are XML formats that have been adopted by thousands of upstream projects and are being used in about a dozen different client programs. The AppStream metadata shipped in Fedora is currently a huge 13Mb XML file, which with gzip compresses down to a more reasonable 3.6Mb. AppStream is awesome; it provides translations of lots of useful data into basically all languages and includes screenshots for almost everything. GNOME Software is built around AppStream, and we even use a slightly extended version of the same XML format to ship firmware update metadata from the LVFS to fwupd.

XML does have two giant weaknesses. The first is that you have to decompress and then parse the files – which might include all of the ~300 tiny AppData files as well as the distro-provided AppStream files, if you want to list installed applications not provided by the distro. Seeking lots of small files isn’t so slow on an SSD, and loading+decompressing a small file is actually quicker than loading an uncompressed larger file. The second is that parsing an XML file typically means you set up some callbacks, which then get called for every start tag, text section and end tag – so for a 13Mb XML document that’s nested very deeply you have to do a lot of callbacks. This means you have to process the description of GIMP in every language before you can even see if Shotwell exists at all.
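
To make the callback weakness concrete, this is roughly what the event-driven style looks like with GLib’s GMarkup parser – xml_data and xml_len are assumed to be already loaded:

static void
start_cb (GMarkupParseContext *ctx,
          const gchar *element_name,
          const gchar **attribute_names,
          const gchar **attribute_values,
          gpointer user_data,
          GError **error)
{
	/* called for every start tag, e.g. every <p> in every translated
	 * description, whether you care about that component or not */
}

const GMarkupParser parser = { start_cb, NULL, NULL, NULL, NULL };
GMarkupParseContext *ctx = g_markup_parse_context_new (&parser, 0, NULL, NULL);
g_markup_parse_context_parse (ctx, xml_data, xml_len, &error);
g_markup_parse_context_free (ctx);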

The typical way of parsing XML involves creating a “node tree”. This allows you to treat the XML document as a Document Object Model (DOM), navigating the tree and parsing the contents in an object-oriented way. It also means you typically allocate the nodes themselves on the heap, plus copies of all the string data. AsNode in libappstream-glib has a few tricks to reduce RSS usage after parsing, which include:

  • Interning common element names like description, p, ul, li
  • Freeing all the nodes, but retaining all the node data
  • Ignoring node data for languages you don’t understand
  • Reference counting the strings from the nodes into the various appstream-glib GObjects

This still has two drawbacks: we need to keep in hot memory all the screenshot URLs of all the apps you’re never going to search for, and we also need to parse all the long translated descriptions just to find out if gimp.desktop is actually installable. Deduplicating strings at runtime takes a nontrivial amount of CPU, and means we build a huge hash table that uses nearly as much RSS as we save by deduplicating.

On a modern system, parsing ~300 files takes less than a second, and the total RSS is only a few tens of Mb – which is fine, right? Except on resource-constrained machines it takes 20+ seconds to start, and 40Mb is nearly 10% of the total memory available on the system. We have exactly the same problem with fwupd, where we get one giant file from the LVFS, all of which gets stored in RSS even though you may never have the hardware that it matches against. Slow starting is one of the reasons fwupd and gnome-software stay resident, rather than shutting down on idle and restarting when required.

We can do better.

We do need to keep the source format, but that doesn’t mean we can’t create a managed cache that does some clever things. Traditionally I’ve been quite vocal against squashing structured XML data into databases like SQLite and Xapian, as it’s like pushing a square peg into a round hole and forces you to think like a database, doing 10-level nested joins to query something simple. What we want is something like XPath, where you can query data using the XML structure itself.

We also want to be able to navigate the XML document as if it were a DOM, i.e. be able to jump from one node to its sibling without parsing all the great-great-great-grandchild nodes to get there. This means storing the offset to the sibling in a binary file.

If we’re creating a cache, we might as well do the string deduplication once at creation time, rather than every time we load the data. This has the added benefit that we’re converting the string data from variable-length strings that you compare using strcmp() to quarks that you can compare just by checking two integers. This is much faster, as any SAT solver will tell you. If we’re storing a string table, we can also store the NUL byte. This seems wasteful at first, but has one huge advantage – you can mmap() the string table. In fact, you can mmap the entire cache. If you order the string table in a sensible way then you store all the related data in one block (e.g. the <id> values), so you don’t jump all over the cache invalidating almost everything just for a common query. mmap’ing the strings means you can avoid strdup()ing every string just in case; under memory pressure the kernel automatically reclaims the memory, and the next time it’s needed it gets loaded from disk again. It’s almost magic.
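
To make the NUL-byte point concrete, here is a tiny sketch of using a string straight out of an mmap()ed blob – the offset is hypothetical, and note there is no strdup() anywhere:

struct stat st;
int fd = open ("appstream.xmlb", O_RDONLY);
fstat (fd, &st);
const char *blob = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
/* because the NUL was stored in the string table, this is already a
 * valid C string backed by the page cache */
const char *id = blob + 0x1234; /* hypothetical offset of an <id> string */
printf ("%s\n", id);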

I’ve spent the last few days prototyping a library, which is called libxmlb until someone comes up with a better name. I’ve got a test branch of fwupd that I’ve ported from libappstream-glib and I’m happy to say that RSS has reduced from 3Mb (peak 3.61Mb) to 1Mb (peak 1.07Mb) and the startup time has gone from 280ms to 250ms. Unless I’ve missed something drastic I’m going to port gnome-software too, and will expect even bigger savings as the amount of XML is two orders of magnitude larger.

So, how do you use this thing? First, let’s create a baseline doing things the old way:

$ time appstream-util search gimp.desktop
real	0m0.645s
user	0m0.800s
sys	0m0.184s

To create a binary cache:

$ time xb-tool compile appstream.xmlb /usr/share/app-info/xmls/* /usr/share/appdata/* /usr/share/metainfo/*
real	0m0.497s
user	0m0.453s
sys	0m0.028s

$ time xb-tool compile appstream.xmlb /usr/share/app-info/xmls/* /usr/share/appdata/* /usr/share/metainfo/*
real	0m0.016s
user	0m0.004s
sys	0m0.006s

Notice the second time it compiled nearly instantly, as none of the filename or modification timestamps of the sources changed. This is exactly what programs would do every time they are launched.

$ du -h appstream.xmlb
4.2M	appstream.xmlb

$ time xb-tool query appstream.xmlb "components/component[@type='desktop']/id[text()='firefox.desktop']"
RESULT: <id>firefox.desktop</id>
RESULT: <id>firefox.desktop</id>
RESULT: <id>firefox.desktop</id>
real	0m0.008s
user	0m0.007s
sys	0m0.002s

8ms includes the time to load the file, search for all the components that match the query and the time to export the XML. You get three results as there’s one AppData file, one entry in the distro AppStream, and an extra one shipped by Fedora to make Firefox featured in gnome-software. You can see the whole XML component of each result by appending /.. to the query. Unlike appstream-glib, libxmlb doesn’t try to merge components – which makes it much less magic, and a whole lot simpler.

Some questions answered:

  • Why not just use a GVariant blob?: I did initially, and the cache was huge. The deeply nested structure was packed inefficiently as you have to assume everything is a hash table of a{sv}. It was also slow to load; not much faster than just parsing the XML. It also wasn’t possible to implement the zero-copy XPath queries this way.
  • Is this API and ABI stable?: Not yet; it will be as soon as gnome-software is ported.
  • You implemented XPath in C‽: No, only a tiny subset. See the README.md

Comments welcome.