Musings on the Microsoft Component Firmware Update (CFU) Protocol

CFU is a new specification from Microsoft designed to be the one true protocol for updating hardware. No vendor seems to be shipping hardware supporting CFU (yet?), although I’ve had two peripheral vendors ask my opinion which is why I’m posting here.

CFU has a bazaar pre-download phase before sending the firmware to the microcontroller so the uC can check if the firmware is required and compatible. CFU also requires devices to be able to transfer the entire new transfer mode in runtime mode. The pre-download “offer” allows the uC to check any sub-components attached (e.g. other devices attached to the SoC) and forces it to do dep resolution in case sub-components have to be updated in a specific order.

Pushing the dep resolution down to the uC means the uC has to do all the version comparisons and also know all the logic with regard to protocol incompatibilities. You could be in a position where the uC firmware needs to be updated so that it “knows” about the new protocol restrictions, which are needed to update the uC and the things attached in the right order in a subsequent update. If we always update the uC to the latest, the probably-factory-default running version doesn’t know about the new restrictions.

The other issue with this is that the peripheral is unaware of the other devices in the system, so for instance couldn’t only install a new firmware version for only new builds of Windows for example. Something that we support in fwupd is being able to restrict the peripheral device firmware to a specific SMBIOS CHID or a system firmware vendor, which lets vendors solve the “same hardware in different chassis, with custom firmware” problem. I don’t see how that could be possible using CFU unless I misunderstand the new .inf features. All the dependency resolution should be in the metadata layer (e.g. in the .inf file) rather than being pushed down to the hardware running the old firmware.

What is possibly the biggest failure I see is the doubling of flash storage required to do an runtime transfer, the extra power budget of being woken up to process the “offer” and enough bulk power to stay alive if “unplugged” during a A/B swap. Realistically it’s an extra few dollars for a ARM uC to act as a CFU “bridge” for legacy silicon and IP, which I can’t see as appealing to an ODM given they make other strange choices just to save a few cents on a BOM. I suppose the CFU “bridge” could also do firmware signing/encryption but then you still have a physical trace on the PCB with easy-to-read/write unsigned firmware. CFU could have defined a standardized way to encrypt and sign firmware, but they kinda handwave it away letting the vendors do what they think is best, and we all know how that plays out.

CFU downloads in the runtime mode, but from experience, most of the devices can transfer a few hundred Kb in less than ~200ms. Erasing flash is typically the slowest thing, typically less than 2s, writing next at ~1s both done in the bootloader phase. I’ve not seen a single device that can do a flash-addr-swap to be able to do the A/B solution they’ve optimized for, with the exception of enterprise UEFI firmware which CFU can’t update anyway.

By far the longest process in the whole update step is the USB re-enumeration (up to twice) which we have to allow 5s (!!!) for in fwupd due to slow hubs and other buggy hardware. So, CFU doubles the flash size requirement for millions of device to save ~5 seconds for a procedure which might be done once or twice in the devices lifetime. It’s also not the transfer that’s the limitation even over bluetooth as if the dep resolution is “higher up” you only need to send the firmware to the device when it needs an update, rather that every time you scan the device.

I’m similarly unimpressed with the no-user-interaction idea where firmware updates just happen in the background, as the user really needs to know when the device is going to disappear and re-appear for 5 seconds (even CFU has to re-enumerate…) — image it happening during a presentation or as the machine is about to have the lid shut to go into S3.

EDIT: See the new fwupd plugin.

GNOME Software in Fedora will no longer support snapd

In my slightly infamous email to fedora-devel I stated that I would turn off the snapd support in the gnome-software package for Fedora 31. A lot of people agreed with the technical reasons, but failed to understand the bigger picture and asked me to explain myself.

I wanted to tell a little, fictional, story:

In 2012 the ISO institute started working on a cross-vendor petrol reference vehicle to reduce the amount of R&D different companies had to do to build and sell a modern, and safe, saloon car.

Almost immediately, Mercedes joins ISO, and starts selling the ISO car. Fiat joins in 2013, Peugeot in 2014 and General Motors finally joins in 2015 and adds support for Diesel engines. BMW, who had been trying to maintain the previous chassis they designed on their own (sold as “BMW Kar Koncept”), finally adopts the ISO car also in 2015. BMW versions of the ISO car use BMW-specific transmission oil as it doesn’t trust oil from the ISO consortium.

Mercedes looks to the future, and adds high-voltage battery support to the ISO reference car also in 2015, adding the required additional wiring and regenerative braking support. All the other members of the consortium can use their own high voltage batteries, or use the reference battery. The battery can be charged with electricity from any provider.

In 2016 BMW stops marketing the “ISO Car” like all the other vendors, and instead starts calling it “BMW Car” instead. At about the same time BMW adds support for hydrogen engines to the reference vehicle. All the other vendors can ship the ISO car with a Hydrogen engine, but all the hydrogen must be purchased from a BMW-certified dealer. If any vendor other than BMW uses the hydrogen engines, they can’t use the BMW-specific heat shield which protects the fuel tank from exploding in the event on a collision.

In 2017 Mercedes adds traction control and power steering to the ISO reference car. It is enabled almost immediately and used by nearly all the vendors with no royalties and many customer lives are saved.

In 2018 BMW decides that actually producing vendor-specific oil for it’s cars is quite a lot of extra work, and tells all customers existing transmission oil has to be thrown away, but now all customers can get free oil from the ISO consortium. The ISO consortium distributes a lot more oil, but also has to deal with a lot more customer queries about transmission failures.

In 2019 BMW builds a special cut-down ISO car, but physically removes all the petrol and electric functionality from the frame. It is rebranded as “Kar by BMW”. It then sends a private note to the chair of the ISO consortium that it’s not going to be using ISO car in 2020, and that it’s designing a completely new “Kar” that only supports hydrogen engines and does not have traction control or seatbelts. The explanation given was that BMW wanted a vehicle that was tailored specifically for hydrogen engines. Any BMW customers using petrol or electricity in their car must switch to hydrogen by 2020.

The BMW engineers that used to work on ISO Car have been shifted to work on Kar, although have committed to also work on Car if it’s not too much extra work. BMW still want to be officially part of the consortium and to be able to sell the ISO Car as an extra vehicle to the customer that provides all the engine types (as some customers don’t like hydrogen engines), but doesn’t want to be seen to support anything other than a hydrogen-based future. It’s also unclear whether the extra vehicle sold to customers would be the “ISO Car” or the “BMW Car”.

One ISO consortium member asks whether they should remove hydrogen engine support from the ISO car as they feel BMW is not playing fair. Another consortium member thinks that the extra functionality could just be disabled by default and any unused functionality should certainly be removed. All members of the consortium feel like BMW has pushed them too far. Mercedes stop selling the hydrogen ISO Car model stating it’s not safe without the heat shield, and because BMW isn’t going to be supporting the ISO Car in 2020.

Fun with the ODRS, part 2

For the last few days I’ve been working on the ODRS, the review server used by GNOME Software and other open source software centers. I had to do a lot of work initially to get the codebase up to modern standards, but now it has unit tests (86% coverage!), full CI and is using the latest versions of everything. All this refactoring allowed me to add some extra new features we’ve needed for a while.

The first feature changes how we do moderation. The way the ODRS works means that any unauthenticated user can mark a review for moderation for any reason in just one click. This means that it’s no longer shown to any other user and requires a moderator to perform one of three actions:

  • Decide it’s okay, and clear the reported counter back to zero
  • Decide it’s not very good, and either modify it or delete it
  • Decide it’s spam or in any way hateful, and delete all the reviews from the submitter, adding them to the user blocklist

For the last few years it’s been mostly me deciding on the ~3k marked-for-moderatation reviews with the help of Google Translate. Let me tell you, after all that my threshold for dealing with internet trolls is super low. There are already over 60 blocked users on the ODRS, although they’ll never really know they are shouting into /dev/null

One change I’ve made here is that it now takes two “reports” of a review before it needs moderation; the logic being that a lot of reports seem accidental and a really bad review is already normally reported by multiple people in the few days after it’s been posted. The other change is that we now have a locale-specific “bad word list” that submitted reports are checked against at submission time. If they are flagged, the moderator has to decide on the action before it’s ever shown to other users. This has already correctly flagged 5 reviews in the couple of days since it was deployed. If you contributed to the spreadsheet with “bad words” for your country I’m very grateful. That bad word list will be available as a JSON dump on the ODRS on Monday in case it’s useful to other people. I fully expect it’ll grow and change over time.

The other big change is dealing with different application IDs. Over the last decade some applications have moved from “launchable-style” inkscape.desktop IDs to AppStream-style IDs like org.inkscape.Inkscape.desktop and are even reported in different forms, e.g. the Flathub-inspired org.inkscape.Inkscape and the Snappy io.snapcraft.inkscape-tIrcA87dMWthuDORCCRU0VpidK5SBVOc. Until today a review submitted against the old desktop ID wouldn’t match for the Flatpak one, and now it does. The same happens when we get the star ratings which means that apps that change ID don’t start with a clean slate and inherit all the positivity of the old version. Of course, the usual per-request ordering and filtering is done, so older versions than the one requested might be shown lower than newer versions anyway.

This is also your monthly reminder to use <provides><id>oldname.desktop</id></provides> in your metainfo.xml file if you change your desktop ID. That includes you Flathub and Snapcraft maintainers too. If you do that client side then you at least probably get the right reviews if the software center does the right thing, but doing it server side as well makes really sure you’re getting the reviews and ratings you want in all cases.

If all this sounds interesting, and you’d like to know more about the ODRS development, or would like to be a moderator for your language, please join the mailing list and I’ll post there next week when I’ve made the moderator experience nicer than it is now. It’ll also be the place to request help, guidance and also ask for new features.

Initial Fun with the Open Desktop Ratings Service: Swearing!

The ODRS is the service that produces ratings and reviews for gnome-software. I built the service a few years ago, and it’s been dutifully trucking on ever since. There are over 25,000 reviews, 50k votes, and over 4k different applications reviewed. Over half a million clients get application reviews every single day.

Recently it’s been showing signs of needing work, and so I’ve spent a few days converting it to Python 3, then to SQLAlchemy, and then fixing all the broken stuff that we’ve lived with for a while (e.g. no emoji support because we were not using utf8mb4…). Part of the new work will be making it easier to flag and then moderate reviews, and that needs your help. Although any unauthenticated user can report a review for any reason, some reviews should be automatically marked at submission if they contain known bad words. There is almost no reason to write a review in locale en_GB and use the word fuck and so I think marking that review as needing moderation before it’s shown to thousands of people is a sensible thing to do.

To this to work, I can’t just use a blacklist of words as some words are only really vulgar in some regions, and some are perfectly valid words in other languages. For this reason I need the blacklist to be keyed to the submitted locale.

This is where I need your help. If you can spare 2 minutes, and know a lot of dirty words in your language can you please add them to this spreadsheet. Much appreciated.

WOGUE is no friend of GNOME

Alex Diavatis is the person behind the WOGUE account on YouTube. For a while he’s been posting videos about GNOME. I think the latest idea is that he’s trying to “shame” developers into working harder. From the person who’s again on the other end of his rants it’s having the opposite effect.

We’re all doing our best, and I’m personally balancing about a dozen different plates trying to keep them all spinning. If any of the plates fall on the floor, perhaps helping with triaging bugs, fixing little niggles or just saying something positive might be a good idea. In fact, saying nothing would be better than the sarcasm and making silly videos.

Breaking apart Dell UEFI Firmware CapsuleUpdate packages

When firmware is uploaded to the LVFS we perform online checks on it. For example, one of the tests is looking for known badness like embedded UTF-8/UTF-16 BEGIN RSA PRIVATE KEY strings. As part of this we use CHIPSEC (in the form of chipsec_util -n uefi decode) which searches the binary for a UEFI volume header which is a simple string of _FVH and then decompresses the volumes which we then read back as component shards. This works well on plain EDK2 firmware, and the packages uploaded by Lenovo and HP which use IBVs of AMI and Phoenix. The nice side effect is that we can show the user what binaries have changed, as the vendor might have accidentally forgotten to mention something in the release notes.

The elephants in the room were all the hundreds of archives from Dell which could not be loaded by chipsec with no volume header detected. I spent a few hours last night adding support for these archives, and the secret is here:

  1. Decompress the firmware.cab archive into firmware.bin, disregarding the signing and metadata.
  2. If CHIPSEC fails to analyse firmware.bin, look for a > 512kB decompress-able Zlib section somewhere after the capsule header, actually in the PE binary.
  3. The decompressed blob is in PFS format, which seems to be some Dell-specific format that’s already been reverse engineered.
  4. The PFS blob is not further compressed and is in one continuous block, and so the entire PFS volume can be passed to chipsec for analysis.

The Zlib start offset seems to jump around for each release, and I’ve not found any information in the original PE file that indicates the offset. If anyone wants to give me a hint to avoid searching the multimegabyte blob for two bytes (and then testing if it’s just chance, or indeed an Zlib stream…) I would be very happy, even if you have to remain anonymous.

So, to sum up:

CapsuleHeader
  PE Binary
    Zlib stream
      PFS
        FVH
          PE DXEs
          PE PEIMs
          …

I’ll see if chipsec upstream wants a patch to do this as it’s probably useful outside of the LVFS too.

Donating 5 minutes of your time to help the LVFS

For about every 250 bug reports I recieve I get an email offering to help. Most of the time the person offering help isn’t capable of diving right in the trickiest parts of the code and just wanted to make my life easier. Now I have a task that almost anyone can help with…

For the next version of the LVFS we deploy we’re going to be showing what was changed between each firmware version. Rather than just stating the firmware has changed from SHA1:DEAD to SHA1:BEEF and some high level update description provided by the vendor, we can show the interested user the UEFI modules that changed. I’m still working on the feature and without more data it’s kinda, well, dull. Before I can make the feature actually useful to anyone except a BIOS engineer, I need some help finding out information about the various modules.

In most cases it’s simply googling the name of the module and writing 1-2 lines of a summary about the module. In some cases the module isn’t documented at all, and that’s fine — I can go back to the vendors and ask them for more details about the few we can’t do ourselves. What I can’t do is ask them to explain all 150 modules in a specific firmware release, and I don’t scale to ~2000 Google queries. With the help of EDK2 I’ve already done 213 myself but now I’ve run out of puff.

So, if you have a spare 5 minutes I’d really appreciate some help. The shared spreadsheet is here, and any input is useful. Thanks!

Updating the firmware on new Dell Docks

Yesterday Dell unveiled their new range of docking stations. I’ve had a WD19TB for a few months, and it seems to work fine; you can plug one thunderbolt cable into my XPS 13 and it turns it into a workstation, driving two screens, connecting me to wired Ethernet and connecting all my USB stuff. When I want to run away, I just unplug one thing and my workstation turns back into a portable laptop. The dock also randomly has a headphone out socket, although I like to drive my sound through a USB sound card and S/PDIF anyway.

The impressive bit for me is that Dell wanted firmware upgrading to work perfectly from day 1 – there has been testable firmware embargoed on the LVFS for many weeks. Although the dock is one physical thing, internally it’s made up of different components and subsystems, all with slightly different quirky flashing protocols. Behind the scenes and without great fan-fair, Dell added the “composite firmware” support to fwupd, so not only do the devices look logically correct when using fwupdmgr get-topology it means you can update all the different technology on the device with one .cab file.

This means, to update to the dock I don’t need to tell you lots of technical information about how to update to some super new version of something. Just click on the upgrade button in GNOME Software when the firmware update has been downloaded for you. This is exactly how firmware updates should work – you buy the hardware you need from the shop, plug it in, get the latest bug fixes and features automatically with no “googling” or using crazy commands on the command line.

I’ve also been working with another OEM on their next generation of docking stations. This work can of course piggy-back onto the composite feature when there is hardware ready for test. In my opinion they’re about 9 months behind Dell at this point, so I guess if you want a docking station that’s supported on the LVFS you know what to buy…

Remember the extra metadata if you change a desktop ID!

This is important if you’re the upstream maintainer of an application: If you change the desktop ID then it’s like breaking API. Changing a desktop ID should be done carefully and in a development branch only — and then you need to communicate it and give the distros a chance to adapt to the new name.

If you’re just changing the desktop ID and not forking development, you also need to add something like this in your metainfo.xml file:

  <provides>
    <id>old-name.desktop</id>
  </provides>

GNOME Software gets lots of bugs about showing “duplicate” search results, but there’s no reliable way it can know that calibre-gui.desktop is the same app as com.calibre_ebook.calibre without some help. If you’re a packager building an application for something like Flathub you only need to include the extra provides line if you’re adding a new metainfo.xml file rather than just using rename-appdata-file in the JSON file.