Making it easy to generate fwupd device emulation data

We’re trying to increase the fwupd coverage score, so we can mercilessly refactor and improve code upstream without risks of regressions. To do this we run thousands of unit tests for each part of the libfwupd public API and libfwupdplugin private API. This gets us a long way, but what we really want to do is emulate the end-to-end firmware update of every real device we support.

It’s not trivial (or quick) connecting hundreds of devices to a specific CI machine, and so for some time we’ve supported recording USB device enumeration, re-plug, firmware write, re-plug and re-enumeration. For fwupd 2.0.0 we added support for all sysfs-based devices too, which allows us to emulate a real world NVMe disk doing actual ioctls() and reads() in every submitted CI job. We’re now going to ask vendors to record emulations of the firmware update for existing plugins so we can run those in CI too.

The device emulation docs are complicated and there’s lots of things that the user can do wrong. What I really wanted was a “click, click, save-as, click” user experience that doesn’t need to use the command line. The tl;dr: is that we’ve now added the needed async API in fwupd 2.0.1 (probably going to be released on Monday) and added the click, click UI to gnome-firmware:

There’s a slight niggle when the user starts recording the first “internal” device (e.g. a NVMe disk) that we need to ask the user to restart the daemon or the computer. This is because we can’t just hotplug the internal non-removable device, and need to “start recording” then “enumerate device(s)” rather than the other way around. Recording all the device enumeration isn’t free in CPU or RAM (and is possibly a security problem too), and so we don’t turn it on by default. All the emulation is also controlled using polkit now, so you need the root password to do anything remotely interesting.

Some of the strings are a bit unhelpful, and some a bit clunky, so if you see anything that doesn’t look awesome or is hard to translate please tell us and we can fix it up. Of course, even better would be a merge request with a better string.

If you want to try it out there’s a COPR with all the right bits for Fedora 41. It might also work on Fedora 40 if you remove gnome-software. I’ll probably switch the Flathub build to 48.alpha when fwupd 2.0.1 is released too. Feedback welcome.

fwupd 2.0.0 and new tricks

Today I tagged fwupd 2.0.0, which includes lots of new hardware support, a ton of bugfixes and more importantly a redesigned device prober and firmware loader that allows it to do some cool tricks. As this is a bigger-than-usual release I’ve written some more verbose release notes below.

The first notable thing is that we’ve removed the requirement of GUsb in the daemon, and now use libusb directly. This allowed us to move the device emulation support from libgusb up into libfwupdplugin, which now means we can emulate devices created from sysfs too. This means that we can emulate end-to-end firmware updates on fake hidraw and nvme devices in CI just like we’ve been able to emulate using fake USB devices for some time. This increases the coverage of testing for every pull request, and makes sure that none of our “improvements” actually end up breaking firmware updates on some existing device.

The emulation code is actually pretty cool; every USB control request, ioctl(), read() (and everything in between) is recorded from a target device and saved to a JSON file with a unique per-request key for each stage of the update process. This is saved to a zip archive and is usually uploaded to the LVFS mirror and used in the device-tests in fwupd. It’s much easier than having a desk full of hardware and because each emulation is just that, emulated, we don’t need to do the tens of thousands of 5ms sleeps in between device writes — which means most emulations take a few ms to load, decompress, write and verify. This means you can test [nearly] “every device we support” in just a few seconds of CI time.

Another nice change is the removal of GUdev as a dependency. GUdev is a nice GObject abstraction over libudev and then sd_device from systemd, but when you’re dealing with thousands of devices (that you’re poking in weird ways), and tens of thousands of device children and parents the “immutable device state” objects drift from reality and the abstraction layers really start to hurt. So instead of using GUdev we now listen to the netlink socket and parse those events into fwupd FuDevice objects, rather than having an abstract device with another abstract device being used as a data source. It has also allowed us to remove at least one layer of caching (that we had to work around in weird ways), and also reduce the memory requirement both at startup and at runtime at the expense of re-implementing the netlink parsing code. It also means we can easily start using ueventd, which makes it possible to run fwupd on Android. More on that another day!

Figure: the old dependency graph, showing lots of things
Figure: the new dependency graph, showing a lot fewer things

The biggest change, and the feature that’s been requested the most by enterprise customers is the ability to “stream” firmware from archives into devices. What fwupdmgr used to do (and what 1_9_X still does) is:

  • Send the cabinet archive to the daemon as a file descriptor
  • The daemon then loads the input stream into memory (copy 1)
  • The memory blob is parsed as a cabinet archive, and the blocks-with-header are re-assembled into whole files (copy 2)
  • The payload is then typically chunked into pieces, with each chunk being allocated as a new blob (copy 3)
  • Each chunk is sent to the device being updated

This worked fine for a 32MB firmware payload — we allocate ~100MB of memory and then free it, no bother at all.

Where this fails is for one of two cases: huge firmware or underpowered machine — or in the pathological case, huge video conferencing camera firmware with inexpensive Google ChromeBook. In that example we might have a 1.5GB firmware file (it’s probably a custom Android image…) on a 4GB-of-RAM budget ChromeBook. The running machine has a measly 1GB free system memory, and then fwupd immediately OOMs when just trying to parse the archive, let alone deploy the firmware.

So what can we do to reduce the number of in memory copies, or maybe even remove them all completely? There are two tricks that fwupd 2.0.x uses to load firmware now, and those two primitives we now use all over the source tree:

Partial Input Stream

This models an input stream (which you can think of like a file descriptor) that is made up of a part of a different input stream at a specific offset. So if you have a base input stream of [123456789] you can build two partial input streams of, say, [234] and [789]. If you try and read() 5 bytes from the first partial stream you just get 3 bytes back. If you seek to offset 0x1 on the second partial input stream you get the two bytes of [89].

Composite Input Stream

This models a different kind of input stream, which is made up of one or more partial input streams. In some cases there can be hundreds of partial streams making up one composite stream. So if you take the first two partial input streams defined a few lines before, and then add them to a composite input stream you get [234789] — and reading 8 bytes at offset 0x0 from that would give you what you expect.
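The two primitives can be sketched in a few lines of Python. This is only an illustration of the idea using the [123456789] example from the text, not the fwupd implementation (which is written in C); the class names PartialStream and CompositeStream are made up for this sketch.

```python
import io


class PartialStream:
    """A read-only window of `size` bytes starting at `offset` in a base stream."""

    def __init__(self, base, offset, size):
        self.base = base
        self.offset = offset
        self.size = size
        self.pos = 0

    def seek(self, pos):
        # Clamp so we can never point past the end of the window
        self.pos = min(pos, self.size)

    def read(self, n):
        # Never read past the end of the window; no data is copied until now
        n = min(n, self.size - self.pos)
        self.base.seek(self.offset + self.pos)
        data = self.base.read(n)
        self.pos += len(data)
        return data


class CompositeStream:
    """Several partial streams presented as one contiguous stream."""

    def __init__(self, parts):
        self.parts = parts
        self.pos = 0

    def seek(self, pos):
        self.pos = pos

    def read(self, n):
        out = b""
        start = 0  # offset of the current part within the composite
        for part in self.parts:
            end = start + part.size
            if n > 0 and start <= self.pos < end:
                part.seek(self.pos - start)
                data = part.read(n)
                out += data
                self.pos += len(data)
                n -= len(data)
            start = end
        return out


base = io.BytesIO(b"123456789")
first = PartialStream(base, offset=1, size=3)   # [234]
second = PartialStream(base, offset=6, size=3)  # [789]

short_read = first.read(5)                      # only 3 bytes are available
second.seek(0x1)
tail = second.read(5)                           # the two bytes after the seek

composite = CompositeStream([first, second])
composite.seek(0x0)
whole = composite.read(8)                       # everything the two parts hold
```

Note that nothing is read from the base stream until read() is called on a partial or composite stream, which is the property that lets the payload stay on disk until each chunk is needed.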

This means the new way of processing firmware archives can be:

  • Send the cabinet archive to the daemon as a file descriptor
  • The daemon parses it as a cab archive header, and adds the data section of each block to a partial stream that references the base stream at a specific offset
  • The daemon “collects” all the partial streams into a composite stream for each file in the archive that spans multiple blocks
  • The payload is split into chunks, with each chunk actually being a partial stream of the composite file stream
  • Each chunk is read from the stream, and sent to the device being updated

Sooo…. We never actually read the firmware payload from the cabinet file descriptor until we actually send the chunk of payload to the hardware. This means we have to seek() all over the place, possibly many times for each chunk, but in the kernel a seek() is really just doing some pointer maths to a memory buffer and so it’s super quick — even faster in real time than the “simple” process we used in 1_9_X. The only caveat is that you have to use uncompressed cabinet archives (the default for the LVFS) — as using MSZIP decompression currently does need a single copy fallback.

Figure: blocks in a cab archive

This means we can deploy a 1.5GB firmware payload using an amazingly low 8MB of RSS, and using less CPU than copying 1.5GB of data around a few times. Which means, you can now deploy that huge firmware to that $3,000 meeting room camera from a $200 ChromeBook — but also means we can do the same in RHEL for 5G mobile broadband radios on low-power, low-cost IoT hardware.

Making such huge changes to fwupd meant we could justify branching a new release, and because we bumped the major version it also made sense to remove all the deprecated API in libfwupd. All the changes are documented in the README file, but I’ve already sent patches for gnome-firmware, gnome-software and kde-discover to make the tiny changes needed for the library bump.

My plan for 2.0.x is to ship it in Flathub, and in Fedora 42 — but NOT Fedora 41, RHEL 9 or RHEL 10 just yet. There is a lot of new code that’s only had a little testing, and I fully expect to do a brown paperbag 2.0.1 release in a few days because we’ve managed to break some hardware for some vendor that I don’t own, or we don’t have emulations for. If you do see anything that’s weird, or have hardware that used to be detected, and now isn’t — please let us know.

Anyway, enough talking for now, enjoy!

fwupd and xz metadata

A few people (and multi-billion dollar companies!) have asked for my response to the xz backdoor. The fwupd metadata that millions of people download every day is a 9.5MB XML file — which thankfully is very compressible. This used to be compressed as gzip by the LVFS, making it a 1.6MB download for end-users, but in 2021 we switched to xz compression instead.

What actually happens behind the scenes is that the libxmlb library loads the optionally compressed metadata into a mmap-able binary blob, and then it gets used by fwupd to look for new updates for specific hardware. In libxmlb 0.3.3 we added support for xz as a compression format. Then fwupd 1.8.7 was released with xz support, preferring the xz format to the “legacy” gz format — as the metadata became a 1.1MB download, saving significant amounts of data from the CDN.

Then this week we learned that xz wasn’t the kind of thing we want to depend on. Out of an abundance of caution (and to be clear — my understanding is there is no fwupd or LVFS security problem of any kind) I’ve switched the LVFS to also generate zstd metadata, make libxmlb no longer hard depend on lzma and switched fwupd to prefer the zstd metadata over the xz metadata if the installed version of libjcat supports it. The zstd metadata is also ~3% smaller than xz (and faster to decompress), but the real benefit is that I now trust it a lot more than xz.

I’ll be doing new libxmlb and fwupd releases with the needed changes next week.

fwupd: Auto-Quitting On Idle, Harder

In fwupd 1.9.12 and earlier we had the following auto-quit behavior: Auto-quit on idle after 2 hours, unless:

  • Any thunderbolt controller, thunderbolt retimer or synaptics-mst devices exist.

These devices are both super slow to query and also use battery power to query as you have to power on various hungry things and then power them down to query for the current firmware version.

In 1.9.13, due to be released in a few days’ time, we now: Auto-quit after 5 minutes, unless:

  • Any thunderbolt controller, thunderbolt retimer or synaptics-mst devices exist.
  • Any D-Bus client (that used or is using fwupd) is still alive, which includes gnome-software if it’s running in the background of the GNOME session
  • The daemon took more than 500ms to start – on the logic it’s okay to wait 0.5 seconds on the CLI to get results to a query, but we don’t want to be waiting tens of seconds to check for updates on deeply nested USB hub devices.
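The rules above can be read as a small decision function. The sketch below is a hypothetical model of that logic in Python, not the daemon’s actual code; the DaemonState fields and the should_auto_quit name are invented for illustration, and the defaults are the 5 minute timeout and 500ms threshold from the list.

```python
from dataclasses import dataclass


@dataclass
class DaemonState:
    has_slow_devices: bool   # thunderbolt controller/retimer or synaptics-mst present
    live_dbus_clients: int   # clients that used (or are using) fwupd and are still alive
    startup_ms: int          # how long the daemon took to start
    idle_seconds: int        # time since the last request


def should_auto_quit(state, idle_timeout_s=300, startup_threshold_ms=500):
    """Return True if the daemon is allowed to quit on idle."""
    if state.has_slow_devices:
        return False  # slow and power-hungry to re-query, so stay resident
    if state.live_dbus_clients > 0:
        return False  # e.g. gnome-software running in the background
    if state.startup_ms > startup_threshold_ms:
        return False  # restarting would make the next query too slow
    return state.idle_seconds >= idle_timeout_s


iot_box = DaemonState(False, 0, startup_ms=120, idle_seconds=301)
laptop = DaemonState(True, 1, startup_ms=2400, idle_seconds=7200)
```

With these example values the IoT-style machine is allowed to quit after five idle minutes, while the laptop with Thunderbolt hardware keeps the daemon running.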

The tl;dr: is that most laptop and desktop machines have Thunderbolt or MST devices, and so they already had fwupd running all the time before, and continue to have it running all the time now. Trading 3.3MB of memory and an extra process for instant queries on a machine with GBs of memory is probably worthwhile. For embedded machines like IoT devices, and for containers (that are using fwupd to update things like the dbx) fwupd was probably starting and then quitting after 2h before, and now fwupd is only going to be alive for 5 minutes before quitting.

If any of the thresholds (500 ms) or timeouts (5 mins) are offensive to you then it’s all configurable, see man fwupd.conf for details. Comments welcome.

Looking for LogoFAIL on your local system

A couple of months ago, Binarly announced LogoFAIL which is a pretty serious firmware security problem. There is lots of complexity Alex explains much better than I might, but essentially a huge amount of system firmware running right now is vulnerable: The horribly-insecure parsing in the firmware allows the user to use a corrupted OEM logo (the one normally shown as the system boots) to run whatever code they want, providing a really useful primitive to do basically anything the attacker wants when running in a super-privileged boot state.

Vendors have to release new firmware versions to address this, and OEMs using the LVFS have pumped out millions of updates over the last few weeks.

So, what can we do to check that your system firmware has been patched [correctly] by the OEM? The only real way we can detect this is by dumping the BIOS in userspace, decompressing the various sections and looking at the EFI binary responsible for loading the image. In an ideal world we’d be able to look at the embedded SBoM entry for the specific DXE, but that’s not a universe we live in yet — although it is something I’m pushing the IBVs really hard to do. What we can do right now is token matching (or control flow analysis) to detect the broken and fixed image loader versions.

The four words “decompressing the various sections” hide how complicated taking an Intel Flash Descriptor image and breaking it into EFI binaries actually is. There are many levels of Matryoshka doll stacking involving hideous custom LZ77 and Huffman decompressors, and of course vendor-specific section types. It’s been several programmer-months spread over the last few years figuring it all out. Programs like UEFITool do a very good job, but we need to do something super-lightweight (and paranoid) at every system boot as part of the HSI tests. We only really want to stream a few kBs of SPI contents, not MBs as it’s actually quite slow and we only need a few hundred bytes to analyze.

In Fedora 40 all the kernel parts are in place to actually get the image from userspace in a sane way. It’s a 100% read-only interface, so don’t panic about bricking your system. This is currently Intel-only — AMD wasn’t super-keen on allowing userspace read access to the SPI, even as root — even though it’s the same data you can get with a $2 SPI programmer and 30 seconds with a Pomona clip.

Intel laptop and servers should both have an Intel PCI SPI controller — but some OEMs manually hide it for dubious reasons — and if that’s the case there’s nothing we can do I’m afraid.

You can help the fwupd project by contributing test firmware we can use to verify we parse it correctly, and to prevent regressions in the future. Please follow these steps only if:

  1. You have an Intel CPU laptop, desktop or server machine
  2. You’re running Fedora 39 (no idea on other distros, but you’ll need at least CONFIG_MTD_SPI_NOR, CONFIG_SPI_INTEL_PCI and CONFIG_SPI_MEM to be enabled in the kernel)
  3. You’re comfortable installing and removing a kernel on the command line
  4. There’s not already a test image for the same model provided by someone else
  5. You are okay with uploading your SPI contents to the internet
  6. You’re running the OEM-provided firmware, and not something like coreboot
  7. You’re aware that the firmware image we generate may have an encrypted version of your BIOS supervisor password (if set) and also all of the EFI attribute keys you’ve manually set, or that have been set by the various crash reporting programs.
  8. The machine is not a secure production system or a machine you don’t actually own.

Okay, let’s get started:

sudo dnf update kernel --releasever 40

Then reboot into the new kernel, manually selecting the fc40 entry on the grub menu if required. We can check that the Intel SPI controller is visible.

$ cat /sys/class/mtd/mtd0/name 
BIOS

Assuming it’s indeed BIOS and not some other random system MTD device, let’s continue.

$ sudo cat /dev/mtd0 > lenovo-p1-gen4.bin

The filename should be lowercase, have no spaces, and identify the machine you’re using — using the SKU if that’s easier.

Then we want to compress it (as it will have a lot of 0xFF padding bytes) and encrypt it (otherwise github will get most upset that you’re attaching something containing “binary code”):

zip lenovo-p1-gen4.zip lenovo-p1-gen4.bin -e
Enter password: fwupd
Verify password: fwupd

It’s easier if you use the password of “fwupd” (lowercase, no quotes) but if you’d rather send the image with a custom password just get the password to me somehow. Email, mastodon DM, carrier pigeon, whatever.

If you’re happy sharing the image, can you please create an issue and then attach the zip file and wait for me to download the file and close the issue. I also promise that I’m only using the provided images for testing fwupd IFD parsing, rather than anything more scary.

NOTE: If you’re getting a permission error (even running with sudo) you’re probably hitting a kernel MTD issue we’re trying to debug and fix. I wrote a python script that can be run as root to try to get each partition in turn.
If this script works, can you please also paste the output of that script into the submitted github issue.

Thanks!

100 Million Firmware Updates Supplied By The LVFS

The LVFS has now supplied over 100 million updates to Linux machines all around the globe. The true number is unknown, as we allow users to re-distribute updates without any kind of tracking, and also allow large companies or agencies to mirror the entire LVFS so the archive can be used offline. The true number of updates deployed will probably be a lot higher. Just 8 years ago Red Hat asked me to “make firmware updates work on Linux” and now we have a thriving set of projects that respect both your freedom and your privacy, and a growing ecosystem of hardware vendors who consider Linux users first class citizens. Every month we have two or three new vendors join; the logistical, security and most importantly commercial implications of not being “on the LVFS” are now too critical for IHVs, ODMs and OEMs to ignore.

Red Hat can certainly take a lot of credit for the undeniable success of LVFS and fwupd, as they have been paying my salary and pushing me forward over the last decade and more. Customer use of fwupd and LVFS is growing and growing – and planning for new fwupd/LVFS device support now happens months in advance to ensure fwupd is ready-to-go in long term support distributions like Red Hat Enterprise Linux. With infrastructure supplied and support paid for by the Linux Foundation, the LVFS really has a stable base that will be used for years to come.

The number of devices supported by the LVFS goes up and up every week, and I’m glad that the community around fwupd is growing at the same pace as the popularity. Google and Collabora have also been amazing partners in encouraging and helping vendors to ship updates on the LVFS and supporting fwupd in ChromeOS — and their trust and support has been invaluable. I’m also glad the “side-projects” like “GNOME Firmware“, “Host Security ID“, “fwupd friendly firmware” and “uSWID as a SBoM format” also seem to be flourishing into independent projects in their own right.

Everybody is incredibly excited about the long term future of both fwupd and the LVFS and I’m looking forward to the next 100 million updates. A huge thank you to all that helped.

Introducing Passim

tl;dr: Passim is a local caching server that uses mDNS to advertise files by their SHA-256 hash. Named after the Latin word for “here, there and everywhere” it might save a lot of people a lot of money.

Introduction

Much of the software running on your computer that connects to other systems over the Internet needs to periodically download metadata or other information needed to perform other requests.

As part of running the passim/LVFS projects I’ve seen how downloading this “small” file once per 24h turns into tens of millions of requests per day — which is about 10TB of bandwidth! Everybody downloads the same file from a CDN, and although a CDN is not super-expensive, it’s certainly not free. Everybody on your local network (perhaps dozens of users in an office) has to download the same 1MB blob of metadata from a CDN over a perhaps-non-free shared internet link.

What if we could download the file from the Internet CDN on one machine, and the next machine on the local network that needs it instead downloads it from the first machine? We could put a limit on the number of times it can be shared, and the maximum age so that we don’t store yesterday’s metadata forever, and so that we don’t turn a ThinkPad X220 into a machine distributing 1Gb/s to every other machine in the office. We could cut the CDN traffic by at least one order of magnitude, but possibly much more. This is better for the person paying the cloud bill, the person paying for the internet connection, and the planet as a whole.

This is what Passim might be. You automatically or manually add files to the daemon which stores them in /var/lib/passim/data with xattrs set on each file for the max-age and share-limit. When the file has been shared more than the share limit number of times, or is older than the max age it is deleted and not advertised to other clients.

The daemon then advertises the availability of the file as a mDNS service subtype and provides a tiny single-threaded HTTP v1.1 server that supplies the file over HTTPS using a self-signed certificate.

The file is sent when requested from a URL like https://192.168.1.1:27500/filename.xml.gz?sha256=the_hash_value – any file requested without the checksum will not be supplied. Although this is a chicken-and-egg problem where you don’t know the payload checksum until you’ve checked the remote server, this is solved using a tiny <100 byte request to the CDN for the payload checksum (or a .jcat file) and then the multi-megabyte (or multi-gigabyte!) payload can be found using mDNS. Using a Jcat file also means you know the PKCS#7/GPG signature of the thing you’re trying to request. Using a Metalink request would work as well I think.

Sharing Considerations

Here we’re assuming your local network (aka LAN) is a nice and friendly place, without evil people trying to overwhelm your system or feed you fake files. Although we request files by their hash (and thus can detect tampering) and we hopefully also use a signature, it still uses resources to send a file over the network.

We’ll assume that any network with working mDNS (as implemented in Avahi) is good enough to get metadata from other peers. If Avahi is not running, or mDNS is turned off on the firewall then no files will be shared.

The cached index is available to localhost without any kind of authentication as a webpage on https://localhost:27500/.

Only processes running as UID 0 (a.k.a. root) can publish content to Passim. Before sharing everything, bear in mind that the effects of sharing can be subtle; if you download a security update for a Lenovo P1 Gen 3 laptop and share it with other laptops on your LAN — it also tells any attacker [with a list of all possible firmware updates] on your local network your laptop model and also that you’re running a system firmware that isn’t currently patched against the latest firmware bug.

My recommendation here is only to advertise files that are common to all machines. For instance:

  • AdBlocker metadata
  • Firmware update metadata
  • Remote metadata for update frameworks, e.g. apt-get/dnf etc.

Implementation Considerations

Any client MUST calculate the checksum of the supplied file and verify that it matches. There is no authentication or signing verification done so this step is non-optional. A malicious server could advertise the hash of firmware.xml.gz but actually supply evil-payload.exe — and you do not want that.
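As a minimal sketch of that client-side check, assuming the expected SHA-256 was already obtained from a trusted place (e.g. the small CDN request or Jcat file mentioned earlier), the verification could look like this in Python. The verify_sha256 helper is hypothetical and not part of libpassim.

```python
import hashlib
import hmac


def verify_sha256(payload: bytes, expected_hex: str) -> bytes:
    """Return the payload only if its SHA-256 matches the expected hex digest."""
    actual_hex = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    if not hmac.compare_digest(actual_hex, expected_hex.lower()):
        raise ValueError(f"checksum mismatch: expected {expected_hex}, got {actual_hex}")
    return payload


# Example data only: pretend this is what a peer on the LAN sent us
good_payload = b"<components/>"
trusted_hash = hashlib.sha256(good_payload).hexdigest()
evil_payload = b"MZ evil-payload.exe"
```

Calling verify_sha256(good_payload, trusted_hash) returns the data, whereas verify_sha256(evil_payload, trusted_hash) raises ValueError and the client should fall back to the CDN.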

Comparisons

The obvious comparison to make is IPFS. I’ll try to make this as fair as possible, although I’m obviously somewhat biased.

IPFS

  • Existing project that’s been around for many years and tested by many people
  • Allows sharing with other users not on your local network
  • Not packaged in any distributions and not trivial to install correctly
  • Requires a significant time to find resources
  • Does not prioritize local clients over remote clients
  • Requires an internet-to-IPFS “gateway” which cost me a lot of $$$ for a large number of files

Passim

  • New project that’s not even finished
  • Only allows sharing with computers on your local network
  • Returns results within 2s

One concern we had specifically with IPFS for firmware was the ITAR/EAR legal considerations, e.g. we couldn’t share firmware containing strong encryption with users in some countries — which is actually most of the firmware the LVFS distributes. From an ITAR/EAR point of view Passim would be compliant (as it only shares locally, presumably in the same country) and IPFS certainly is not.

There’s a longer README in the git repo. There’s also a test patch that wires up fwupd with libpassim although it’s not ready for merging. For instance, I think it’s perfectly safe to share metadata but not firmware or distro package payloads – but for some people downloading payloads on a cellular link might be exactly what they want – so it’ll be configurable. For reference Windows Update also shares content (not just metadata) so maybe I’m worrying about nothing, and doing a distro upgrade from the computer next to them is exactly what people need. Small steps perhaps.

Comments welcome.

EDIT 2023-08-22: Made changes to reflect that we went from HTTP 1.0 to HTTP 1.1 with TLS.

MSI and Insecure KMs

As some of you may know, MSI suffered a data breach which leaked a huge amount of source code, documentation and low-level firmware PRIVATE KEYS. This is super bad as it now allows anyone to sign a random firmware image and install it as an official MSI firmware. It’s even more super bad than that, as the certificates leaked seem to be the KeyManifest keys, which actually control the layer below SecureBoot, this little-documented and even less well understood thing called BootGuard. I’ll not overplay the impact here, but there is basically no firmware security on most modern MSI hardware now. We already detect the leaked test keys from Lenovo and notify the user via the HSI test failure and I think we should do the same thing for MSI devices too. I’ve not downloaded the leak for obvious reasons, and I don’t think the KM hashes would be easy to find either.

So what can you do to help? Do you have an MSI laptop or motherboard affected by the leak? The full list is here (source: Binarly) and if you have one of those machines I’d ask if you could follow the instructions below, run MEInfo and attach it to the discussion please.

As for how to get MEInfo, Intel doesn’t want to make it easy for us. The Intel CSME System Tools are all different binaries, and are seemingly all compiled one-by-one for each specific MEI generation — and available only from a semi-legitimate place unless you’re an OEM or ODM. Once you have the archive of tools you either have to work out what CSME revision you have (e.g. Ice Point is 13.0) or do what I do and extract all the versions and just keep running them until one works. e.g. choosing the wrong one will get you:

sudo ./CSME\ System\ Tools\ v13.50\ r3/MEInfo/LINUX64/MEInfo 
Intel (R) MEInfo Version: 13.50.15.1475
Copyright (C) 2005 - 2021, Intel Corporation. All rights reserved.
Error 621: Unsupported hardware platform. HW: Cometlake Platform. Supported HW: Jasplerlake Platform.

And choosing the right one will get you:

Intel (R) MEInfo Version: 14.1.60.1790
Copyright (C) 2005 - 2021, Intel Corporation. All rights reserved.

General FW Information
…
OEM Public Key Hash FPF                          2B4D5D79BD7EE3C192412A4501D88FB2066C853FF7B1060765395D671B15D30C

Now, how to access these hashes is what Intel keeps a secret, for no reason at all. I literally need to know what integer index to use when querying the HECI device. I’ve asked Intel, but I’ve been waiting since October 2022. For instance:

sudo strace -xx -s 4096  -e openat,read,write,close ./CSME\ System\ Tools\ v14.0.20+\ r20/MEInfo/LINUX64/MEInfo
…
write(3, "\x0a\x0a\x00\x00\x00\x23\x00\x40\x00\x00\x00\x00\x20\x00\x00\x00\x00", 17) = 17
read(3, "\x0a\x8a\x00\x00\x20\x00\x00\x00\x2b\x4d\x5d\x79\xbd\x7e\xe3\xc1\x92\x41\x2a\x45\x01\xd8\x8f\xb2\x06\x6c\x85\x3f\xf7\xb1\x06\x07\x65\x39\x5d\x67\x1b\x15\xd3\x0c", 4096) = 40
…

That contains all the information I need – the Comet Lake READ_FILE_EX ID is 0x40002300 and there’s a SHA256 hash that matches what the OEM Public Key Hash FPF console output said above. There are actually three accesses to get the same hash in three different places, so until I know why I’d like the entire output from MEInfo.
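To show how the two strace lines decode, here is a short Python sketch. The field layout is an assumption inferred only from this one trace (a little-endian 32-bit ID at offset 4 of the request, and an 8 byte response header with a little-endian 32-bit length at offset 4); it is not taken from any Intel documentation.

```python
import struct

# The request and response bytes copied from the strace output above
request = bytes.fromhex("0a0a0000" "00230040" "00000000" "20000000" "00")
response = bytes.fromhex(
    "0a8a0000" "20000000"
    "2b4d5d79bd7ee3c192412a4501d88fb2066c853ff7b1060765395d671b15d30c"
)

# Assumed layout: the file ID is a little-endian u32 at offset 4 of the request
(file_id,) = struct.unpack_from("<I", request, 4)

# Assumed layout: the payload length is a little-endian u32 at offset 4 of the response
(length,) = struct.unpack_from("<I", response, 4)
digest = response[8:8 + length]

file_id_hex = f"0x{file_id:08X}"
digest_hex = digest.hex().upper()
```

Here file_id_hex comes out as 0x40002300 and digest_hex is the same 64 character string MEInfo printed as the OEM Public Key Hash FPF.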

The information I need uploading to the bug is then just these two files:

sudo ./THE_CORRECT_PATH/MEInfo/LINUX64/MEInfo &> YOUR_GITHUB_USERNAME-meinfo.txt
sudo strace -xx -s 4096  -e openat,read,write,close ./THE_CORRECT_PATH/MEInfo/LINUX64/MEInfo &> YOUR_GITHUB_USERNAME-meinfo-strace.txt

If I need more info I’ll ask on the ticket. Thanks!

Speeding up getting firmware updates to end users

At the moment, when a vendor decides to support a new device using the LVFS in Linux or ChromeOS they have to do a few things:

  1. Write a plugin for fwupd that understands how to copy the firmware into the specific device
  2. Add a quirk entry into a file that matches a specific VID/PID or VEN/DEV to tell fwupd what plugin to load for this new device
  3. Actually ship that fwupd version in the next ChromeOS release, or convince Linux distros to rebase to the new version
  4. Get an account on the LVFS
  5. Upload some firmware, test it, then push it to end-users

Then the next device comes along a few months later. This time the vendor only has to update a quirk file with a new VID/PID, convince the distributor to ship the new fwupd update and then push the new firmware. Let’s look at the timescales for each thing:

  1. Write plugin: Depends on programmer and GLib proficiency, but typically a few weeks
  2. Add quirk entry: 2 minutes to write, usually less than 12 hours for upstream review
  3. Ensure latest fwupd is shipped: ~30 days for upstream, +~10 days for Fedora, +several months for Ubuntu, and +almost infinity for Debian stable
  4. Get LVFS account: 10 minutes for me to add, usually a few days to get legal clearance and to do vendor checks
  5. Upload firmware: Less than 5 minutes to write release notes and upload the file, and then stable remote is synced every 6 hours

So the slow part is step 3, and it’s slower than the others by several orders of magnitude – and it’s also the part that we have to do even when adding just one more VID/PID in the quirk file. We’ve ruled out shipping quirk entries in the metadata as it means devices don’t enumerate when offline (which is a good chunk of the fwupd userbase).

So what can we do? We already support two plugins that use the class code, rather than the exact VID/PID. For example, this DFU entry means “match any USB device with class 0xFE (application specific) and subclass 0x01” which means these kinds of devices don’t need any quirk updates (although they still might need a quirk if they are non-compliant, for example needing Flags = detach-for-attach) – but in most cases they just work:

[USB\CLASS_FE&SUBCLASS_01]
Plugin = dfu

The same can be done for Fastboot devices, matching class 0xFF (vendor specific), subclass 0x42 (sic) and protocol 0x03, although the same caveat applies to non-compliant devices that need things like FastbootOperationDelay = 250:

[USB\CLASS_FF&SUBCLASS_42&PROT_03]
Plugin = fastboot
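
As a rough illustration of how class-based matching removes the need for per-device entries, here is a minimal Python sketch. It is not how fwupd implements this internally: the `QUIRKS` table and the `class_instance_ids()` and `find_plugin()` helpers are hypothetical names, and the only real data is the two quirk entries shown above.

```python
# Hypothetical sketch of class-code matching; not fwupd's actual code.
# The two entries mirror the DFU and Fastboot quirk entries shown above.
QUIRKS = {
    r"USB\CLASS_FE&SUBCLASS_01": {"Plugin": "dfu"},
    r"USB\CLASS_FF&SUBCLASS_42&PROT_03": {"Plugin": "fastboot"},
}


def class_instance_ids(cls, subclass, protocol):
    """Return candidate instance IDs for a class triple, most specific first."""
    return [
        rf"USB\CLASS_{cls:02X}&SUBCLASS_{subclass:02X}&PROT_{protocol:02X}",
        rf"USB\CLASS_{cls:02X}&SUBCLASS_{subclass:02X}",
        rf"USB\CLASS_{cls:02X}",
    ]


def find_plugin(cls, subclass, protocol, quirks=QUIRKS):
    """Return the plugin name for the first matching quirk entry, or None."""
    for instance_id in class_instance_ids(cls, subclass, protocol):
        entry = quirks.get(instance_id)
        if entry is not None:
            return entry.get("Plugin")
    return None


if __name__ == "__main__":
    print(find_plugin(0xFE, 0x01, 0x02))  # any protocol matches the DFU entry
    print(find_plugin(0xFF, 0x42, 0x03))
    print(find_plugin(0x03, 0x00, 0x00))  # a HID device: no class-based match
```

The point is that a new DFU or Fastboot device with a never-before-seen VID/PID still resolves to a plugin without anyone touching the quirk file.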

I think we should move more into this kind of “device opts into fwupd plugin” direction, so the obvious answer is to somehow have a registry of class/subclass/protocol values. The USB consortium defines a few (e.g. class 0xFE subclass 0x02 is an IrDA bridge – remember those!) but the base class 0xFF is completely unspecified. It doesn’t seem right to hijack it, and you only get 255 possible values – and sometimes you really do want the class/subclass to be the correct things, e.g. base class 0x10 is “Audio/Video Devices”.

There is something extra we can use, the Microsoft OS Descriptors which although somewhat proprietary are still mostly specified and supported in Linux. The simpler version 1 specification could be used, and although we could squeeze FWUPDPLU or FWUPDFLA as the CompatibleID, we couldn’t squeeze the plugin name (e.g. logitech-bulkcontroller) or the GUID (16 bytes) in an 8 byte Sub-compatibleID. I guess we could just number the plugins, or use half-a-GUID or something, but then it all starts to get somewhat hacky. Also, as a final nail-in-the-coffin, some non-compliant devices also don’t respond well (as in, they hang, and stop working…) when probing the string index of 0xEE – and so it’s also not without risk. If we have an “allowlist or denylist of devices that don’t support Microsoft OS Descriptors” (like Microsoft had to do) then we’re either back at updating the quirk file for each device added – which is what we wanted to avoid in the first place – or we risk regressions on end-user machines. So pass.
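
The size problem is easy to see with a few lines of Python. This is only a sketch of the arithmetic: the 8-byte field sizes come from the paragraph above, the `fits()` helper is a made-up name, and the GUID used is just an arbitrary example value.

```python
import uuid

# Field sizes in the version 1 descriptor, as described above.
COMPATIBLE_ID_SIZE = 8
SUB_COMPATIBLE_ID_SIZE = 8


def fits(value: bytes, size: int) -> bool:
    """Return True if the raw value fits in a fixed-size field."""
    return len(value) <= size


if __name__ == "__main__":
    # The marker strings squeeze into the CompatibleID...
    print(fits(b"FWUPDPLU", COMPATIBLE_ID_SIZE))
    print(fits(b"FWUPDFLA", COMPATIBLE_ID_SIZE))
    # ...but a plugin name or a full GUID does not fit in the Sub-compatibleID.
    print(fits(b"logitech-bulkcontroller", SUB_COMPATIBLE_ID_SIZE))
    example_guid = uuid.UUID("12345678-1234-5678-1234-567812345678")  # arbitrary
    print(fits(example_guid.bytes, SUB_COMPATIBLE_ID_SIZE))
```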

The version 2 specification is somewhat more helpful. It defines a new device capability that can return variable length properties, using a UUID as a key – although we do need to use the newish “BOS” descriptor. This is only available in devices using USB 2.1 and newer, although that’s probably the majority of devices in use these days. If I understand correctly, using USB-C requires the device to support USB-2 and above, so that’s probably most new-design modern devices covered in reality.

Let’s dig into this specification a bit: Some USB 2/3 devices already export a BOS “Binary Object Store” descriptor, which includes things like Wireless USB details, USB 2.0 extensions, SuperSpeed USB connection details and a Container ID. We could certainly hijack a new bDevCapabilityType which would allow us to store a binary blob (e.g. Plugin=foobarbaz\nFlags=QuirkValueHere\n) but it doesn’t seem super awesome to just use a random out-of-specification value (looking at you fastboot…).

What the BOS descriptor does give us is the ability to use the platform capability descriptor, which is bDevCapabilityType=0x05 according to the Microsoft OS Descriptors 2.0 Specification. With UUID D8DD60DF-4589-4CC7-9CD2-659D9E648A9F, the capability is identified as a structured blob of data that Windows usually uses for workarounds like the suspend mode of the device and that kind of thing.
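
To make the lookup concrete, here is a minimal Python sketch that walks a raw BOS descriptor and finds the platform capability with that UUID. It assumes the layout as I understand it from the USB and Microsoft OS 2.0 specifications (5-byte BOS header, then capability descriptors, with the UUID stored in little-endian GUID byte order), and `find_ms_os_20_capability()` is a made-up helper name rather than anything in fwupd.

```python
import struct
import uuid

MS_OS_20_UUID = uuid.UUID("D8DD60DF-4589-4CC7-9CD2-659D9E648A9F")
DT_BOS = 0x0F                # bDescriptorType of the BOS header
DT_DEVICE_CAPABILITY = 0x10  # bDescriptorType of each capability
CAP_PLATFORM = 0x05          # bDevCapabilityType for "platform"


def find_ms_os_20_capability(bos: bytes):
    """Return the MS OS 2.0 capability fields from a raw BOS blob, or None."""
    b_length, b_type, w_total_length, _num_caps = struct.unpack_from("<BBHB", bos, 0)
    if b_type != DT_BOS:
        raise ValueError("not a BOS descriptor")
    offset = b_length
    end = min(w_total_length, len(bos))
    while offset + 3 <= end:
        cap_len, cap_desc_type, cap_type = struct.unpack_from("<BBB", bos, offset)
        if cap_len < 3:
            break  # malformed descriptor, stop rather than loop forever
        cap = bos[offset:offset + cap_len]
        if (cap_desc_type == DT_DEVICE_CAPABILITY
                and cap_type == CAP_PLATFORM
                and len(cap) >= 28):
            # Byte 3 is reserved, bytes 4..19 are the UUID in GUID byte order.
            if uuid.UUID(bytes_le=bytes(cap[4:20])) == MS_OS_20_UUID:
                version, set_len, vendor_code, alt_enum = struct.unpack_from(
                    "<IHBB", cap, 20)
                return {
                    "windows_version": version,
                    "descriptor_set_length": set_len,
                    "vendor_code": vendor_code,
                    "alt_enum_code": alt_enum,
                }
        offset += cap_len
    return None
```

The returned vendor code is what the host then uses in a vendor-specific control request to fetch the actual descriptor set; that step needs real hardware, so it is left out of the sketch.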

The descriptor allows us to create a “descriptor set” which is really a posh way of saying “set these per-device registry keys when plugged in” which we could certainly (ab?)use for setting fwupd quirks and matching to plugins. It’s literally the REG_EXPAND_SZ, REG_DWORD things you can see in regedit.exe. Worst case you plug the device into Windows, and you get a few useless REG_SZ’s created of things like “fwupd.Plugin=logitech_hidpp” which Windows will ignore (or we hope so) and that we can use in fwupd to remove the need for most of the quirk files completely. Of course, you’ll still need a new enough fwupd that actually contains the device plugin update code, but we can’t do anything at all about that unless someone invents an OpenHardware time machine.
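
Here is a rough Python sketch of what turning those registry properties into quirks might look like. Everything here is an assumption for illustration: the field layout of the registry property descriptor is as I understand it from the Microsoft OS 2.0 specification (UTF-16LE strings with a NUL terminator), the `fwupd.` prefix is just the naming idea from the paragraph above, and both helper functions are hypothetical.

```python
import struct

MS_OS_20_FEATURE_REG_PROPERTY = 0x04  # wDescriptorType, per my reading of the spec
REG_SZ = 1                            # NUL-terminated UTF-16LE string


def parse_reg_property(desc: bytes):
    """Return (name, data_type, raw_data) for one registry property descriptor."""
    _w_length, w_type, data_type, name_len = struct.unpack_from("<HHHH", desc, 0)
    if w_type != MS_OS_20_FEATURE_REG_PROPERTY:
        raise ValueError("not a registry property descriptor")
    name = desc[8:8 + name_len].decode("utf-16-le").rstrip("\x00")
    (data_len,) = struct.unpack_from("<H", desc, 8 + name_len)
    data_start = 10 + name_len
    return name, data_type, bytes(desc[data_start:data_start + data_len])


def quirks_from_properties(descriptors, prefix="fwupd."):
    """Collect REG_SZ properties named '<prefix>Key' into a {Key: value} dict."""
    quirks = {}
    for desc in descriptors:
        name, data_type, raw = parse_reg_property(desc)
        if data_type == REG_SZ and name.startswith(prefix):
            quirks[name[len(prefix):]] = raw.decode("utf-16-le").rstrip("\x00")
    return quirks
```

Properties without the prefix are simply ignored, which mirrors what we hope Windows does with the ones it doesn’t recognise.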

Can anybody see a problem? If so, tell me now as I’m going to prototype this next week. Of course, it needs vendor buy-in but I think the LVFS is at a point where we can tell them what to do. :) Comments welcome.

New fwupd 1.8.4 release

Today I tagged fwupd 1.8.4 which adds a few nice features and bug fixes. One specific enhancement I wanted to shout about is that we’re now supplying translated summary, description text and suggested actions for each HSI security failure. Two of the most common criticisms of the new GNOME security panel were “but what does it mean” and also “and what should I do” which ironically were fixed long before all the hubbub erupted. If you want to see both new bits of data then make sure you’re using gnome-control-center from the main branch and then install the new fwupd version – although if you’re stuck on a distro version of fwupd GNOME will still fall back to the single-line summary as before.

One additional new feature that might accidentally fix another criticism with the panel is that fwupd now reads your system BIOS settings, and has the ability to change them if the user desires (and has authorization to do so). This means we have to match the HSI failure (e.g. IOMMU disabled) with the BIOS setting, which isn’t standardized at all between vendors. We currently support this on modern Lenovo and Dell platforms via the firmware-attributes kernel interface; other vendors just have to add the kernel WMI bridge and it should mostly magically start to work.

As we now know what the failure is, what we need to change, and how to change it, we can actually ask the user if they want to change the setting automatically in the fwupdmgr security command line. This would allow us to add a “JFDI” action in the new GNOME device security panel rather than asking the user to manually change a firmware setting in the BIOS. We won’t do this for GNOME 43 as we need a few months of real-world testing to see what attributes are 100% safe to change on actual user systems, but for GNOME 44 the panel could be a whole lot more helpful than it is now.

A few new tantalizing features then become available when using fwupd, as we can now read and change firmware settings. One is the ability to emulate the BIOS settings of another machine, which is fairly uninteresting to end users, but allows us developers to reproduce bugs much more easily now that we’re doing cleverer things. A more interesting deployment feature is that we also support reading out a file from /etc and applying those firmware settings at startup. This means you can now deploy a machine using something like Ansible, and have the firmware settings set up in the same way you set up the local machine state. There are lots of docs on how this all works and I encourage you to try this out and let us know how it goes. One caveat is that this doesn’t work if you have a password set on your BIOS settings, but we’re working on this for the next version.

Needless to say, please tell us about any problems with the new release. As always, comments welcome.