Making ATA updates just work

The fwupd project has supported updating the microcode on ATA devices for about a month, and StarLabs is shipping firmware on the LVFS already. More are coming, but as part of the end-to-end testing with various deliberately-unnamed storage vendors we hit a thorny issue.

Most drives require the firmware updater to use the so-called 0xE mode, more helpfully called ATA_SUBCMD_MICROCODE_DOWNLOAD_CHUNKS in fwupd. This command transfers chunks of firmware to the device, and then the ATA hardware waits for a COMRESET before switching to the new firmware version. On most drives you can also use 0x3 mode, which downloads the chunks and switches to the new firmware straight away using ATA RESET. That is, the drive currently providing your root filesystem disconnects from your running system and then reconnects with the new firmware version running. The kernel should be okay with that (and it seems to work for me), but various people have advised us it would be a good way to cause accidental Bad Things™ to happen, which certainly seems plausible. Needless to say, we defaulted to the safe 0xE mode in fwupd 1.2.4 and thus require the user to reboot to switch to the new firmware version.
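To make the difference between the modes concrete, here is a minimal Python sketch of how an updater might plan a chunked transfer. It is not fwupd's code: the enum member names, the `plan_transfer` helper, and the default chunk size are all made up for illustration, and only the 0x3, 0xE and 0xF values come from the text of this post.

```python
from enum import IntEnum
from typing import Iterator, Tuple


class MicrocodeSubcommand(IntEnum):
    """Subcommand values mentioned in this post; the names are illustrative."""
    DOWNLOAD_CHUNKS_ACTIVATE = 0x3  # transfer chunks, switch straight away
    DOWNLOAD_CHUNKS = 0xE           # transfer chunks, switch later
    ACTIVATE = 0xF                  # switch to firmware transferred earlier


def plan_transfer(
    firmware: bytes,
    chunk_size: int = 0x8000,  # assumed value, not taken from fwupd
    subcommand: MicrocodeSubcommand = MicrocodeSubcommand.DOWNLOAD_CHUNKS,
) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (subcommand, offset, chunk) tuples covering the whole image."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(firmware), chunk_size):
        yield int(subcommand), offset, firmware[offset:offset + chunk_size]


if __name__ == "__main__":
    for sub, offset, chunk in plan_transfer(b"\x00" * 10, chunk_size=4):
        print(hex(sub), offset, len(chunk))
```

Defaulting to the deferred mode in the sketch mirrors the choice described above: the transfer itself never changes the firmware that is running.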

The issue we found is that about half of the ATA drive vendors require the drive to receive a COMRESET before switching to the new firmware. Depending on your main system firmware (and seemingly, the phase of the moon) you might only get a COMRESET when the device is initially powered on, rather than during a reset. This means we’d have to tell the user to shut down and then manually restart their system rather than just doing a system restart, which means various fwupd front ends like GNOME Software and KDE Discover would need updating with new strings and code. This isn’t exactly trivial for enterprise distros like RHEL, and fwupd doesn’t know the capabilities of the front end, so it can’t do anything sensible like hold back the update.

Additionally, if the user installed a firmware update and then just restarted rather than shutting down, the firmware version would be unchanged on the next boot, and fwupd would recognize this and mark the update as failed. The user would then also be prompted to update the firmware on the device that they thought they had just updated. As my boss would say, “disappointing”.
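The check described above boils down to comparing version strings across a boot. A rough Python sketch, assuming a hypothetical `classify_update` helper and plain string versions (fwupd's real logic and data structures may well differ):

```python
def classify_update(version_before: str, version_expected: str,
                    version_after_boot: str) -> str:
    """Classify an update from the version the drive reports after booting.

    Hypothetical helper for illustration only.
    """
    if version_after_boot == version_expected:
        return "success"
    if version_after_boot == version_before:
        # The drive never saw the COMRESET it was waiting for, so it is
        # still running the old firmware.
        return "failed"
    return "unknown"


if __name__ == "__main__":
    print(classify_update("1.0", "1.1", "1.0"))
```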

Complexity to the rescue! There is one extra little-used mode in the ATA specification, called 0xF. This command causes the drive to immediately switch to the new firmware version, which as we’ve previously lamented might cause data loss. We can, however, use this command at shutdown, when the filesystems have all been remounted read-only. In fwupd git master (which will become version 1.2.6) we install a /usr/lib/systemd/system-shutdown/fwupd.shutdown script which checks the history database and activates the new firmware if any activation is required. This way the drive will always come back with the new firmware version when the user restarts, regardless of how the storage vendor interpreted the ATA specification.
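The shutdown-time logic can be sketched in a few lines of Python. This is only an illustration of the idea, not the contents of the fwupd.shutdown script: `HistoryEntry`, its fields, and the `activate` callback are hypothetical stand-ins for the history database and the 0xF command.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class HistoryEntry:
    """Hypothetical record from the history database."""
    device_id: str
    needs_activation: bool


def activate_pending(history: Iterable[HistoryEntry],
                     activate: Callable[[str], None]) -> List[str]:
    """Call activate() for each device with staged firmware.

    Returns the IDs of the devices that were activated, in order.
    """
    activated = []
    for entry in history:
        if entry.needs_activation:
            activate(entry.device_id)  # stand-in for sending 0xF to the drive
            activated.append(entry.device_id)
    return activated


if __name__ == "__main__":
    entries = [HistoryEntry("disk0", True), HistoryEntry("disk1", False)]
    print(activate_pending(entries, lambda dev: print("activating", dev)))
```

If nothing in the history needs activation the hook does nothing, so the common shutdown path is unaffected.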

I guess I should also thank Mario Limonciello and the storage team at Dell for all the help with this. We’ll hopefully have some more good news to share soon.

Published by

hughsie

Richard has over 10 years of experience developing open source software. He is the maintainer of GNOME Software, PackageKit, GNOME PackageKit, GNOME Power Manager, GNOME Color Manager, colord, and UPower and also contributes to many other projects and open source standards. Richard has three main areas of interest on the free desktop: color management, package management, and power management. Richard graduated a few years ago from the University of Surrey with a Masters in Electronics Engineering. He now works for Red Hat in the desktop group, and also manages a company selling open source calibration equipment. Richard's outside interests include taking photos and eating good food.

4 thoughts on “Making ATA updates just work”

  1. /usr/lib/systemd/system-shutdown/ is not really the right place for programs like that. It’s called before the root file system is unmounted (the dir is located on the root fs after all). If you want to safely do stuff to the HDD after everything is unmounted and all LVM and other complex storage is dismantled, you have to do that in the initrd, i.e. as a dracut module.

    1. In which case, https://www.freedesktop.org/software/systemd/man/systemd-halt.service.html is super misleading. If the root fs is not remounted RO at that stage, why have “It is necessary to have this code in a separate binary because otherwise rebooting after an upgrade might be broken — the running PID 1 could still depend on libraries which are not available any more, thus keeping the file system busy, which then cannot be re-mounted read-only.
