Fedora Atomic Workstation: Almost fool-proof

Note: Fedora Atomic Workstation has recently been renamed to Team Silverblue. Learn more here.

I’ve had a little adventure with my Fedora Atomic Workstation this morning and almost missed a meeting because I couldn’t get to a desktop session.
I’ve been using the rawhide branch of Fedora Atomic Workstation to keep up to speed with the latest developments in Fedora. As is to be expected of rawhide, it recently stopped getting me to a login screen (much less a working desktop session). I just booted back into my working image and ignored the problem for a few days.

The Adventure begins

But since it didn’t go away by itself, yesterday I decided to see if I could debug it a bit. Looking at the journal for the last unsuccessful boot gave some hints:

gnome-shell[2934]: Failed to create backend: Failed to initialize renderer: Missing extension for GBM renderer: EGL_KHR_platform_gbm
gnome-session-binary[2920]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
gnome-session-binary[2920]: Unrecoverable failure in required component org.gnome.Shell.desktop
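
For readers following along, here is a minimal sketch of how the journal of an earlier boot can be inspected. The helper name failed_boot_log, the default boot offset of -1 (the previous boot) and the priority filter are assumptions for illustration, not the exact command used here.

```shell
# Show warnings and errors from an earlier boot.
# $1 is the boot offset; -1 (the previous boot) is an assumed default.
failed_boot_log() {
    journalctl --boot="${1:--1}" --priority=warning --no-pager
}

# Usage (not run here):
#   journalctl --list-boots              # find the offset of the broken boot
#   failed_boot_log -2 | grep gnome-shell
```

The offset has to match the boot that actually failed; after booting back into a working image, the broken boot is no longer necessarily the most recent one.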

Poking the nearest graphics team member about this, I was asked to provide the output of eglinfo in this situation. Since I had an hour to spare before the meeting, I booted back into the broken image in runlevel 3, logged in on a vt, … and found that eglinfo is not in the OS image.

Well, that’s easy enough to fix on an Atomic system, using package layering:

rpm-ostree install egl-utils

After that, I proceeded to reboot to get to the OS image with the newly added layer, and when I got to the boot prompt, I realized my mistake: rpm-ostree never replaces the booted image, since it (reasonably) assumes that the booted image is ‘working’.  But it only keeps two images around, so it had to replace the other one – which was the image which successfully boots to my desktop.

Now, at the boot prompt, I was faced with the choice between

  • the broken image
  • the broken image + egl-utils

Ugh. Not what I had hoped for. And my meeting starts in 50 minutes. Admittedly, this was entirely my fault. rpm-ostree behaved as it should and as documented. Since it is a snow day, I need to do the meeting from home and need a web browser for that.
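
For anyone who wants to avoid the same trap, one possible safeguard is to pin the known-good deployment before layering packages. This is only a sketch: it assumes the installed ostree is recent enough to provide the `ostree admin pin` subcommand, and pin_deployment is a hypothetical helper name.

```shell
# Pin the deployment at the given index so that later upgrades or package
# layering do not remove it. Run as root.
# Assumption: the installed ostree is new enough to provide `ostree admin pin`.
pin_deployment() {
    ostree admin pin "$1"
}

# Usage (not run here):
#   ostree admin status      # lists deployments; the first entry has index 0
#   pin_deployment 1         # keep the second entry, e.g. the known-good image
#   rpm-ostree install egl-utils
```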

So, what can be done? I remembered that ostree is ‘like git for binaries’, so there should be history, right? After some fiddling with the ostree commandline, I found the log command that shows me the history of my local repository. But sadly, the output was disappointing:

$ ostree log fedora/rawhide/x86_64/workstation
commit fa09fd6d2551a501bcd3670c84123a22e4c704ac30d9cb421fa76821716d8c20
ContentChecksum: 74ff34ccf6cc4b7554d6a8bb09591a42f489388ba986102f6726f9e662b06fcb
Date: 2018-03-20 10:27:42 +0000
Version: Rawhide.20180320.n.0
(no subject)

<< History beyond this commit not fetched >>

rpm-ostree defaults to only keeping the latest commit in the local repository, a bit like a shallow git clone. Thankfully, just like git, ostree is versatile, and a bit more searching brought me to the pull command and its --depth option:

# ostree pull --depth=5 onerepo fedora/rawhide/x86_64/workstation

Receiving metadata objects: 698/(estimating) 2.2 MB/s 23.7 MB

This command writes to the local repo in /sysroot/ostree/repo and thus needs to be run as root.
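
Once more history has been pulled, the log gets long enough that a condensed view helps when hunting for a particular commit. The helper below is a sketch that reduces `ostree log` output to one line per commit. It relies on the field labels shown in the log output above (commit, Date:, Version:), and the name summarize_ostree_log is made up for illustration. Commits without a Version: line are skipped.

```shell
# Condense `ostree log` output (read from stdin) to one line per commit:
#   <version> <date> <commit id>
# Field labels are taken from the log output shown earlier.
summarize_ostree_log() {
    awk '
        $1 == "commit"   { id = $2 }
        $1 == "Date:"    { date = $2 }
        $1 == "Version:" { print $2, date, id }
    '
}

# Usage (not run here):
#   ostree log fedora/rawhide/x86_64/workstation | summarize_ostree_log
```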

Now ostree log showed a few older commits. I had to bump the depth a few times to find the last working commit. Then, I made that commit available for booting into again, using the deploy command:

# ostree admin deploy 76723f34b8591434fd9ec0

where that hex string is a prefix of the commit ID of the last working commit.  This command also needs to be run as root.
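
Before rebooting, it can be reassuring to confirm that the new deployment is really there. A minimal sketch, assuming a hypothetical helper called deploy_and_check; it only combines the deploy command above with `ostree admin status`, which lists the current deployments.

```shell
# Deploy a commit and, if that succeeds, list the resulting deployments.
# $1 is a commit ID or prefix, as in the command above. Run as root.
deploy_and_check() {
    ostree admin deploy "$1" && ostree admin status
}

# Usage (not run here):
#   deploy_and_check 76723f34b8591434fd9ec0
```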

Now a quick reboot, and… the boot loader menu had an entry for the working image again. I made it back to my desktop with 5 minutes to spare before the meeting. Phew!

Update: Since you might be wondering, the output of eglinfo was:

eglinfo: eglInitialize failed

7 thoughts on “Fedora Atomic Workstation: Almost fool-proof”

    1. one way to do it is to unlock the image that you are booted in and install the package ‘live’ – that does not create a new deployment and will only last until the next reboot

  1. Any news on this? (Or a workaround?). The latest image (Rawhide.20180326.n.0) is still not working. I get the GDM “oops something went wrong” message. I think you get it too.

    Currently I’ve rebased to stable workstation 27.96.

    Has a bug report already been created somewhere? If not, what is the right place to create one?

  2. Great story.

    I belatedly started putting into practice your earlier posts… my first upgrade to rawhide last week was unbootable so I rolled back to F27.

    Just to clarify… you run gitg and Builder from Atomic and git and vim inside buildah. Do these operate simultaneously on the files in /srv?
