Fedora Workstation and the quest for stability and robustness

One of the things that makes me really happy in terms of the public reception to the Fedora Workstation is all the people calling out how stable and solid it is, as this was and is one of our big goals from the start of the Fedora Workstation effort.

From the start we wanted to bury the old idea of Fedora being only for people who didn’t mind risking a lot of instability in return for being on the so-called bleeding edge. We also wanted to bury the related idea that by using Fedora you were basically alpha testing highly unstable and unfinished software for Red Hat Enterprise Linux. Yet at the same time we did want to preserve and build upon the idea that Fedora is a great operating system if you want to experience a lot of the latest and greatest new developments as they are happening. At first glance those two goals might seem a bit contradictory, but we decided that we should be able to do both by adjusting our policies a bit and by relying more on the Fedora retrace server as our bug fixing prioritization tool.

So in terms of policies, the division of Fedora into distinct server and workstation images, along with the clearer separation of the spins, allowed us to start making decisions without worrying so much about how they affected use cases other than our own. Sometimes what from a user perspective seemed like a bug or something being broken was really a non-workstation policy decision getting in the way of the desktop behaving as expected, for instance firewall rules hindering basic desktop functions.

Secondly, we took a more careful approach to what new stuff we brought in and when. We still try to keep on top of major upstream developments and be a leading edge system, but at the same time we do a little mental exercise for each decision to make sure it’s a decision that makes us ‘leading edge’ and not ‘bleeding edge’. And if we really want something in, but it isn’t 100% ready for prime time yet, we do what we have done with Wayland or the GTK3 port of LibreOffice: we make it available as an option for early adopters, but we default to the safer choice while we work out the last wrinkles. (Btw, if you are interested in progress on Wayland, Kevin Martin sent out an email with a link to a good Wayland development status overview just before the holidays.)

The final piece of the puzzle is regularly checking and identifying important bugs from the Fedora retrace server. Like almost all developers, we get way more bug reports than we can ever realistically address, so having the data from the retrace server allows us to easily identify the crashes that affect the most users and, just as importantly, lets us filter out the bug reports that are likely caused by users installing weird stuff on their system. When we started using retrace, various desktop modules tended to dominate the top 3 pages when sorting bugs by count, but thanks to a continuous effort over the last few years, desktop modules appearing in the top crashers list are now few and far between, and when they do appear we make sure to get fixes done quickly for them. So if you ever wonder whether the data collected by these kinds of systems actually helps developers make the software you use better, I can say that it is true for Fedora for sure.
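To make the prioritization idea concrete, here is a minimal sketch of ranking crash reports by count while filtering out reports from packages that did not come from the distribution. The record layout, the component names and the "origin" values are all made up for illustration; this is not the retrace server's actual schema or API.

```python
from collections import Counter

# Hypothetical crash records as (component, package_origin) pairs.
# The field names and values are assumptions for this example only.
reports = [
    ("gnome-shell", "fedora"),
    ("gnome-shell", "fedora"),
    ("kernel", "fedora"),
    ("weird-app", "third-party"),
    ("nautilus", "fedora"),
    ("gnome-shell", "fedora"),
    ("kernel", "fedora"),
]


def top_crashers(records, limit=3):
    """Count reports per component, skipping non-distribution packages."""
    counts = Counter(
        component for component, origin in records if origin == "fedora"
    )
    # most_common() sorts by descending count, giving a "top crashers" list.
    return counts.most_common(limit)


ranking = top_crashers(reports)
for component, count in ranking:
    print(f"{component}: {count} reports")
```

In a sketch like this the third-party report never reaches the ranking, so the top of the list reflects crashes in software the distribution can actually fix.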

That said, I thought it could be interesting to explain a bit about the challenges we have with tracking our progress in this area. So let’s start by looking at a graph I pulled from the retrace server.

Looking at that graph one could say that it is clear that we have made great strides in improving system stability, and I do believe that is the case. However, the graphs don’t prove that conclusively; they are just an indication. The reason they are not hard evidence is that there are a lot of things you need to take into consideration when reading them. First of all, they are not adjusted for total user population, which means that if you win or lose a lot of users between releases it can create an appearance of increased or decreased instability which is actually due to the change in user population, not to how well the system runs on an individual user’s system. From what we see through other metrics, our user population has been increasing since we launched the Fedora Workstation, which means we shouldn’t be getting any ‘help’ in these graphs from a declining user population.
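As a rough illustration of why the population adjustment matters, the sketch below converts raw report counts into a per-user rate. The release names and all of the numbers are invented for the example; they are not figures from the retrace server or from any Fedora user statistics.

```python
def crashes_per_thousand_users(crash_reports, active_users):
    """Normalize a raw crash report count by the size of the user base."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return 1000.0 * crash_reports / active_users


# Made-up numbers: the raw count goes up between the two releases,
# but so does the (hypothetical) number of users.
releases = {
    "release A": {"crash_reports": 12000, "active_users": 100000},
    "release B": {"crash_reports": 13000, "active_users": 150000},
}

rates = {
    name: crashes_per_thousand_users(**data)
    for name, data in releases.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} reports per 1000 users")
```

With these invented numbers the raw count rises from one release to the next while the per-user rate falls, which is exactly the kind of effect an unadjusted graph cannot show.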

A second reason is that there are a lot of false positives being reported here. For instance, we had an issue for a long while where the Intel graphics drivers were generating a ton of crash reports without there actually being crashes as such. So while they did represent bugs that should ideally be fixed, they were not issues you might actually have noticed as a user of the system. We spent some effort between Fedora Workstation 21 and Fedora Workstation 22 to reduce the amount of noise caused by this, which was a useful effort for us in terms of cleaning up our retrace server data, but from a user perspective it didn’t really make a tangible difference. And even with our efforts there are still a lot of kernel issues showing up here which are not impacting users in a way that they are likely to perceive as the system being unstable.

A third item that might skew the statistics in a given release is that we currently don’t differentiate between Fedora Workstation and the spins. This means one of the spins might generate a lot of bug reports against a module because of a bug or an API usage issue that is never triggered by the Workstation edition. Those items appearing or disappearing affect the statistics, but as a user of the Fedora Workstation you would never experience them.

So, keeping this in mind, the retrace server is an important tool for us and one that at least gives us a decent indication of how we are doing with quality. But we can always do better, so we will keep reviewing the reports we get through the ABRT and retrace systems, and I also strongly recommend that any application or library maintainers out there look into what major issues are reported against their own modules.


#1 ycollet on 01.05.16 at 18:46

I put kde5-plasma-workspace in the plot engine, with fedora23 and … a flat line starting at 0.
Is it a joke ?
kde5 under fedora totally sucks. It’s plagued with bugs.
Wrt kde, Fedora 23 is the worst experience.

#2 uraeus on 01.05.16 at 20:17

Not exactly sure why that would be, could be they disabled ABRT under the KDE spin?

#3 Michael Catanzaro on 01.06.16 at 02:29

I’m not sure if they have disabled ABRT entirely. KDE upstream has their own automated crash reporter, which reports bugs back to KDE. My understanding is that ABRT ignores KDE apps and lets the KDE crash handler do its thing.

Something like that.

#4 Conan Kudo (ニール・ゴンパ) on 01.06.16 at 08:07

No, the opposite. ABRT was accidentally switched on for Fedora 23 KDE. Prior to that, it was using DrKonqi, which submits issues and crash data upstream to KDE directly.

#5 Jiri Eischmann on 01.05.16 at 23:08

We tried to track down the most frequent crashes in KDE Plasma. It turned out that most of them were actually bugs in graphics drivers. Martin Graesslin wrote a blog post about it: https://blog.martin-graesslin.com/blog/2015/10/some-thoughts-on-the-quality-of-plasma-5/

#6 Andreas Tunek on 01.07.16 at 06:52

Do you have an actual bug report for the Intel driver(s)? I haven’t found any.

#7 ycollet on 01.05.16 at 22:14

I am a little bit puzzled by the conclusion of the post.
How can you make this conclusion if not all the bug reports are considered by fedora in retrace.
Believe me: kde5 is awful.
LxQt: certainly some problems in the dependencies of the LxQt packages. The experience is not good at all.
Enlightenment: some important modules (the network module) are not packaged, so Enlightenment is not usable either.
I would like to tell Fedora that not everybody uses Gnome. So, if they still focus on Gnome only, they will certainly lose some users …

#8 uraeus on 01.06.16 at 16:26

Well this post is about the Fedora Workstation specifically, which uses GNOME. The Fedora spins are run by separate teams with separate priorities, so they could be bug free or buggy as hell depending on their own maintainers teams size, focus and ability. So the KDE or Enlightenment spins are not variations of Fedora Workstation, they are separate entities built using many of the same components (rpms).

#9 FB on 01.05.16 at 22:53

maybe you should use an additive graph? The number of reports should be added among each release to compensate for the fact that there is one release in the beginning and 3 in the end.

#10 Jerry on 01.06.16 at 03:55

I’ve found Fedora 23 KDE spin to be excellent. I’ve tried most of the latest KDE Distros and Fedora nails it.

#11 Sébastien Wilmet on 01.06.16 at 09:29

The retrace server counts only the crashes. It doesn’t count the other bugs.

I’m myself convinced that great stability and robustness is achieved by doing code cleanups and refactorings. It’s amazing the number of bugs that I find just by reading a class and improving its code at the same time.

At some point, if a software has too many bugs, it’s maybe because the code was written too quickly. So at that point, if features continue to be added to the codebase, the risk is that the code becomes unmaintainable, with too many bugs and thus the users are frustrated and use another software. So a better solution in that case is to (almost) freeze the feature set, and improving internally the code, to have again something solid. Yes, targeting (almost) bug-free code takes maybe the double (or more) of the time that it can take to write quickly a feature.

At the end of the day, I think the pieces of software that survive a very long time are those that are very stable, with a very clean codebase.

#12 Alexandre Franke on 01.06.16 at 09:33

If you’re willing to pay for support, currently the option offered by Red Hat is RHEL. With these changes that you’re describing here, do you think we could soon buy support for Fedora Workstation? There are companies that would be ready to pay to get their bugs fixed but that require recent versions of the software they use.

#13 Donato Roque on 01.06.16 at 09:38

With your permission I would like to re-blog this.

#14 Balder Psang on 01.06.16 at 10:30

Being completely honest: Fedora crashed so much my system that I quit using it for Debian.

#15 ereshkigal on 01.07.16 at 13:15

I use Arch and it is rock solid.

#16 Pavel on 01.11.16 at 20:05

Not sure if retrace helps.
Bugs reported via abrt repeating again and again.
Bugs reported manually are not fixed from release to release (for example color calibration just does not work in Gnome at all since F21 or F20)

#17 uraeus on 01.12.16 at 21:25

Have you tried talking to Richard Hughes on IRC? If colour calibration doesn’t work he would be the person to talk to. You should be able to find him on #fedora-desktop on Gimpnet or #fedora-workstation on freenode, nickname Hughsie.