gnome-info-collect: What we learned

Last August, we ran a research exercise using a small tool called gnome-info-collect. The tool allowed GNOME users to anonymously send us non-sensitive data about how their systems were configured. The plan was to use that data to inform our design and development decisions. We got a fantastic response to our call for participation, with over 2,500 people uploading their data to the GNOME servers.

We’ve just finished the final parts of the analysis, so it’s time to share what we’ve learned. This post is on the long side, so you might want to get a brew on before you start reading!

Research limitations

The people who provided their data with gnome-info-collect were primarily recruited via GNOME’s media channels, including Discourse and Twitter. This means that the data we collected was on a fairly particular subset of GNOME’s users: namely, people who follow our social media channels, are interested enough to respond to our call for help, and are confident installing and running a command line tool.

The analysis in this post should therefore not be treated as being representative of the entire GNOME user base. This doesn’t mean that it’s invalid – just that it has limited validity to the group we collected data from.

It should also be noted that the data from gnome-info-collect is by no means perfect. We collected information on GNOME systems rather than individual users. While there were some basic measures to avoid double counting, they weren’t foolproof, and there was nothing to stop the same person from submitting multiple reports using different accounts or systems. We also have no way to know if the systems which we got data on were the main desktops used by the reporters.

Who responded?

In total, we received 2,560 responses to gnome-info-collect. Of the 2,560 responses, 43 were removed from the dataset, due to not being GNOME installations, or being a virtual machine.

Distro used

A little over half of the responses came from a Fedora installation. The other main distros were Arch and Ubuntu.

Distro Number of responses % of responses
Fedora 1376 54.69%
Arch 469 18.64%
Ubuntu 267 10.61%
Manjaro 140 5.56%
Other 78 3.10%
EndeavourOS 66 2.62%
Debian 44 1.75%
openSUSE 38 1.51%
Pop! 38 1.51%
Total 2516 100.00%

Hardware manufacturer
The data we got on the hardware manufacturer of each system was poor quality. Many systems didn’t accurately report their manufacturer, and the names were often inconsistently written. Of the manufacturers that we could identify, Lenovo was the most common, with Dell, ASUS, HP, MSI and Gigabyte making up the bulk of the other systems.

Manufacturer Responses % Valid Responses
Lenovo 516 23.54%
Dell 329 15.01%
ASUS 261 11.91%
HP 223 10.17%
MSI 213 9.72%
Gigabyte 211 9.63%
Acer 86 3.92%
Other 353 16.10%
Total valid responses 2192 100.00%

Desktop configuration

We collected data on a number of different aspects of desktop configuration. Each of these areas are relevant to ongoing areas of design and development work.

Workspaces

We collected data on both the “workspaces on primary” and “dynamic workspaces” settings. The former controls whether each workspace is only on the primary display, or whether it spans all displays, and the latter controls whether workspaces are automatically added and removed, or whether there is a fixed number of workspaces.

In both cases, the default setting was used by the vast majority of systems, though the number of systems where the default was changed was not insignificant either. It was more common for people to change workspaces on primary, as opposed to the dynamic workspaces option.

Enabled Disabled % Enabled % Disabled Total
Workspaces on primary 2078 439 82.56% 17.44% 2517
Dynamic workspaces 2262 255 89.87% 10.13% 2517

Sharing features

GNOME’s sharing settings include a variety of features, and we collected information on which ones were enabled. When looking at this, it is important to remember that an enabled feature is not necessarily actively used.

Remote login (SSH login) was enabled more than any other sharing feature, which suggests that the gnome-info-collect respondents were relatively technical users.

Activation of the other features was relatively low, with multimedia sharing being the lowest.

Sharing Feature Systems Enabled % Enabled
Remote login 527 20.95%
Remote desktop 248 9.85%
File sharing 160 6.36%
Multimedia sharing 108 4.29%

Online accounts

Around 55% of the responses had one or more online accounts set up. (Again, an enabled feature is not necessarily used.)

Number of Online Accounts Responses % Responses
0 1115 44.30%
≥1 1402 55.70%
Total responses 2517 100.00%

Google was the most common account type, followed by Nextcloud and Microsoft. Some of the account types had very little usage at all, with Foursquare, Facebook, Media Server, Flickr and Last.fm all being active on less than 1% of systems.

Account type Responses % Responses % Responses With ≥1 Accounts
Google 1056 41.95% 75.32%
Nextcloud 398 15.81% 28.39%
Microsoft 268 10.65% 19.12%
IMAP and SMTP 153 6.08% 10.91%
Fedora 114 4.53% 8.13%
Ubuntu Single Sign-On 70 2.78% 4.99%
Microsoft Exchange 55 2.19% 3.92%
Enterprise Login (Kerberos) 50 1.99% 3.57%
Last.fm 20 0.79% 1.43%
Flickr 15 0.60% 1.07%
Media Server 9 0.36% 0.64%
Facebook 4 0.16% 0.29%
Foursquare 2 0.08% 0.14%

 

Flatpak and Flathub

Flatpak and Flathub are both important to GNOME’s strategic direction, so it is useful to know the extent of their adoption. This adoption level is also relevant to the design of GNOME’s Software app.

Over 90% of systems had Flatpak installed.

Flatpak status Responses % Responses
Installed 2344 93.13%
Not installed 173 6.87%
Total 2517 100.00%

In total, 2102 systems had Flathub fully enabled, which is 84% of all reporting systems, and 97% of systems which had flatpak installed. (The Flathub filtered status refers to Fedora’s filtered version of Flathub. This contains very few apps, so it is more like having Flathub disabled than having it enabled.)

It would be interesting to analyse Flatpak and Flathub adoption across distros.

Default browser

The default browser data referred to which browser was currently set as the default. It therefore doesn’t give us direct information about how much each browser is used.

The following table gives the results for the nine most popular default browsers. This combined the different versions of each browser, such as nightlies and development versions.

Most distros use Firefox as the default browser, so it’s unsurprising that it came out top of the list. These numbers give an interesting insight into the extent to which users are switching to one of Firefox’s competitors.

Default Browser Responses % Responses
Firefox 1797 73.14%
Google Chrome 286 11.64%
Brave 117 4.76%
Web 49 1.99%
Vivaldi 47 1.91%
LibreWolf 44 1.79%
Chromium 42 1.71%
Junction 38 1.55%
Microsoft Edge 37 1.51%
Total 2457 100.00%

Shell Extensions

gnome-info-collect gathered data on which extensions were enabled on each reporting system. This potentially points to functionality that people feel is missing from GNOME Shell.

Extension usage levels

When analyzing extension usage, we removed any pre-installed extensions from the data, so that data only included extensions that had been manually installed.

The vast majority of systems – some 83% – had at least one enabled extension. Additionally, 40% had between 1 and 5 enabled extensions, meaning that the majority (around 60%) had 5 or less enabled extensions.

At the same time, a substantial minority of systems had a relatively high number of enabled extensions, with around 25% of systems having between 6 and 10.

Number of Manually Enabled Extensions Number of Responses % Responses % Responses With Enabled Extensions
0 421 16.84%
1-5 1058 42.32% 50.89%
6-10 635 25.40% 30.54%
11-18 341 13.64% 16.40%
19+ 45 1.80% 2.16%
Total 2500 100.00% 100.00%

Extension popularity

The data included 588 individual extensions that were enabled. When analysing the popularity of each extension, we grouped the extensions which had similar or identical features. So, for example, “appindicator support” includes all the various status icon extensions as well. The table below shows the 25 most common enabled extension types, after grouping them in this way. Some of the extensions are included as part of GNOME’s classic mode, and we didn’t have a way to filter out those extensions which were enabled due to the classic session.

Extension Enabled Systems % Systems
Appindicator support 1099 43.66%
Gsconnect 672 26.70%
User theme 666 26.46%
Dash to dock / panel 579 23.00%
Sound output chooser 576 22.88%
Blur my shell 530 21.06%
Clipboard manager 510 20.26%
Caffeine 445 17.68%
System monitor 346 13.75%
Just perfection desktop 318 12.63%
Drive menu 310 12.32%
Apps menu 308 12.24%
Place menus 276 10.97%
Openweather 242 9.61%
Bluetooth quick connect 239 9.50%
Night theme switcher 208 8.26%
Tiling assistant 184 7.31%
Launch new instance 180 7.15%
Rounded window corners 158 6.28%
Game mode 146 5.80%
Alphabetical app grid 146 5.80%
Burn my windows 140 5.56%
GNOME UI tune 116 4.61%
Auto move windows 99 3.93%
Desktop icons 98 3.89%
Background logo 2 0.08%

As can be seen, appindicator support was by far the most common extension type, with 44% of all reporting systems having it enabled. Gsconnect, user theme, dash to dock/panel, sound output chooser, blur my shell and clipboard managers were all enabled in over 20% of the responses.

Installed apps

Knowing which apps are installed was one of the most interesting and valuable aspects of the data. It was also one of the most challenging aspects to analyse, and required processing to remove duplicate and spurious entries from the data set. The data set is still by no means perfect, but it is good enough to draw some initial conclusions.

In general, we are interested in which apps get used, which apps people have a strong need for, plus which apps people really like. App installation does not directly indicate any of these things directly, and is a relatively poor indicator for measuring them. We therefore need to be careful when drawing conclusions from this part of the analysis.

Frequency distribution

The frequency distribution of installed apps is really interesting. The total number of installed apps was very high. Even after processing, the data contained over 11,000 unique app names. Within this very large number of installed apps, the 400 most common apps represented 87% of all that were installed. This bulk of popular apps was followed by a very long tail.

The number of apps and the length of the frequency distribution tail has undoubtedly been inflated by issues in the data, and more processing to improve the data quality would be helpful.

Popular apps

After removing the most obvious preinstalled apps from the data, the 20 most common installed apps were as follows:

 

App Installations % Systems
GIMP 1497 58.48%
VLC 1375 53.71%
Steam 1367 53.40%
htop 1184 46.25%
Dconf Editor 1108 43.28%
Extension Manager 984 38.44%
Inkscape 952 37.19%
Flatseal 942 36.80%
Discord 938 36.64%
Google Chrome 899 35.12%
Web 898 35.08%
Chromium 871 34.02%
Thunderbird 824 32.19%
GParted 795 31.05%
Wine 772 30.16%
OBS Studio 770 30.08%
Visual Studio Code 726 28.36%
Transmission 719 28.09%
Telegram 713 27.85%
Geary 672 26.25%

A longer list of the most common 110 installed apps is available separately.

Note that the removal of preinstalled apps from this lists was extremely rudimentary and the numbers in the list may represent some apps which are preinstalled by some distros.

The most common manually installed apps are a mixed bag of traditional Linux desktop apps, third-party proprietary apps, and newer GNOME apps. Examples of common apps in each of these categories include:

  • Traditional Linux desktop apps: GIMP, VLC, Inkscape, GParted, Transmission
  • Third party apps: Google Chrome, Steam, OBS Studio, NVIDIA Settings
  • Newer GNOME apps: Flatseal, To Do, Bottles, Sound Recorder, Builder

Conclusions

Overall, the data gives some strong hints about which features should be concentrated on by the GNOME project. It also provides evidence about which features shouldn’t be prioritised.

It needs to be remembered that, while we have evidence here about some of the decisions that some GNOME users are making, the data doesn’t give us much insight into why they are making the decisions that they are. For example, it would seem that people are installing the GIMP, but for what purpose? Likewise, while we know that people are enabling some features over others, the data doesn’t tell us how those features are working for them. Do people find online accounts to be useful? The data doesn’t tell us.

We therefore need to be very careful when making decisions based on the data that we have here. However, what we do have is a great basis for followup research which, when combined with these results, could be very powerful indeed.

The app installation picture is complex. On the one hand, it doesn’t look like things have changed very much in the past 10 years, with people continuing to install the GIMP, Wine, and GParted. On the other hand, we have 3rd party apps being widely used in a way that wasn’t possible in the past, and it’s exciting to see the popularity of new GNOME apps like Flatseal, To Do, Bottles, and Fragments.

The data on apps is also some of the most limited. We need data on which apps are being used, not just just which ones are installed. It would also be really helpful to have data on which apps people feel are essential for them, and it would be great to have demographic information as part of the dataset, so we can see whether there are different groups of users who are using different apps.

Methodological lessons

gnome-info-collect was the first data collection exercise of its kind that has been run by the GNOME project, and we learned a lot through the process. I’m hopeful that those lessons will be useful for subsequent research. I have notes on all that, which I’ll share at some point. For now, I just want to touch on some general points.

First, doing small standalone research exercises seems to be a great approach. It allowed us to ask research questions around our current interests, and then generate research questions for followup exercises. This allows an iterative learning process which is strongly connected to our day to day work, and which can combine different research methods to understand the data from different perspectives.

Second, the whole premise of gnome-info-collect was to do a quick and lean initiative. In reality, it turned out to be a lot more work than anticipated! This was largely due to it being the first exercise of its kind, and there not being a preexisting platform for gathering the data. However, I think that we also need to acknowledge that even lean research exercises can be a lot of work, particularly if you want to gather large amounts of data.

Finally, we discovered some major issues with the data that we can get from Linux systems. Perhaps unsurprisingly, there was a lot of inconsistency and a lack of standardisation. This required additional processing at the analysis stage, and makes automated analysis difficult. If we want to routinely collect information from GNOME systems, cleaning up the raw data would be a big help.

Outro

I’d like to take this opportunity to thank Vojtěch Staněk for all his work on gnome-info-collect. Vojtěch handled all the technical aspects of gnome-info-collect, from writing the code to processing the data, as well as helping with public outreach. He did a great job!

The gnome-info-collect data doesn’t include directly identifying information, or anything very sensitive. However, there are still privacy concerns around it. We are therefore going to be archiving the data in a restricted location, rather than publishing it in full.

However, if members of the GNOME project have a use for the data, access can be arranged, so just get in touch. We are also currently investigating options for making some of the data available in a way that mitigates any privacy risks.

9 thoughts on “gnome-info-collect: What we learned”

  1. Thanks for the info! Makes me sad that Epiphany i.e. “Web” has a low adaption rate, it provides a nice native UI and is rather quick. While it is my default browser for years it requires a lot memory when multiple tabs are open – already on startup. I will report it to upstream but wonder why nobody cares because two or three years ago Epiphanys memory footprint was low.

    Keep in mind that a low- and adaption rate is just a usage rate. A low-rate probably hints upon a topic which needs more effort. File-Sharing via Webdav is nice but I just turn it on when needs. Actually I want something like teleport [1] preinstalled and readily usable everywhere, what Apple did with AirDrop.

    [1] https://gitlab.gnome.org/jsparber/teleport

    1. Evidently people just install a bunch of apps that they don’t regularly use? There’s not really any other way to explain the default browser results (which show Google Chrome way ahead of Epiphany and Chromium) vs. the installed software results (which show all three almost equal).

      Also seems very safe to say the sample was extremely non-representative of GNOME users in general. Hard to believe both GIMP and Steam usage could be so high, but those are at least easier to explain than htop.

      P.S. Epiphany bug reports welcome. Use WebKit Bugzilla, WebKitGTK component for most bugs, or the Epiphany component on GNOME GitLab for user interface bugs.

      1. Personally, I tend to have a bunch of browsers installed, while still using Firefox as the main one. Chrome is useful in case a site doesn’t work with Firefox, and Web is there for design and testing.

        I wonder if htop is being pulled in as a dependency in some cases. I think I’ve seen the launcher appear in the past, without directly installing it.

      2. Speaking for myself I would love to use Gnome Web as my main browser. Actually I check whether it is usable for me after each GNOME release.

        The main thing which holds me back is the missing WebRTC support. Since COVID19 home office and online meetings become really important. My main Browser has to support it.

        What bothers me is that it is really hard to follow the progress in this area or find out if there is movement at all. I look regularly at the webkit bugzilla. But there are so many tickets regarding webrtc that it is not clear which to follow. Also searching the internet reveals only quite old blog posts about this topics. I would love if Epiphany would report regularly on the progress (if there is one)

  2. I think the shell extension results are the most interesting.

    I used to have the weather extension installed myself. I live in a place where the weather changes a lot and commute on bicycle, and having a little bit of information available next to the clock is helpful (extra warm clothes or not). But the constant shell API breakage got to me at last. I’m not sure if it’s real breakage, or it’s just a declared version breakage, but suddenly it doesn’t show up, and the UI when visiting the extension page on gnome.org usually isn’t able to fix the problem. I’ve been through many iterations of fiddling with it and eventually getting it working, but since about a year ago I just resigned.

    I had a similar experience with the funny special effects extension that lets your windows disintegrate. Slightly difficult to install, but working great once it was in. A couple of months later I want to show it to a visiting co-worker, but since I’ve been through a shell upgrade in the meantime, no cigar.

    It seems to me that if the burden put on the extension writers to always keep up is just too much. Perhaps duck typing combined with some sort of automated CI test could be used to weed out incompatible extensions instead of a dumb version check?

  3. Totally understand your pain on the app names/IDs. I’m not sure if you manually curated a series of overlapping AppIDs but there should be at least the semblence of a dataset with some of the AppData files from distros and places like flathub with the provides tag. (e.g. this example from filezilla https://github.com/flathub/org.filezillaproject.Filezilla/blob/master/org.filezillaproject.Filezilla.metainfo.xml#L157 should help you get the app-ids that this app is known as)

  4. Application collection is limited. It looks like the code is using Gio.AppInfo.get_all() which lists registered applications but this will ignore a lot of apps installed via snap or homebrew and ignores downloaded portable applications like AppImage.

  5. >When analyzing extension usage, we removed any pre-installed extensions from the data, so that data only included extensions that had been manually installed.

    So no comparisons at all between installed and pre-installed extensions ? I feel it would have been quite important to know which pre-installed extensions are massively disabled ? Some pre-installed extensions might have lower enable rate than some popular extensions.

Comments are closed.