I’ve spent a few days profiling GNOME Software on ARM, mostly for curiosity but also to help our friends at Endless. I’ve merged a few patches that make the existing --profile
code more useful to profile start up speed. Already there have been some big gains, over 200ms of startup time and 12Mb of RSS, but there’s plenty more that we want to fix to make GNOME Software run really nicely on resource constrained devices.
One of the biggest delays is constructing the search token cache at startup. This is where we look at all the fields of the .desktop files, the AppData files and the AppStream files and split them in a UTF8-sane way into search tokens, adding them into a big hash table after stemming them. We do it with 4 threads by default as it’s trivially parallelizable. With the search cache, when we search we just ask all the applications in the store “do you have this search term” and if so it gets added to the search results and ordered according to how good the match is. This takes 225ms on my super-fast Intel laptop (and much longer on ARM), and this happens automatically the very first time you search for anything in GNOME Software.
At the moment we add (for each locale, including fallbacks) the package name, the app ID, the app name, app single line description, the app keywords and the application long description. The latter is the multi-paragraph long description that’s typically prose. We use 90% of the time spent loading the token cache just splitting and adding the words in the description. As the description is prose, we have to ignore quite a few words e.g. “and”, “the”, “is” and “can” are some of the most frequent, useless words. Just the nature of the text itself (long non-technical prose) it doesn’t actually add many useful keywords to the search cache, and the ones that is does add are treated with such low priority other more important matches are ordered before them.
My proposal: continue to consume everything else for the search cache, and drop using the description. This means we start way quicker, use less memory, but it does require upstream actually adds some [localized] Keywords=foo;bar;baz
in either the desktop file or <keywords>
in the AppData file. At the moment most do, especially after I sent ~160 emails to the maintainers that didn’t have any defined keywords in the Fedora 25 Alpha, so I think it’s fairly safe at this point. Comments?