I’ve spent a few days profiling GNOME Software on ARM, mostly out of curiosity but also to help our friends at Endless. I’ve merged a few patches that make the existing --profile code more useful for profiling startup speed. Already there have been some big gains, over 200ms of startup time and 12MB of RSS, but there’s plenty more that we want to fix to make GNOME Software run really nicely on resource-constrained devices.
One of the biggest delays is constructing the search token cache at startup. This is where we look at all the fields of the .desktop files, the AppData files and the AppStream files and split them in a UTF8-sane way into search tokens, adding them into a big hash table after stemming them. We do it with 4 threads by default as it’s trivially parallelizable. With the search cache, when we search we just ask all the applications in the store “do you have this search term” and if so it gets added to the search results and ordered according to how good the match is. This takes 225ms on my super-fast Intel laptop (and much longer on ARM), and this happens automatically the very first time you search for anything in GNOME Software.
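To make the idea concrete, here is a minimal Python sketch of a per-application token cache. It is not the GNOME Software code (which is written in C): the field names, the weights and the helper names are all assumptions made for illustration, and stemming and the thread pool are left out to keep it short.

```python
import re
from collections import defaultdict

# Hypothetical field weights: a higher weight means a better match.
FIELD_WEIGHTS = {"id": 32, "name": 16, "keywords": 8, "summary": 4, "description": 1}

def tokenize(text):
    """Split text into lowercase word tokens (Unicode-aware, no stemming)."""
    return re.findall(r"\w+", text.lower())

def build_token_cache(app):
    """Map each token to the best weight of any field it appears in."""
    cache = defaultdict(int)
    for field, weight in FIELD_WEIGHTS.items():
        for token in tokenize(app.get(field, "")):
            cache[token] = max(cache[token], weight)
    return dict(cache)

def search(caches, term):
    """Return the IDs of apps whose cache holds the term, best match first."""
    term = term.lower()
    hits = [(cache[term], app_id) for app_id, cache in caches.items() if term in cache]
    return [app_id for _score, app_id in sorted(hits, key=lambda h: (-h[0], h[1]))]
```

Building `caches` once, as a dict of app ID to `build_token_cache(app)`, is the expensive step described above; each later `search()` call is just one hash lookup per application.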
At the moment we add (for each locale, including fallbacks) the package name, the app ID, the app name, the app single-line description, the app keywords and the application long description. The latter is the multi-paragraph long description that’s typically prose. We use 90% of the time spent loading the token cache just splitting and adding the words in the description. As the description is prose, we have to ignore quite a few words, e.g. “and”, “the”, “is” and “can” are some of the most frequent, useless words. Because of the nature of the text itself (long non-technical prose), it doesn’t actually add many useful keywords to the search cache, and the ones that it does add are treated with such low priority that other more important matches are ordered before them.
My proposal: continue to consume everything else for the search cache, and drop using the description. This means we start way quicker and use less memory, but it does require that upstream actually adds some [localized] Keywords=foo;bar;baz in either the desktop file or <keywords> in the AppData file. At the moment most do, especially after I sent ~160 emails to the maintainers that didn’t have any defined keywords in the Fedora 25 Alpha, so I think it’s fairly safe at this point. Comments?
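For reference, reading the keywords back out of a desktop file is cheap compared with tokenizing prose. The Python sketch below is only an illustration: `desktop_keywords` is a hypothetical helper, it only looks at the [Desktop Entry] group, and it does not handle escaped semicolons.

```python
def desktop_keywords(text, locale=None):
    """Return the Keywords list from the [Desktop Entry] group.

    Prefers Keywords[locale] when it exists, else the unlocalized key.
    Escaped separators ("\\;") are not handled in this sketch.
    """
    entries = {}
    in_group = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("["):
            in_group = (line == "[Desktop Entry]")
            continue
        if in_group and "=" in line:
            key, value = line.split("=", 1)
            entries[key.strip()] = value.strip()
    raw = entries.get(f"Keywords[{locale}]") if locale else None
    if raw is None:
        raw = entries.get("Keywords", "")
    # Values are semicolon-separated and usually end with a trailing ";".
    return [keyword for keyword in raw.split(";") if keyword]
```

Each returned keyword can then go straight into the token cache with a fairly high weight, since the maintainer chose it deliberately.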
Are you proposing dropping just the long description, or the single-line one too?
If just the long one, I think that’s probably okay, and in fact will probably reduce false positives.
Like, for example, no more “Nomacs” when you search for Chrome. :)
(I don’t know how the search in gnome-software works or how it is integrated with any appstream software)
Or maybe use some library to build an efficient search index. Maybe something like lucene? And always update that when the local app data changes, not on gnome-software startup?
I presume gnome-software currently uses a custom approach to search? I think a lot could be won using well-designed, pre-existing software.
Isn’t Lucene Java?
It is. There is CLucene, but it seems to be no longer maintained. I haven’t used it, but SQLite has an extension for full-text search with a stemming algorithm: https://www.sqlite.org/fts3.html
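As a rough idea of what the SQLite route could look like, here is a small Python sketch using an FTS3 virtual table with the built-in Porter stemmer. The table layout and function names are made up for the example, and it assumes the SQLite library behind Python’s sqlite3 module was built with FTS3 enabled.

```python
import sqlite3

def build_index(apps):
    """Create an in-memory full-text index from {app_id: searchable text}."""
    db = sqlite3.connect(":memory:")
    # tokenize=porter asks FTS3 to stem tokens with the Porter algorithm.
    db.execute("CREATE VIRTUAL TABLE apps USING fts3(app_id, body, tokenize=porter)")
    db.executemany("INSERT INTO apps (app_id, body) VALUES (?, ?)", apps.items())
    db.commit()
    return db

def search(db, term):
    """Return the IDs of apps whose body text matches the (stemmed) term."""
    rows = db.execute(
        "SELECT app_id FROM apps WHERE body MATCH ? ORDER BY app_id", (term,)
    )
    return [row[0] for row in rows]
```

Because both the indexed text and the query are stemmed, a search for “browsers” also finds an app described as a “web browser”. Using a file path instead of ":memory:" would keep the index on disk between runs.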
Personally I would prefer to keep the search that returns the larger subset, so that less common search terms are still likely to match something.
Maybe a good compromise could be a “thorough search” checkbox which, when selected, gives you the longer search?
Sorry for my English, but GNOME Software on a NOT super-fast Intel laptop totally sucks! Lags, lags and lags. I can’t find anything. I write “Code” or “code” and nothing is found; Code::Blocks is lost. I want GNUsim8085, so I write “Gnu”, “GNU” or other words, and nothing! GNOME Software totally sucks. I am very serious about this!
SERIOUS. You must do something, because GNOME Software is not usable.
I can’t think of another word for it. It totally sucks. Please do something!
You could start by filing a bug with some hardware and details about the version you’re using.
Doesn’t sound like a good move. In particular, expecting additional keywords to be added to every package to compensate for the loss of functionality is wishful thinking, as it’s not an easy task to pick the right keywords, especially when your background is software engineering.
I’d recommend working around the slowness in a different way. Do you really need that information at startup before the main window appears? Can’t you load it in the background? The user will probably take at least a couple of seconds before typing a meaningful search term. Even if it’s still not loaded, the search could wait until the index is finished building, just adding a few extra seconds to the first search.
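A minimal Python sketch of that idea, assuming the index is a plain dict of token to app IDs and that `build_fn` is whatever slow function builds it (both are placeholders, not anything from gnome-software):

```python
from concurrent.futures import ThreadPoolExecutor

class BackgroundIndex:
    """Start building the search index right away, off the main thread."""

    def __init__(self, build_fn):
        executor = ThreadPoolExecutor(max_workers=1)
        self._future = executor.submit(build_fn)
        # Do not block here; the already submitted build still runs to completion.
        executor.shutdown(wait=False)

    def search(self, term):
        # result() returns at once when the build is done, and otherwise
        # blocks until it finishes (re-raising any error from build_fn).
        index = self._future.result()
        return sorted(index.get(term.lower(), ()))
```

The window can be shown as soon as the object is constructed; only a search typed before the build has finished pays the remaining wait.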
Wouldn’t it be faster to cache this on disk on the first run, and then load that on every other run, invalidating the cache if any of the desktop files changed?
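Something along these lines, as a rough Python sketch. The JSON cache format, the function names and the choice of path, mtime and size as the change fingerprint are all assumptions for the example, and `build_fn` has to return something JSON-serializable.

```python
import hashlib
import json
import os

def fingerprint(paths):
    """Hash the path, mtime and size of every source file."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        st = os.stat(path)
        digest.update(f"{path}\0{st.st_mtime_ns}\0{st.st_size}\n".encode())
    return digest.hexdigest()

def load_or_build(cache_path, source_paths, build_fn):
    """Return (index, from_cache); rebuild when any source file changed."""
    current = fingerprint(source_paths)
    try:
        with open(cache_path, encoding="utf-8") as f:
            cached = json.load(f)
        if isinstance(cached, dict) and cached.get("fingerprint") == current:
            return cached["index"], True
    except (OSError, ValueError, KeyError):
        pass  # missing, unreadable or corrupt cache: fall through and rebuild
    index = build_fn(source_paths)
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"fingerprint": current, "index": index}, f)
    os.replace(tmp_path, cache_path)  # atomic swap, no half-written cache
    return index, False
```

Adding or removing a file also changes the fingerprint, so installing or uninstalling an application invalidates the cache as well.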