Yesterday, I released GNOME Games 3.35.90, so we’re in feature freeze for 3.36.0. Let’s take a look at the changes during the 3.35.x cycle:
Faster collection loading
For a long time, Games loaded the collection asynchronously using Vala async functions. While this didn’t block the UI completely, it was still slow and caused frequent UI stalls until loading finished. In 3.36, collection loading uses a separate thread instead and is noticeably faster as a result, while the UI is perfectly smooth the whole time.
Cover loading has been moved to a thread as well, so both initial loading and scrolling while covers are loading should now be fast and smooth.
There’s still lots of room for improvement, but for the moment this improves things somewhat.
Steam integration improvements
New-style vertical covers are now supported for Steam games.
Additionally, Steam tools such as Proton and Steamworks Common Redistributables don’t show up as games anymore.
Restarting games
Veerasamy Sevagen added a way to restart games without exiting them.
Other changes
Savestates were renamed to snapshots, and backup/restore was renamed to export/import.
Nightly and development builds have a new icon made with App Icon Preview.
Lists now look more consistent with each other on mobile.
On desktop, the first two lists turn into sidebars and look mostly the same way as before, while other lists will still have rounded corners and separators.
And finally, there has been a lot of refactoring and code cleanups.
And that’s it? If the changelog for 3.36 seems a bit short, it’s because it is. Due to unfortunate timing, a lot of work planned for 3.36 has been postponed to 3.38 so that we get more than a few weeks before feature freeze to finish and test it. So, let’s take a look at what retro-gtk 1.0 will bring.
retro-gtk 1.0
API break
First of all, retro-gtk 1.0 will not be API-compatible with the previous versions. In some cases the API is just simplified or made nicer to use (for example, RetroMainLoop is gone, having been merged into RetroCore), but a lot of changes have a very specific reason, more on that later. Since there will be more changes down the road, there’s no point in describing the API changes just yet, so instead let’s take a look at the bigger features:
More precise timing
Currently, retro-gtk uses g_timeout_add() for doing the main loop in the game. It’s… not very precise. Instead, we now have a custom GSource implementing a more precise timer that allows an error of just a few microseconds instead of a few hundred.
Of course, it’s still not synced to the GTK refresh rate and so occasionally results in dropped frames. I’m not particularly happy with that, but that’s for later.
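To make the precision problem a bit more concrete, here is a small sketch of one contributing factor: g_timeout_add() takes its interval in whole milliseconds, while a 60 fps frame lasts about 16667 microseconds. The sketch only models that rounding (not scheduling jitter), and it is not retro-gtk’s actual GSource code; the function names are made up for illustration.

```python
def microsecond_deadlines(start_us, fps, count):
    """Absolute frame deadlines in microseconds, each derived from the start time."""
    return [start_us + round(i * 1_000_000 / fps) for i in range(1, count + 1)]


def millisecond_timeout_deadlines(start_us, fps, count):
    """Deadlines of a repeating timeout whose interval must be whole milliseconds."""
    interval_ms = round(1000 / fps)  # 16.67 ms cannot be expressed, so it becomes 17 ms
    return [start_us + i * interval_ms * 1000 for i in range(1, count + 1)]


fps = 60.0
precise = microsecond_deadlines(0, fps, 60)
coarse = millisecond_timeout_deadlines(0, fps, 60)

# How far the millisecond timer has drifted after 60 frames (one second of game time).
drift_us = coarse[-1] - precise[-1]
print(drift_us)
```

With these assumed numbers the millisecond-based schedule is already 20 ms late after a single second, which is more than a whole frame.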
Audio playback improvements
With such imprecise timing, you would expect games to run at the wrong speed. And sometimes they do, but more often than not they still work fine. This happens because audio playback is blocking (each call waits until the previous audio has finished playing) and so the game can never run faster than its audio can play.
And as long as games output audio once per frame, it’s not an issue. However, libretro defines 2 callbacks for audio: one sends 2 samples, one for each channel, and the other one sends a batch of arbitrary length. And of course a lot of cores use the former.
At first retro-gtk just played samples immediately as they arrived. Of course, cores using the single-sample callback were slowed down to a crawl, because each call waited for the previous samples to play.
A year ago, after a report that a core was slowing down, Adrien added a workaround: batching up to 512 samples and playing them all at once. This solved the immediate problem, because cores could make many consecutive calls sending samples without getting slowed down. However, outputting more than 512 samples at once still caused slowdowns (a lot of cores output 1000 or more samples per frame), and it also means that if a core sends a number of samples per frame that isn’t a power of 2, or if it sends a different number of samples on every frame, there will be inconsistent slowdowns or jittery audio.
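A quick simulation shows why a fixed batch size behaves inconsistently. It assumes the workaround flushes every time 512 samples have accumulated, and that a core emits 735 samples per frame (44100 Hz at 60 fps). Both numbers are illustrative assumptions, not measurements from a real core.

```python
def flushes_per_frame(samples_per_frame, batch_size, frames):
    """Count how many (blocking) playback calls each video frame triggers
    when samples are only played in fixed-size batches."""
    pending = 0
    result = []
    for _ in range(frames):
        pending += samples_per_frame
        # Every full batch is played; the remainder waits for the next frame.
        flushes, pending = divmod(pending, batch_size)
        result.append(flushes)
    return result


counts = flushes_per_frame(samples_per_frame=735, batch_size=512, frames=6)
print(counts)
```

Some frames trigger one blocking playback call and others two, so the time spent waiting on audio differs from frame to frame even though the core produces exactly the same amount of audio each time.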
Initially I opted for making audio playback threaded and queueing any additional samples to resolve this. It worked, but led to some subtle desync issues if the game ran a little faster. It could be as subtle as 6 extra audio frames (12 samples) every 22 seconds (0.45% assuming the framerate is 60 fps), but it was still noticeable, so instead we now queue any samples sent during the frame and then play them once at the end of every frame. While this is not perfect (if the core sends a different number of samples, it’s still possible to get slowdowns after a “long” frame), it works and it solves slowdown in cores like PX68K.
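The per-frame approach can be sketched roughly like this. The class and method names are hypothetical and the playback function is just a callback; in retro-gtk the equivalent logic is C and playback is a blocking audio call.

```python
class FrameAudioQueue:
    """Collects everything a core sends during one frame and plays it in one go."""

    def __init__(self, play):
        self._play = play      # callable that plays a list of samples
        self._pending = []

    def push_sample(self, left, right):
        # Single-sample callback: one value per channel.
        self._pending.extend((left, right))

    def push_batch(self, samples):
        # Batch callback: arbitrary number of interleaved samples.
        self._pending.extend(samples)

    def end_frame(self):
        # Exactly one playback call per frame, however the samples arrived.
        if self._pending:
            self._play(self._pending)
            self._pending = []


played = []
queue = FrameAudioQueue(played.append)
for frame in range(3):
    for i in range(4):
        queue.push_sample(i, -i)
    queue.end_frame()
print(len(played), [len(chunk) for chunk in played])
```

However many times the core calls the single-sample callback, the number of playback calls stays at one per frame.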
Fast-forwarding that actually works
While it might be good that audio playback naturally prevents games from running too fast, it also means that the speed rate property (unused in Games currently) did nothing when trying to speed the game up rather than slow it down. To solve this, retro-gtk now resamples audio to match speed rate using libsamplerate. Now setting speed rate to values larger than 1 works, although with a “chipmunk effect”, as resampling also changes audio pitch.
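To illustrate where the chipmunk effect comes from, here is a toy resampler using linear interpolation on mono samples. It is only a stand-in for what libsamplerate does (it does not use libsamplerate’s API or match its quality), and the function name is made up.

```python
def resample_for_speed(samples, speed):
    """Squeeze or stretch `samples` so they play `speed` times faster
    at an unchanged output sample rate (mono, linear interpolation)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not samples:
        return []
    out_len = max(1, int(len(samples) / speed))
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * speed                 # position in the input signal
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


ramp = list(range(8))
fast = resample_for_speed(ramp, 2.0)   # half as many samples
slow = resample_for_speed(ramp, 0.5)   # twice as many samples
print(fast, len(slow))
```

At speed 2 the same waveform is packed into half as many samples, so when played at the normal sample rate every frequency in it doubles. That is the pitch shift described above.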
Running cores in a separate process
This is a big change, and it’s something I started working on not long after the 3.34.0 release, though at one point I put it on hiatus before resuming. It means moving the game logic into a separate process and only having a proxy that sends input and receives output in the UI process. It’s similar to what web browsers have long been doing, with separate web processes for each tab.
Why?
Many reasons. For one, crash resilience. If a core crashes, currently the whole app is taken down. When the core runs in its own process, instead Games can show an error and offer to restart the game.
It ensures there are no UI stalls due to a slow core. Running multiple cores at the same time will start a separate runner process for each core, which means the runner process side code can assume there’s only ever one core and can be simplified a lot. It improves performance when drawing the game into the widget is slow, like with fractional scaling, as drawing now happens asynchronously and so even if the UI process can’t keep up drawing everything in time, the game still runs at full speed. After a core is stopped, it’s cleanly uninitialized because its process exits. And it also allows us to implement some new features, but more on that later.
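The crash resilience argument is easy to demonstrate in miniature. In the sketch below a child Python interpreter plays the role of the runner process, and a deliberately raised SIGSEGV stands in for a crashing core. None of this reflects how retro-runner is actually launched; it only shows that the parent survives and can react.

```python
import subprocess
import sys

# Stand-ins for a core: one that segfaults, one that behaves.
CRASHING_CORE = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
HEALTHY_CORE = "print('frame rendered')"


def run_core(code):
    """Run stand-in core code in a child process and report how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    # A non-zero (on POSIX: negative, signal-based) return code means the child
    # died, but the parent is unaffected and can offer a restart.
    return "ok" if proc.returncode == 0 else "crashed"


print(run_core(CRASHING_CORE), run_core(HEALTHY_CORE))
```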
How?
The implementation is very much inspired by Christian Hergert’s GUADEC 2019 talk and by Builder’s git plugin and Sysprof. The runner process helper is a separate lightweight binary called retro-runner that isn’t linked to retro-gtk or even to GTK.
For communication it uses both messaging and shared memory. Calls such as switching disks, saving/loading states, pausing and resuming use messaging, specifically D-Bus over a private socket connection. This makes it possible to use gdbus-codegen for generating boilerplate, though some boilerplate is still needed to wrap those calls and expose them in a public API. It also allows passing file descriptors, which are useful for…
…shared memory, specifically a memfd that is passed to the other process and mmap-ed on both sides, which is used for input and video.
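As a rough sketch of the memfd idea, the snippet below creates an anonymous memory-backed file descriptor and maps it twice. To keep it runnable in one file, both mappings live in the same process; in the real setup the descriptor would be sent to the other process and mapped there. The temp-file fallback and all names are assumptions made for the example.

```python
import mmap
import os
import tempfile

SIZE = 4096


def create_shared_fd(size):
    """Return an fd backed by anonymous memory (memfd on Linux), `size` bytes long."""
    if hasattr(os, "memfd_create"):
        fd = os.memfd_create("shared-block")
    else:
        # Assumption: where memfd is unavailable, an unnamed temp file stands in.
        tmp = tempfile.TemporaryFile()
        fd = os.dup(tmp.fileno())
        tmp.close()
    os.ftruncate(fd, size)
    return fd


fd = create_shared_fd(SIZE)
writer = mmap.mmap(fd, SIZE)   # stands in for the runner process mapping
reader = mmap.mmap(fd, SIZE)   # stands in for the UI process mapping

writer[0:5] = b"frame"
received = bytes(reader[0:5])  # visible through the second mapping without a copy call
print(received)

reader.close()
writer.close()
os.close(fd)
```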
This required some changes in how input works: libretro input is polling-based and retro-gtk follows that. However, in every single RetroController implementation retro_controller_poll() is a no-op and instead the controller gets the input state on its own via signals, stores it and returns it on demand. With that in mind, poll() was removed and instead controller implementations can now notify about their state changes. When that happens, their complete state is serialized and written into a shared memory block. Since the complete state for one controller, including mouse, pointer, lightgun, gamepad and keyboard state, is under 1 KB, it’s fast enough to just write the whole thing and not only the differing parts. Then, when the core polls input state, a copy of the contents of shared memory is made and any queries return values from that copy. This means the input is still fast and has no noticeable latency increase over running in the same process, and at the same time we follow the spec more closely, because input is now actually polling-based on the runner process side, and asking for input state before polling it does actually return the previous state.
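The write-everything, snapshot-on-poll scheme might look something like this in miniature. The state layout (16 buttons plus a pointer position), the class names and the bytearray standing in for the mmap-ed block are all invented for the example; retro-gtk’s real layout is different and larger.

```python
import struct

STATE_FORMAT = "<16Bhh"                     # hypothetical: 16 buttons, pointer x, pointer y
STATE_SIZE = struct.calcsize(STATE_FORMAT)


class SharedInputBlock:
    """UI side: writes the complete controller state on every change."""

    def __init__(self):
        self.memory = bytearray(STATE_SIZE)  # stand-in for the shared memory block

    def write_state(self, buttons, x, y):
        self.memory[:] = struct.pack(STATE_FORMAT, *buttons, x, y)


class RunnerInput:
    """Runner side: answers queries from the snapshot taken at the last poll."""

    def __init__(self, block):
        self._block = block
        self._snapshot = bytes(STATE_SIZE)

    def poll(self):
        self._snapshot = bytes(self._block.memory)   # copy once per poll

    def button_pressed(self, index):
        return bool(struct.unpack(STATE_FORMAT, self._snapshot)[index])


block = SharedInputBlock()
runner = RunnerInput(block)

buttons = [0] * 16
buttons[3] = 1
block.write_state(buttons, 0, 0)

before_poll = runner.button_pressed(3)   # still the previous state
runner.poll()
after_poll = runner.button_pressed(3)    # now reflects the new state
print(before_poll, after_poll)
```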
Additionally, the call to set controller rumble state previously returned a boolean value to indicate whether it was successful or not. That was previously exposed in RetroController as is, but calling it synchronously just to get that value isn’t really an option. So instead there’s now a separate retro_controller_get_supports_rumble() call and retro_controller_set_rumble_state() returns nothing, allowing it to be called via D-Bus without degrading performance.
Video is passed similarly to input: the shared memory contains a framebuffer that the runner process writes to and the UI process reads from. Unfortunately, it’s not very efficient, because libretro gives us a preallocated framebuffer, resulting in a copy on the runner process side. It’s even worse on the UI process side, as there’s a copy (which is unnecessary, but I haven’t gotten around to removing it yet) plus uploading the texture to the GPU to render it. And while libretro provides a callback to get an address of our framebuffer, retro-gtk doesn’t implement it yet and almost no core uses it. Thankfully, the resolution is usually pretty low (160×144, 240×160, 320×240, or sometimes 640×480), so this pipeline works, and it’s even noticeably faster than it was in a single process for me.
The problematic part is that actually telling the UI process to redraw is currently done via a message. While it works, it’s extra latency that could be avoided. But there’s still time to change it.
OpenGL core support
One feature that we’ve wanted for a long time, but could never implement is supporting libretro cores that use hardware rendering. It should be simple, right? After all, we already use OpenGL in the widget that draws the game.
In the most basic form, the core sends us specs of a context and a framebuffer (API, version, framebuffer parameters such as whether it has depth/stencil buffer). We get a context from somewhere as asked and provide the framebuffer as needed. Additionally we’re expected to provide pointers to the GL functions by name.
The only quirk is that most cores want a compatibility profile context. And I couldn’t achieve it no matter what I did. Either it corrupted GTK state or eglCreateContext() created a core profile context even though I asked for compatibility profile.
The subprocess comes to the rescue! With that I can easily have whatever context the core wants on the runner process side and it just works.
With OpenGL cores working, we can run games for a lot of platforms we don’t currently support, such as Nintendo 64 and Sega Dreamcast. While I can’t say yet which platforms will be supported in 3.38, I’m pretty sure Nintendo 64 will make it. Less sure about Dreamcast, because while Flycast core runs pretty well, it has some quirks. For example, if a state was saved without a controller plugged in, it won’t ever see the controller after restoring that state again, and for some reason Sonic Adventure defaulted to Japanese language. More importantly, Dreamcast game detection in Games isn’t very good right now and needs improvements before it’s useful.
Additionally, for Nintendo 64 I implemented a simple controller expansion switcher, because with Nintendo 64 you have to choose between reliably saving games and having rumble.
At least we can automatically disable rumble for controllers that don’t support it, such as the keyboard assigned for player 2 in the screenshot. The UI needs more work too: for example, I’m not particularly happy about the Player 1/Player 2 labels, but I needed something for testing.
So, is it done?
Not yet. While OpenGL support works pretty well, there are other things to do, such as Vulkan support, as both of these cores can use it. That’s going to be interesting, because I have absolutely no experience working with Vulkan, unlike with OpenGL. Looking forward to learning it, and looking forward to an awesome 3.38 release, even though we still haven’t released 3.36 yet.