My name is Matt Campbell, and I’m delighted to announce that I’m joining the GNOME accessibility team to develop a new accessibility architecture. After providing some brief background information on myself, I’ll describe what’s wrong with the current Linux desktop accessibility architecture, including a design flaw that has plagued assistive technology developers and users on multiple platforms, including GNOME, for decades. Then I’ll describe how two of the three current browser engines have solved this problem in their internal accessibility implementations, and discuss my proposal to extend this solution to a next-generation accessibility architecture for GNOME and other free desktops.
Introducing myself
While I’m new to the GNOME development community, I’m no stranger to accessibility. I’m visually impaired myself, and I’ve been working on accessibility in one form or another for more than 20 years. Among other things:
- I contributed to the community of blind Linux users from 1999 through 2001. I modified the ZipSlack mini-distro to include the Speakup console screen reader, developed the trplayer command-line front-end for RealPlayer, and helped several new users get started.
- From 2003 to 2004, I developed a talking browser based on the Mozilla Gecko engine; it ran on both Windows and Linux.
- Starting in 2004, I developed a Windows screen reader, called System Access, for Serotek (which has since been acquired by my current company, Pneuma Solutions).
- From mid-2017 to late 2020, I worked on the Windows accessibility team at Microsoft, where I contributed to the Narrator screen reader and the UI Automation API. (Rest assured, the non-compete clause in my employment agreement with Microsoft expired long ago.)
- For the past two years, I have also been the lead developer of AccessKit, a cross-platform accessibility abstraction for GUI toolkits. My upcoming work on the GNOME accessibility architecture will build on the work I’ve been doing on AccessKit.
The problems we need to solve
The free desktop ecosystem has changed dramatically since the original GNOME accessibility team, led by Sun Microsystems, designed the original Assistive Technology Service Provider Interface (AT-SPI) in the early 2000s. Back then, per-application security sandboxing was, at best, a research project. X11 was the only free windowing system in widespread use, and it was taken for granted that each application would both know and control the position of each of its windows in global screen coordinates. Obviously, with the rise of Flatpak and Wayland, all of these things have changed, and the GNOME accessibility stack must adapt.
But in my opinion, AT-SPI also has the same fatal flaw as most other accessibility APIs, going back to the 1990s. The first programmatic accessibility API, Microsoft Active Accessibility (MSAA), was introduced in 1997. Sun implemented the Java Access Bridge (JAB) for Windows not long after. What MSAA, the JAB, and AT-SPI all have in common is that their performance is severely limited by the latency of multiple inter-process communication (IPC) round trips. The more recent UI Automation API (introduced in 2005) mitigated this problem somewhat with a caching system, as did AT-SPI, but that has never been a complete solution, especially when assistive technologies need to traverse text documents. For decades, those of us who have developed Windows screen readers have been so determined to work around this IPC bottleneck that we’ve relied, to varying degrees, on the ability to inject some of our code into application processes, so we can more efficiently fetch the information we need and do our own IPC. Needless to say, this approach has grave drawbacks for security and robustness, and it’s not an option on any platform other than Windows. We need a better solution.
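To make this cost concrete, here is a toy model of a pull-based traversal, written in Rust. The types are purely illustrative, not real AT-SPI bindings, and the per-round-trip latency in the comments is a rough assumption; the point is simply that the number of round trips scales with the number of nodes times the number of properties.

```rust
// Toy model of a pull-based client. In real AT-SPI, each property read
// and each child fetch below would be a separate D-Bus round trip.
// These types are illustrative, not actual AT-SPI bindings.

struct Node {
    role: &'static str,
    name: String,
    children: Vec<Node>,
}

/// Walks the tree the way a pull-based client must, tallying one
/// simulated round trip per remote call.
fn walk(node: &Node, round_trips: &mut u64) {
    let _role = node.role; // round trip 1: fetch the role
    let _name = &node.name; // round trip 2: fetch the name
    let _count = node.children.len(); // round trip 3: fetch the child count
    *round_trips += 3;
    for child in &node.children {
        *round_trips += 1; // one more round trip to fetch each child
        walk(child, round_trips);
    }
}

fn main() {
    // A toy document: one root node with 10,000 paragraphs.
    let doc = Node {
        role: "document",
        name: "report.odt".into(),
        children: (0..10_000)
            .map(|i| Node {
                role: "paragraph",
                name: format!("Paragraph {i}"),
                children: vec![],
            })
            .collect(),
    };
    let mut round_trips = 0;
    walk(&doc, &mut round_trips);
    // Roughly 4 round trips per node; if we assume ~0.25 ms per D-Bus
    // round trip, this trivial walk already costs around 10 seconds.
    println!("simulated round trips: {round_trips}");
}
```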
The solution: Push-based accessibility
We can find such a solution in the internal accessibility architecture of some modern browsers, particularly Chromium and (more recently) Firefox. As you may know, these browsers have a multi-process architecture, where a sandboxed process, known as the content process or renderer process, renders web content and executes JavaScript. These sandboxed processes have all of the information needed to produce an accessibility tree, but for various reasons, it’s still optimal or even necessary to implement the platform accessibility APIs in the main, unsandboxed browser process. So, to prevent the multi-process architecture from further degrading browser performance for assistive technology users, these browsers internally implement what I’ll call a push architecture. There’s still IPC happening internally, but the renderer process initially pushes a complete serialized snapshot of an accessibility tree, followed by incremental tree updates. The platform accessibility API implementations in the main browser process can then respond immediately to queries using a local copy of the accessibility tree. This is in contrast with the pull-based platform accessibility APIs, where assistive technologies or other clients pull information about one node at a time, sometimes incurring IPC round trips for one property at a time. In terms of latency, the push-based approach is far more efficient. You can learn more in Jamie Teh’s blog post about Firefox’s Cache the World project.
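To illustrate the shape of such a protocol, here is a minimal sketch in Rust. It is loosely inspired by the model AccessKit uses, where one message type carries both the initial snapshot and later deltas, but every type and field below is hypothetical; this is not AccessKit’s actual API, nor the internal wire format of Chromium or Firefox.

```rust
// Minimal sketch of a push-based protocol: the same message type
// carries the initial snapshot and all later incremental updates.
// All types here are hypothetical.

use std::collections::HashMap;

type NodeId = u64;

struct Node {
    role: String,
    name: Option<String>,
    children: Vec<NodeId>,
}

/// What the application serializes and pushes over IPC. Nodes not
/// mentioned in `nodes` are unchanged, so steady-state updates stay
/// small even when the tree is large.
struct TreeUpdate {
    nodes: Vec<(NodeId, Node)>,
    root: NodeId,
    focus: NodeId,
}

/// The consumer's local copy of the tree, kept current by applying
/// each update as it arrives. Queries never leave the process.
struct TreeCache {
    nodes: HashMap<NodeId, Node>,
    root: NodeId,
    focus: NodeId,
}

impl TreeCache {
    fn apply(&mut self, update: TreeUpdate) {
        for (id, node) in update.nodes {
            self.nodes.insert(id, node);
        }
        self.root = update.root;
        self.focus = update.focus;
    }
}

fn main() {
    let mut cache = TreeCache { nodes: HashMap::new(), root: 0, focus: 0 };

    // Initial snapshot: a window containing one button.
    cache.apply(TreeUpdate {
        nodes: vec![
            (1, Node { role: "window".into(), name: Some("Demo".into()), children: vec![2] }),
            (2, Node { role: "button".into(), name: Some("OK".into()), children: vec![] }),
        ],
        root: 1,
        focus: 2,
    });

    // Incremental update: only the button's label changed.
    cache.apply(TreeUpdate {
        nodes: vec![(2, Node { role: "button".into(), name: Some("Save".into()), children: vec![] })],
        root: 1,
        focus: 2,
    });

    // The consumer answers queries instantly from its local copy.
    let focused = &cache.nodes[&cache.focus];
    println!("focused: {} {:?}", focused.role, focused.name);
}
```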
Ever since I learned about Chromium’s internal accessibility architecture more than a decade ago, I have believed that assistive technologies would be more robust and responsive if the push-based approach were applied across the platform accessibility stack, all the way from the application to the assistive technology. If you’re a screen reader user, you have likely noticed that when an application becomes unresponsive, you can’t find out anything about what is currently in the application window, while, in a modern composited windowing system, a sighted user can still see the last rendered frame. A push-based approach would ensure that the latest snapshot of the accessibility tree is likewise always available, in its entirety, to be queried by the assistive technology in any way the AT developer wants. And because an AT would have access to a local copy of the accessibility tree, it could quickly perform complex traversals of that tree without having to go back and forth with the application to gather information. This is especially useful when implementing advanced commands for navigating web pages and other complex documents.
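As a concrete, hypothetical example of such a traversal, here is a “jump to next heading” command implemented entirely against the AT’s local copy of the tree. No IPC happens inside the loop, so it stays fast even if the application has hung; real ATs would track document order more carefully than this sketch does.

```rust
// Hypothetical "next heading" command, the kind of navigation feature
// screen readers offer on web pages. It runs entirely against the
// AT's local copy of the tree: no IPC inside the traversal.

struct Node {
    id: u64,
    role: &'static str,
    children: Vec<Node>,
}

/// Depth-first search for the first heading after the current
/// position. (Using id order as document order is a simplification;
/// the point is that the whole search is a local operation.)
fn next_heading(node: &Node, after_id: u64) -> Option<&Node> {
    if node.role == "heading" && node.id > after_id {
        return Some(node);
    }
    node.children.iter().find_map(|c| next_heading(c, after_id))
}

fn main() {
    let page = Node {
        id: 1,
        role: "document",
        children: vec![
            Node { id: 2, role: "heading", children: vec![] },
            Node { id: 3, role: "paragraph", children: vec![] },
            Node { id: 4, role: "heading", children: vec![] },
        ],
    };
    // The user presses the "next heading" key while on node 2.
    let target = next_heading(&page, 2).expect("no further headings");
    println!("moved to node {}", target.id); // moved to node 4
}
```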
What’s not changing
Before I go further, I want to reassure readers that many of the fundamentals of accessibility are not changing with this new architecture. An accessible UI is still defined by a tree of nodes, each of which has a role, a bounding rectangle, and other properties. Many of these properties, such as name, value, and the various state flags, will be familiar to anyone who has already worked with AT-SPI, the GTK 4 accessibility API, or the legacy ATK. It’s true that we’ll have to add several new properties, especially for text nodes. However, I believe that most of the work that application and toolkit developers have already done to implement accessibility will still be applicable.
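Purely for illustration, a node in the new model might look something like the sketch below. The actual schema is still to be designed, so every name and field here is a placeholder; the point is how closely it mirrors concepts that ATK and AT-SPI developers already know.

```rust
// Illustrative sketch of a node's properties; every name here is a
// placeholder, not a final schema.

struct Rect {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

enum Role {
    Window,
    Button,
    CheckBox,
    TextInput,
    Document,
    // ... a full list would track the roles AT-SPI already defines.
}

struct Node {
    role: Role,
    bounds: Option<Rect>, // bounding rectangle
    name: Option<String>, // the accessible name, as in AT-SPI
    value: Option<String>,
    // State flags familiar from ATK/AT-SPI:
    focused: bool,
    disabled: bool,
    // Text nodes would carry several new properties (text runs,
    // character positions, and so on) not shown here.
}

fn main() {
    let ok_button = Node {
        role: Role::Button,
        bounds: Some(Rect { x: 10.0, y: 10.0, width: 80.0, height: 24.0 }),
        name: Some("OK".into()),
        value: None,
        focused: true,
        disabled: false,
    };
    let role_label = match ok_button.role {
        Role::Button => "button",
        _ => "other",
    };
    let b = ok_button.bounds.as_ref().unwrap();
    println!(
        "{role_label} {:?} at ({}, {}), size {}x{}, focused: {}",
        ok_button.name, b.x, b.y, b.width, b.height, ok_button.focused
    );
}
```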
Risks and benefits
I’ve written a more detailed proposal for this new architecture, including a discussion of various risks. The risk that concerns me most at this point is that I’m not yet entirely sure how we’ll handle large, complex documents in non-web applications such as LibreOffice. I specifically mention non-web applications here because web applications, such as the various online office suites, are already limited in this respect by the performance of the browser itself, including the internal push-based accessibility implementations of Chromium and Firefox. My guess is that, with this new architecture, applications such as LibreOffice will need to present a virtualized accessible view of the document, similar to what web applications are already doing. We’ll need to make sure that we implement this without giving up features that assistive technology users have come to expect, particularly when it comes to efficiently navigating large documents. But I believe this is feasible.
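To make the virtualization idea concrete, here is one possible shape for it, with all names hypothetical: the application keeps its full document model but only materializes accessibility nodes for a window around the viewport, pushing a small update as the user scrolls.

```rust
// Hypothetical sketch of a virtualized accessible view: only the
// paragraphs near the viewport become accessibility nodes, so opening
// a huge document doesn't mean pushing a huge tree, and scrolling
// produces a small incremental update.

struct Document {
    paragraphs: Vec<String>, // the app's full model; may be huge
}

struct VirtualNode {
    id: u64,
    text: String,
}

/// Materializes nodes for the visible range plus some overscan, so an
/// AT can read a little ahead without waiting for the next update.
fn visible_nodes(doc: &Document, first_visible: usize, visible_count: usize) -> Vec<VirtualNode> {
    let overscan = 20;
    let start = first_visible.saturating_sub(overscan);
    let end = (first_visible + visible_count + overscan).min(doc.paragraphs.len());
    (start..end)
        .map(|i| VirtualNode {
            id: i as u64,
            text: doc.paragraphs[i].clone(),
        })
        .collect()
}

fn main() {
    let doc = Document {
        paragraphs: (0..100_000).map(|i| format!("Paragraph {i}")).collect(),
    };
    // Instead of pushing 100,000 nodes, push only the ones near the
    // viewport; on scroll, push a similarly small delta.
    let nodes = visible_nodes(&doc, 5_000, 25);
    println!("pushed {} of {} paragraphs", nodes.len(), doc.paragraphs.len());
    println!("first pushed node: #{} {:?}", nodes[0].id, nodes[0].text);
}
```

The open question is how to support document-wide operations, such as navigating by heading or searching, when only a slice of the tree exists at any moment; that’s the part we’ll need to design carefully.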
I’d like to close with a couple of exciting possibilities that my proposal would enable, which I believe make it worth the risks. Let’s start with accessible screenshots. Anyone who uses a screen reader knows how common, and how frustrating, it is to come across screenshots with no useful alternate text (alt text). Even when an author has made an effort to provide useful alt text, it’s still just a flat string. Imagine how much more useful it would be to have access to the full content and structure of the information in the screenshot, as if you were accessing the application that the screenshot came from (assuming the app itself was accessible). With a push-based accessibility architecture, a screenshot tool can quickly grab a full snapshot of the accessibility tree from the window being captured. From there, I don’t think it would be too difficult to propose a way of including a serialized accessibility tree in the screenshot image file. Getting such a thing standardized might be more difficult, but the push architecture would at least eliminate a major technical barrier to accessible screenshots.
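As a sketch of how the image-file part might work, a screenshot tool could serialize the captured tree to JSON and attach it to the image’s metadata, for example in a PNG tEXt chunk. The serialization format and the metadata keyword below are made up for illustration; as noted above, no such standard exists yet.

```rust
// Hypothetical sketch: serialize a captured accessibility tree to
// JSON, suitable for embedding in an image file's metadata (e.g. a
// PNG tEXt chunk). The format and the "accessibility-tree" keyword
// are invented for this example.

struct Node {
    role: &'static str,
    name: Option<&'static str>,
    children: Vec<Node>,
}

/// Hand-rolled JSON serialization, to keep the sketch dependency-free.
fn to_json(node: &Node) -> String {
    let name = match node.name {
        Some(n) => format!("\"{}\"", n.replace('"', "\\\"")),
        None => "null".to_string(),
    };
    let children: Vec<String> = node.children.iter().map(to_json).collect();
    format!(
        "{{\"role\":\"{}\",\"name\":{},\"children\":[{}]}}",
        node.role,
        name,
        children.join(",")
    )
}

fn main() {
    // The snapshot a screenshot tool might grab from the captured window.
    let tree = Node {
        role: "window",
        name: Some("Settings"),
        children: vec![Node {
            role: "check_box",
            name: Some("Enable animations"),
            children: vec![],
        }],
    };
    // In a real tool, this string would be written into the image
    // metadata, e.g. a PNG tEXt chunk keyed "accessibility-tree".
    println!("{}", to_json(&tree));
}
```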
Taking this idea a step further, because the proposed push architecture includes incremental tree updates as well as full snapshots, it would also become feasible to implement accessibility in streaming screen-sharing applications. This obviously includes one-on-one remote desktop use cases; imagine extending VNC or RDP with the ability to push accessibility tree updates. But what’s more exciting to me is the potential to add accessibility in one-to-many use cases, such as the screen-sharing features found in online meeting spaces. And while this too might be difficult to standardize, one could even imagine including accessibility information in standard video container formats such as MP4 or WebM, making visual information accessible in everything from conference talks to product demo videos and online courses.
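Purely as a thought experiment, the wire protocol for such a stream might interleave tree messages with video frames, along the lines of the illustrative enum below; none of these message types exist in any real protocol today.

```rust
// Illustrative only: what a screen-sharing stream might carry if it
// interleaved accessibility data with video. No real protocol defines
// these message types today.

enum StreamMessage {
    /// A compressed video frame, as in existing protocols.
    VideoFrame(Vec<u8>),
    /// A full accessibility tree, sent when a viewer joins (and
    /// periodically, as a kind of keyframe).
    TreeSnapshot(Vec<u8>),
    /// An incremental tree update, sent whenever the UI changes;
    /// typically tiny compared to a video frame.
    TreeDelta(Vec<u8>),
}

fn main() {
    let stream = [
        StreamMessage::TreeSnapshot(vec![0; 48_000]), // once per viewer
        StreamMessage::VideoFrame(vec![0; 250_000]),
        StreamMessage::TreeDelta(vec![0; 300]), // steady-state: very small
        StreamMessage::VideoFrame(vec![0; 250_000]),
    ];
    for msg in &stream {
        let (kind, bytes) = match msg {
            StreamMessage::VideoFrame(b) => ("video frame", b.len()),
            StreamMessage::TreeSnapshot(b) => ("tree snapshot", b.len()),
            StreamMessage::TreeDelta(b) => ("tree delta", b.len()),
        };
        println!("{kind}: {bytes} bytes");
    }
}
```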
Conclusion
Too often, free desktop platforms struggle to keep up with their proprietary counterparts. That’s not a criticism; it’s just a fact that free desktop projects don’t have the resources of the proprietary giants. But here, we have an opportunity to leap ahead, by betting on a different accessibility architecture. The push approach has already been proven by two of the three major browser engines, and while there are risks in extending this approach outside of the browser context, I strongly believe that the potential benefits are too big to ignore.
Over the next year, we will experiment with this new approach in a prototype, collect feedback from various stakeholders across the ecosystem, and hopefully build a new, stronger foundation for state-of-the-art accessibility in the free desktop world. If you have questions or ideas, don’t hesitate to reach out or leave a comment!
Thanks for stepping up to lead this effort. Looking forward to trying the prototypes/proof of concept code.
This interests me greatly, as I have been looking into making Reaper for Linux accessible. Reaper uses a cross-platform library that is thankfully open source. This library is responsible for handling UI, among other things, on all supported Reaper platforms. On Linux, OpenGL is used to draw UI elements. I was going to start looking at GTK 4’s AT-SPI implementation as a starting point; however, the potential impact on Reaper’s performance when AT-SPI D-Bus IPC is involved has been a concern. A push-based architecture will hopefully be great for accessibility in performance- and latency-critical applications.
That’s great news, thank you!
Wow. That is some serious work. I did not even know that screen readers would inject code while scraping applications for text content.
Images in LibreOffice documents can have a description. In offline content that always loads quickly, there is no use for it, except for blind readers. Good point; I should pay more attention to that.
Glad to hear more people are working on stuff like this!
A push architecture may make some things better, but it may also make other things worse – for example, I contribute to the LibreOffice codebase, and one of the problems is that certain documents (e.g. containing vector images) can generate immense trees of accessibility objects. Transferring all of that data in one shot would generate large latency at document open time.
Also, incremental updates are really hard – most of the time, programs are likely going to just push most of the existing tree, which is also going to lead to pathological cases.
I’m not sure what the “right” answer is, or even if there is one. But I’m sure you will contribute something useful.
Yeah, you’ve identified the major downside of this approach. I don’t yet know what the best answer is, but I wanted to make it clear that I’m not ignoring this concern.
To clarify, are you saying that the vector images themselves contribute significantly to the size of the accessibility tree? If so, I’d like to see an example of this.
Hey Matt, I’m a (primarily web) accessibility specialist who is familiar with ATs such as JAWS, NVDA, Dragon, and ZoomText, to name a few, and I’m dabbling with coding. First of all, I’d like to know how I can help and contribute, even if it’s just as an extra ear or by looking at documentation. I’ve been looking into how to get started with accessibility on Linux for the longest time, but it’s poorly documented, and it’s hard to find where to start.
Secondly, in answer to your question above, it’s complicated, though it depends on the application. For example, let’s start with an SVG in a web browser. This may be identified in the accessibility tree as an image, and it may have its path nodes as children. However, for web browsing, these nodes are often not exposed, as they carry no additional information beyond the image (SVG) as a whole, which may have an accessible name covering the whole thing, such as “print” for an SVG of a printer. (This is easily replicable in Google Chrome by opening the accessibility panel in the Chrome dev tools and highlighting an SVG in the DOM.)
However, in a spreadsheet or charting application, a subset of the vector nodes may make up part of a bar graph. Or it may be a vector drawing application such as LibreOffice Draw, so the user may need those path nodes exposed in order to interact with them or to understand the finer context of the image or drawing. In the case of a bar chart, I may need “car”, “lorry”, “bus”, etc. exposed to me as distinct objects rather than just “bar chart”; in the case of a vector drawing, I may want to move 7 of the 50 nodes left a little to make a larger nose on the full vector of a “person”. All in all, the answer is: it’s complicated.
Short answer: if I refer to a family as “The Smiths”, sometimes that’s all I need, but sometimes I need to refer to “Joe Smith” or “Barbara Smith”. It depends on the structure of the information and how each application is built.
Hey there!
Is there any way people can contribute to this? I’m in a bit of a predicament actually.
So, I recently began my startup journey to disrupt the human-computer interface in a way that’s more accessible and intuitive (I can go into more detail via email). To do this, I’m building my own little architecture to handle some pretty nifty stuff. I might be able to help solve some of your more technical pain points here, or help with development in whatever way possible.
The more I dig the more I’m realizing I’m not alone when it comes to the problems of dealing with IPC and code injection as a workaround (and how nice it would be to have push-based architecture on an OS level). We appear to be trying to solve the same problem on a technical level.
Is there any way I could get in contact with you or somebody in the GNOME accessibility team? Like I said, this new architecture has pretty big implications for what I’m trying to make, and I’d really like to understand a little bit more about what’s going on, where things are at right now, and how I could help so I can integrate this.
Thank you so much!
I think it’s good news, but what about braille keyboard input on braille displays? A virtual braille keyboard would also be welcome, because brltty’s xbrlapi doesn’t work on Wayland.