XML Translations with ITS

When I first started doing GNOME documentation, our documents were translated manually, as a whole, and with no change-tracking. That is to say, they were never translated. Then Danilo came along and wrote xml2po, and I integrated it into the build utilities in gnome-doc-utils. Suddenly, all our XML documents could be translated with standard PO files. And we started seeing actual documentation translation.

Fast forward six years. We’re still using xml2po, and it’s serving us well. Various web-based translation editors and trackers have appeared that can handle XML documents. Some of them use xml2po underneath; some use the same concepts in their own implementation. Meanwhile, the W3C created ITS, a common vocabulary for specifying things that translators and translation tools might want to know about an XML format. But we’re not using it, and I don’t know anybody in the greater GNOME ecosystem who is.

Presenting itstool: an ITS-based tool for converting XML files to PO files and back again.

This is an experiment I started over the weekend. The idea is to have a general xml2po-like tool that contains no vocabulary-specific logic. All it knows is XML and ITS, and everything you need to know about a format is specified with an ITS rules files. Look at the Mallard ITS file as an example. The basics of how to handle Mallard are in 13 lines.

This has a lot of potential. For starters, you don’t need to patch a program or write a plug-in to support a new XML vocabulary. All you need to do is provide an ITS file. There’s a chance that the people who developed the vocabulary will already have ITS definitions in place.

What’s more, you can embed ITS attributes and rules directly into your XML document, extending or overriding global rules. This is huge. This is the “mark something as untranslatable” feature that translators want, provided by a W3C recommendation that other tools might actually understand as well.

For an example, take a look at the DocBook Element Reference for Mallard. If you convert this to a PO file using itstool, you’ll see a bunch of messages that look like this:

msgid "<code href=\"http://www.docbook.org/tdg/en/html/abbrev.html\">abbrev</code>"

Yawn. It’s some markup, and a URL, and the name of an XML element. Nobody needs to translate that, but there’s no way a general tool can know that. But the author knows, so the author can use the its:translate attribute to save the translators some work:

<td its:translate="no"><p>
<code href="http://www.docbook.org/tdg/en/html/abbrev.html">abbrev</code>

Problem solved. This one pesky message will no longer appear in the PO file. That’s a big step forward for us. Unfortunately, there are 417 of those on that page. That’s a lot of typing. Fortunately, ITS also provides a standard way to specify rules, and you can put those rules directly inside your document. For this page, I happen to know that the first column of every row is untranslatable. And I know there’s no row spanning that would cause the first td to be in anything other than the first column. So rather than type its:translate="no" 417 times, I can put what I know right in the XML, where itstool can find it.

<its:rules xmlns:its="http://www.w3.org/2005/11/its">
<its:translateRule translate="no" selector="//mal:tr/mal:td[1]"/>

We can just put this inside a Mallard info element. This is valid because Mallard allows any element from an external namespace inside the info element. And now itstool drops all 417 of those pesky messages from the PO file.

There are some bits that ITS doesn’t provide, and for those, we’ll need extension rules. Annoyingly, ITS doesn’t have a rule to specify space-preserving elements. I think the idea is that your DTD or XSD can specify this using xml:space. But I come from the RELAX NG school of thought, where validation is validation, and processing logic is something else. The two formats I care about most, DocBook and Mallard, are both RNG-based, so I had to create an extension rule for that. That’s done.

Then we have xml2po’s awesome ability to create messages for external files like images. We can’t translate the images using xml2po, of course. But xml2po can let translators know when images have changed, or when new images were added. That’s another extension rule. It’s not in yet, but I have a syntax for it, and I think it will be easy.

Finally, there’s translator credits. This one is harder, but I think it can still be specified in XML as an extension. I have a prototype of how the XML might look in the ITS files for DocBook and Mallard, but no working code yet.

Some things were easier than with xml2po’s approach, and some things were harder. On the whole, I think the ITS-based approach can take us further, and lines up more with standards that exist outside GNOME. I’m going to keep experimenting with this, and perhaps try to get in touch with other ITS folks. We’ll see where it goes.

Accessibility Hackfest

Earlier this month, I attended the GNOME accessibility hackfest at the AEGIS conference. My goal was to start planning the accessibility user help for GNOME 3.0. The documentation team often doesn’t understand the needs of users of accessible technologies, and understanding user needs is the first step to writing good help.

I got Joanie Diggs started working on Orca help, and Brian Nitz on Accerciser help. I led them through the basics of planning topic-oriented help and helped them revise their plans. It’s always interesting to teach people topic-oriented help with Mallard, and to see what they grasp quickly and what needs more explanation. It helps me develop my ability to teach people to write.

I also got a much clearer sense of how we should work accessibility information into our desktop and application help. The entire team helped me understand what types of information users might be looking for. For example, Joanie mentioned keyboard shortcut reference pages as something that will often be useful to users of accessible technologies.

This led us to a feature idea for Mallard and Yelp. Joanie wondered if it would be possible to aggregate the keyboard shortcut references from across the desktop. This is basically remixing Mallard pages into a new document, which is entirely doable. But in this case, we want to do it dynamically. And for that, we need a tagging system. I once blogged about tags for faceted navigation, and I think those would work for this kind of use as well. This probably not 3.0 material (unless Phil Bull asks for it, because Phil always gets the backburner features he asks for). But it’s a potentially powerful feature that’s made possible by Mallard’s design.

I also watched demos that gave me a better appreciation of the kinds of needs some people have. Tools like screen readers and magnifiers have high visibility, but there are so many other accessible technologies aimed at people with lots of different needs. All GNOME developers should have the chance to see this stuff first-hand.

On a personal note, I left my wallet in a taxi. Brian Nitz and Mario Sanchez Prada spent two hours with me at the hotel where I picked up the taxi, hoping that the same cabbie would come back there to wait for customers. He didn’t, but Mario heroically talked to every single cabbie that stopped there. End result: my wallet was recovered the next day, minus 30€. Thanks, Mario. You’re my hero.

Sponsored by the GNOME Foundation

Creative Commons Attribution 3.0 United States
This work by Shaun McCance is licensed under a Creative Commons Attribution 3.0 United States.