Mallard – The Gist

Sidebars in yelp-xsl 3.30

It’s now easier to add sidebars to HTML output in yelp-xsl 3.30. When I finally landed the HTML modernization changes in 3.28, I added the ability to have sidebars without completely mucking up the layout. There’s a main element that’s a horizontal flexbox, and you could implement html.sidebar.custom to put stuff in main. But you’d have to do sizing and styling yourself, and you’d have to implement all the stuff you want in there.

I wanted to make sidebars easy in Pintail, so I started implementing some stock sidebars there. Then I realized I could move most of that work into yelp-xsl.

With 3.30, you can now set what you want to see in the sidebars using the html.sidebar.left and html.sidebar.right parameters. These are both space-separated lists of words, where each word is a sidebar component. In yelp-xsl, we have two components out of the box: contents to give you a table of contents for the whole document, and sections to give you a list of sections on the current page. You can also use the special blank token to force a sidebar to appear without actually adding anything to it.

More importantly, you can add your own components. So you can still make fully custom sidebar content while letting yelp-xsl do the rest of the work. For example, let’s say you wanted a left sidebar with a table of contents followed by a Google ad. You could set html.sidebar.left to "contents googlead". Then add a template to your extension stylesheet like this:

<xsl:template
     mode="html.sidebar.mode"
     match="token[. = 'googlead']">
  <!-- Put Google's stuff here -->
</xsl:template>
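To make the token model concrete, here is a rough Python analogy of how a space-separated sidebar parameter maps onto components. It is not yelp-xsl's code, which does this in XSLT through html.sidebar.mode templates. The names build_sidebar, render_contents, and render_sections are hypothetical, and this sketch simply skips tokens it does not recognize.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical renderers standing in for the stock "contents" and "sections" components.
def render_contents() -> str:
    return "<nav>table of contents</nav>"

def render_sections() -> str:
    return "<nav>sections on this page</nav>"

STOCK: Dict[str, Callable[[], str]] = {
    "contents": render_contents,
    "sections": render_sections,
}

def build_sidebar(param: str,
                  custom: Dict[str, Callable[[], str]]) -> Optional[List[str]]:
    """Return the rendered pieces for one sidebar, or None for no sidebar."""
    tokens = param.split()  # the parameter is a space-separated list of words
    if not tokens:
        return None  # empty parameter: no sidebar at all
    pieces: List[str] = []
    for token in tokens:
        if token == "blank":
            continue  # forces the sidebar to exist without adding content
        # Custom components (like an extension stylesheet's templates) win over stock ones.
        handler = custom.get(token) or STOCK.get(token)
        if handler is not None:
            pieces.append(handler())
    return pieces
```

With the Google ad example, build_sidebar("contents googlead", {"googlead": render_ad}) would yield the table of contents followed by whatever the hypothetical render_ad returns.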

Pintail will add further sidebar components, such as one to switch between versions of a document, a language selector, and a search bar.

If you use yelp-build, you can pass the extension stylesheet with -x. The extension stylesheet is also where you’d set the parameters when using yelp-build.

With Pintail, you’ll be able to set the parameters in your pintail.cfg file.

[pintail]
sidebar_left = contents
sidebar_right = search languages

What’s more, you’ll be able to set these on a per-directory basis. So, for example, on nightly builds you could add a warning message. In your extension stylesheet:

<xsl:template
     mode="html.sidebar.mode"
     match="token[. = 'nightly']">
  <p style="color: red">NIGHTLY BUILD!</p>
</xsl:template>

Then your pintail.cfg would look like this:

[pintail]
sidebar_left = contents
sidebar_right = search languages

[/path/to/nightly/]
sidebar_left = contents nightly
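One way to think about the per-directory override is as a lookup where [pintail] supplies the defaults and the most specific matching directory section wins. The Python sketch below resolves the two sidebar parameters for a given site directory. The helper sidebar_settings is hypothetical, and the prefix-matching rule is an assumption, not documented Pintail behavior.

```python
import configparser

SIDEBAR_KEYS = ("sidebar_left", "sidebar_right")

def sidebar_settings(cfg_text: str, directory: str) -> dict:
    """Resolve the sidebar parameters for one site directory.

    Assumption: [pintail] supplies defaults, and every [/path/] section
    whose path is a prefix of `directory` overrides them, with longer
    (more specific) paths applied last so they win.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    result = {key: parser.get("pintail", key, fallback="") for key in SIDEBAR_KEYS}
    matching = sorted(
        (s for s in parser.sections() if s.startswith("/") and directory.startswith(s)),
        key=len,
    )
    for section in matching:
        for key in SIDEBAR_KEYS:
            if parser.has_option(section, key):
                result[key] = parser.get(section, key)
    return result
```

Pages under /path/to/nightly/ would get "contents nightly" on the left while still inheriting "search languages" on the right.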

Simple Pintail Queues with mkfifo

I build a number of sites with Pintail, including projectmallard.org and yelp.io. Some of these sites I build and upload manually. Others are hooked up to continuous deployment. I’ve been using python-github-webhooks on my server, which is a very simple tool to receive GitHub notifications and do stuff in response. What I was doing in response was building sites with Pintail.

The problem with this approach is that GitHub wants endpoints to respond within 30 seconds. And although building Mallard with Pintail is fast, there are things you don’t want to block on. In particular, you don’t want network operations to hold things up. At the very least, building requires updating one git repository, and possibly more. The seemingly simple yelp.io configuration pulls in two more git repositories. (Yes, it’s that easy.)

So I needed a job queue. I don’t want to just background the build tasks, because then I could end up starting a new build before a previous build finished, and down that path lies madness. I looked into using AMQP queues or using a full-blown CI tool like Buildbot. But I wanted something simple that didn’t involve a lot of new software on my servers. (Side note: I’m building a handful of relatively small sites with fairly low traffic. If you’re doing more, go use a tool like Buildbot and ignore the rest of this post.)

What I finally decided to do was to manage a simple build queue with mkfifo. I have a program that creates a FIFO and reads from it indefinitely, triggering builds when it receives data. Slightly stripped down version:

#!/bin/sh
wdir=/var/pintail

# Recreate the FIFO and make it writable by the webhook handlers.
rm -f "$wdir/queue"
mkfifo "$wdir/queue"
chmod a+w "$wdir/queue"

# The redirection reopens the FIFO for each read, blocking until a writer sends a line.
while read -r repo <"$wdir/queue"; do
    if [ "x$repo" = "xyelp.io" ]; then
        git='https://github.com/projectmallard/yelp.io.git'
    # Other sites get elif statements here.
    else
        # Ignore anything that isn't a known site name.
        continue
    fi
    if [ ! -d "$wdir/$repo" ]; then
        (cd "$wdir" && git clone "$git")
    else
        (cd "$wdir/$repo" && git pull -r)
    fi

    # Build into a fresh, uniquely named directory, then swap the symlink.
    outdir="$repo"-$(date +%Y-%m-%d)-$(uuidgen)
    mkdir -p "/var/www/$outdir"
    (cd "$wdir/$repo" &&
        LANG=en_US.utf-8 scl enable python33 -- pintail build -v -o "/var/www/$outdir" &&
        cd "/var/www" &&
        ln -sfn "$outdir" "$repo".new &&
        mv -T "$repo".new "$repo"
    ) >> "$wdir/$repo"-log 2>&1
done

Now the only thing my hook endpoints actually do is write a line to the FIFO. Importantly, the build process only looks for known strings in the FIFO, and ignores any other input. It doesn’t, for example, execute arbitrary commands placed in the FIFO. So the worst an attacker could do is trigger builds (potentially resulting in a DoS).
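A hook endpoint then only needs to validate a name and write one line. Here is a minimal Python sketch of that side, assuming a hypothetical allowlist KNOWN_SITES and helper enqueue(); the default FIFO path matches the script above. It opens the FIFO with O_NONBLOCK so the handler fails fast (returns False) instead of hanging when nothing is reading. With the loop above, that is the case while a build is running, so a real endpoint might retry or hand the write off to a background task.

```python
import errno
import os

# Hypothetical allowlist; keep it in sync with the names the build loop accepts.
KNOWN_SITES = {"yelp.io", "projectmallard.org"}

def enqueue(site: str, queue_path: str = "/var/pintail/queue") -> bool:
    """Write one known site name to the build FIFO.

    Returns False if the name is unknown or no process has the FIFO open
    for reading (open() fails with ENXIO under O_NONBLOCK).
    """
    if site not in KNOWN_SITES:
        return False  # never forward arbitrary input to the queue
    try:
        fd = os.open(queue_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return False
        raise
    try:
        os.write(fd, (site + "\n").encode("utf-8"))
    finally:
        os.close(fd)
    return True
```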

This script has one other trick: It uses symlinks to atomically update sites. Each build goes into a unique directory named with the site name, the date, and a UUID. The path my httpd config files point to is a symlink to one of those directories. Overwriting a symlink with mv -T is an atomic operation, because it comes down to a single rename() call, so your site is never half-updated or half-broken. This is a trick I learned at a previous employer, where it was very very important that our very very large documentation site was updated exactly as our release announcement went out.
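The same swap can be sketched in Python as an illustration (publish is a hypothetical helper, not part of Pintail). On POSIX, os.replace maps to rename(2), which replaces the live symlink itself rather than following it into the directory, much like mv -T.

```python
import os

def publish(build_dir: str, live_link: str) -> None:
    """Atomically point live_link at build_dir."""
    tmp_link = live_link + ".new"
    # Remove a stale temporary link left by an interrupted run, if any.
    try:
        os.unlink(tmp_link)
    except FileNotFoundError:
        pass
    os.symlink(build_dir, tmp_link)  # create the new link under a temporary name
    os.replace(tmp_link, live_link)  # atomic rename over the old link
```

Readers see either the old target or the new one, never a missing link, because the rename is the only step that touches live_link.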

Build documentation sites with Pintail

Lately I’ve been working on Pintail, a documentation site generator built on top of Mallard, Yelp, and the various other tools we’ve developed over the years. Pintail grew out of the tool that used to build projectmallard.org from Mallard sources. But it has grown a lot, and it can now handle general documentation sites. I want GNOME to be able to use Pintail for its documentation site. I want other projects to be able to use it too.

One of the more compelling features, and something many documentation site generators don’t handle, is that Pintail can pull in different git repositories for different directories. Small projects can get away with having all their docs in one repository. Large projects like GNOME can’t. Here’s a snippet of what the configuration for help.gnome.org might look like:

[/users/gnome-help/stable/]
git_repository = git://git.gnome.org/gnome-user-docs
git_branch = master
git_directory = gnome-help/C/
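To show how per-directory git settings like these could turn into checkouts, here is a rough Python sketch using configparser. It is not Pintail's implementation. The names git_sources and GitSource, and the checkout naming scheme, are hypothetical, and the sketch only builds git clone command lines rather than running them.

```python
import configparser
from typing import List, NamedTuple, Optional

class GitSource(NamedTuple):
    directory: str         # site directory, e.g. "/users/gnome-help/stable/"
    clone_cmd: List[str]   # git command that would fetch the sources
    subdir: str            # path inside the checkout holding the pages

def git_sources(cfg_text: str, checkout_root: str) -> List[GitSource]:
    """Collect per-directory git settings from a pintail.cfg-style file."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    sources: List[GitSource] = []
    for section in parser.sections():
        if not parser.has_option(section, "git_repository"):
            continue  # no separate repository for this section
        repo = parser.get(section, "git_repository")
        branch: Optional[str] = parser.get(section, "git_branch", fallback=None)
        # Hypothetical naming scheme: one checkout per directory section.
        dest = checkout_root + "/" + section.strip("/").replace("/", "_")
        cmd = ["git", "clone"]
        if branch:
            cmd += ["--branch", branch]
        cmd += [repo, dest]
        subdir = parser.get(section, "git_directory", fallback="")
        sources.append(GitSource(section, cmd, subdir))
    return sources
```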

Pintail’s native format is Mallard, but you can add in support for other formats pretty easily. There’s DocBook support, for example, and I’d like to add AsciiDoc support using asciidoctor-mallard.

There are two major features I hope to have ready soon. Both of them are available as Summer of Code projects. First, documentation sites obviously need search. But it’s not enough to just search the whole site. You want to be able to search within specific documents, or specific versions of documents. I’ve actually already got some indexing code in Pintail using Elasticsearch as a backend.

Second, Pintail needs to be able to handle localizations. I’ve put a lot of work into documentation internationalization over the years. It’s important, and everything I work on will continue to support it. I have some ideas on how this will work.

If you need to build a documentation site, give Pintail a try. I’m building a few sites with it already, but I’d love to get input from people with different needs.

Help me improve Yelp’s conditional processing

Yelp has runtime conditional processing for both Mallard and DocBook, so you can show different content to users in different environments. For example:

<p if:test="platform:gnome-classic">We only show this in classic mode.</p>
<p if:test="platform:unity">We only show this in Unity.</p>
<p if:test="platform:fedora">We only show this in Fedora.</p>
<p if:test="platform:fedora-22">We only show this in Fedora 22.</p>

Read more about Yelp’s runtime conditional info and Mallard’s conditional tokens. To my knowledge, no other help system does this kind of automatic runtime conditional processing. After some conversations with Endless folks at GUADEC, I realized we’re still missing some cases. I want to make this better.
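Conceptually, each if:test value is checked against the set of tokens that hold in the current environment. The minimal Python sketch below assumes the Mallard Conditionals rules that whitespace-separated tokens must all be true, comma-separated groups are alternatives, and a leading ! negates a token. The function eval_test is hypothetical, and the sketch is deliberately two-valued and ignores branching with if:choose.

```python
from typing import AbstractSet

def eval_test(test: str, true_tokens: AbstractSet[str]) -> bool:
    """Evaluate an if:test value against the set of tokens that are true."""
    for alternative in test.split(","):      # comma-separated: any may match
        tokens = alternative.split()          # whitespace-separated: all must match
        if not tokens:
            continue
        if all(
            (tok[1:] not in true_tokens) if tok.startswith("!") else (tok in true_tokens)
            for tok in tokens
        ):
            return True
    return False
```

For example, on a Fedora 22 machine running GNOME, the environment might hold platform:gnome, platform:fedora, and platform:fedora-22, so the Fedora paragraphs above would show and the Unity one would not.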

I’ve put together a short three-question survey. Please fill it out with information from each different kind of machine you have access to.

Ducktype: A Lightweight Syntax for Mallard

One of the projects I’ve been working on lately is Ducktype, a lightweight syntax for Mallard. Mallard has a lot of strengths. Its automatic linking mechanisms make content organization easier. Its focus on independent topics makes content re-use possible. Its revision information and other metadata allow you to do status tracking on large content pools. It has a well-defined extension mechanism that allows you to add new functionality and embed external vocabularies like TTML, SVG, and ITS.

XML is the backbone that makes all of this possible. But XML is also what slows adoption. There’s a growing trend towards using lightweight formats to make it easier to contribute. But while lightweight formats make easy things easy, they tend to fall over when dealing with the issues that XML-based vocabularies are designed to solve.

The idea for a lightweight syntax for Mallard has floated around for a couple of years. I even spent some time trying to repurpose an existing lightweight format like reStructuredText or AsciiDoc, but none of them are able to carry the level of semantic information that Mallard needs.

Before going into details, let’s look at a Mallard page written in Ducktype:

= My First Topic
@link[guide >index]
@desc A short description of this page
@revision[date=2014-11-13 status=draft]

This is the first paragraph.
The paragraph continues here, but ends with the blank line below.

[steps]
* This is a steps list, common in Mallard.
* Without the [steps] declaration above, we'd get a normal bullet list.
  Indentation is significant, so this is still in the second item.
* Indentation is so significant that you can actually nest block elements.

  So this is a new paragraph still inside the third item.

  [note]
  And this is a paragraph in a note in the third item.

* You can also nest list items, or literally anything else.

  * This is a basic bullet list.
  * It is in the fourth item of the steps list.

This paragraph is outside the steps list.

One of the most distinguishing features is that, like Python, indentation matters. Indentation is how you stay inside a block element. Ducktype also allows you to do everything you’d do inside a Mallard <info> element, which is crucial to pretty much all of the compelling features of Mallard.
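To give a feel for how little syntax the info lines involve, here is a simplified Python sketch that splits one @-line like those in the example above into an element name, key=value attributes, and trailing text. It is not the real Ducktype parser. The names parse_info_line and InfoLine are hypothetical, quoted values containing spaces are not handled, and bare words or shorthands such as >index are collected but deliberately left uninterpreted.

```python
import re
from typing import Dict, List, NamedTuple

class InfoLine(NamedTuple):
    name: str               # element name after '@'
    attrs: Dict[str, str]   # key=value pairs from the [...] list
    other: List[str]        # bare words and shorthands, left uninterpreted
    content: str            # inline text after the name and attributes

# '@' + name, an optional [attribute list], then the rest of the line.
INFO_RE = re.compile(r"^@([\w:.-]+)(?:\[([^\]]*)\])?\s*(.*)$")

def parse_info_line(line: str) -> InfoLine:
    m = INFO_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an info line: {line!r}")
    name, attr_text, content = m.group(1), m.group(2) or "", m.group(3)
    attrs: Dict[str, str] = {}
    other: List[str] = []
    for word in attr_text.split():
        key, sep, value = word.partition("=")
        if sep:
            attrs[key] = value
        else:
            other.append(word)
    return InfoLine(name, attrs, other, content)
```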

Ducktype is guided by a few design principles, just as Mallard was so many years ago:

It should be possible to do almost anything Mallard XML can do. You can arbitrarily nest block elements. You can have inline markup everywhere you need it, including in code blocks and in other inline markup. You can embed extensions and other vocabularies so that things like Mallard Conditionals and Mallard+TTML are possible. In fact, the only limitation I’ve yet encountered is that you can’t put attributes on page and section titles. This means that Ducktype is capable of serving as a non-XML syntax for virtually any XML vocabulary.
The most commonly used Mallard features should be easy to use. Mallard pages tend to be short with rich content and a fair amount of metadata. Steps lists are common. Semantic inline content is common. Linking mechanisms are, unsurprisingly, extremely common. Credits are common. Revision info is common. Licenses are nearly always done with XInclude.
There should be a minimal number of syntactical constructs. Most lightweight formats have shorthand, special-purpose syntax for everything. This makes it extremely difficult to support extension content without modifying the parser. And for any non-trivial content, it makes it difficult to remember which non-alphanumeric characters you have to escape when.
For extra special bonus points, it should be possible to extend the syntax for special purposes. Lightweight syntaxes are popular in code comments for API documentation, and in API documentation you want shorthand syntax for the things you reference most often. For an object-oriented language, that’s classes and methods. For XSLT, it’s templates and parameters. By not gobbling up all the special characters in the core syntax, we make it possible to add shorthand inline notations by just loading a plugin into the parser.

There’s some discussion on the mallard-list mailing list, starting in August. And there’s a preliminary Ducktype parser up on Gitorious. You can also get it from PyPI with `pip install duck`. If you’re interested in docs, or ducks, or anything of the sort, please join the conversation. I always like getting more input.

Mallard: State of the Duck

Mallard development has been a bit dormant lately. A few features have trickled in over the last year, but the backlog of things to improve has been steadily growing. But things are looking up. I’m nearly done moving projectmallard.org to my Linode, which will allow me to finally fix the broken mailing list archives. Then I can finalize some specifications and release actual packages for the schemas, thus ending this two-year yak-shaving exercise.

This post will highlight some of the back-burner Mallard projects that I hope to get traction on. To help the progress, I’m considering having a virtual quackfest where a handful of people work on specifications, tutorials, and implementations. You don’t have to be a programmer to get involved. Sometimes all we need are experienced Mallard users to give input and try new ideas. If you’re interested, leave a comment, or email me at shaunm at gnome dot org.

Here’s an overview of what I hope to address in the near future:

Mallard 1.1

Mallard 1.0 is finished, despite the admonition on the specification that it’s still a draft. We’ve gotten a lot of feedback, and seen what works and what doesn’t for extensions. (It works more than it doesn’t.) Mallard 1.1 will address that.

Support a tagging mechanism. The very rough Facets extension defines a tagging mechanism that it uses to match pages. But tagging has uses outside faceted navigation, so we should move this into the Mallard core.
Allow info elements in formal block elements. This allows you, for example, to provide credits for code snippets and videos. It’s also necessary to support the next bullet item:
Let formal block elements participate in automatic linking. People have asked to be able to link with a finer granularity than pages and sections. There are good implementation reasons why Mallard doesn’t allow arbitrary anchors, but I believe we can link to certain well-defined endpoints.
Allow section IDs to be optional. This is a common gotcha, and I think it’s a restriction we can relax.
Allow comments after sections. This is another common gotcha. Comments are just block content, and it doesn’t make much sense to put block content after sections. I think we can special-case comments.
Allow the links element to override the link role. This is a bit esoteric, but very useful in some cases.
Let informational links be one-way only. Sometimes it’s handy to opt out of automatic link reciprocation.
Provide a sort of static informational link type. This would allow you to assemble groups of links with no other semantics that you can still format with the links element.
Move hi out of experimental. Yelp has supported an experimental element to highlight some text for a very long time. It’s useful. It should be standard.
Allow link grouping for section links. I’m not sure of the best implementation for this yet, but the feature is useful.
Provide a generic div element with an optional block title. This is useful for extensions. We’d want to slightly redefine block fallback behavior to make this really useful. This is a somewhat backwards-incompatible change, but I think the risk is minimal.
Provide a way to do automatic links through tags. Sometimes you have a collection of pages that you want to link together. Mallard’s automatic links are one-to-one, so they make this case only marginally better. We may be able to hook into the tagging mechanism to do automatic links to all pages with a matching tag.
Allow multiple desc elements, with the exact same semantics as multiple informational titles.

Mallard UI

The Mallard UI extension is intended to hold extensions that add some user interactivity without additional semantics. Currently, expanders are fully defined and implemented. We have experimental implementations for media overlays and link thumbnails, and a plan for tabbed sections.

Mallard Sync

The Mallard Sync extension is planned to allow you to synchronize videos with text content. There are only rough ideas at this point. It will allow things like action links to seek in a video, showing and highlighting parts of the document as a video plays, and tables of contents for videos.

Mallard Conditionals

The Mallard Conditionals extension provides a runtime conditionals mechanism. Content can be conditionally shown based on things like the target platform, the reading environment, the supported Mallard features of the processing tool, and the language of the content. This is well-defined and fully implemented as it is. It just needs a thorough audit to finalize it.

There are other test token schemes that I’d like to work on:

Check the current page or section ID.
Check for page or section IDs that exist in the document.
Check the tag values for the page.

All of these help with reuse. They allow you to XInclude standard content that can adapt itself to different pages and documents.

Mallard API

I did some work on an extension that allows you to format automatic links as API synopses when doing API documentation. I briefly mentioned this in my blog post API Docs on Mobile. This still needs a lot of work, and it needs input from people who are used to working with API documentation in different programming languages.

Mallard Glossaries

I blogged before about an extension to do automatic glossaries in Mallard. It’s been collecting dust for a while.

Faceted Navigation

I also blogged before about an extension to do faceted navigation in Mallard. It’s been collecting dust for an even longer while.

Mallard+TTML, Mallard+SVG, Mallard+MathML, Mallard+ITS

You can add W3C-standard formats like TTML, SVG, MathML, and ITS to your Mallard document. I’ve blogged about Mallard+TTML Video Captions, and there’s a tutorial on Mallard and SVG. These are all implemented, and they work extremely well thanks to Mallard’s well-defined extension mechanism. But they’d all be a lot better with a specification and a schema.

As you can see, there’s a lot to work on. Mallard was designed to be a platform from which we could explore new ideas for help. I think it’s proven itself in that regard. But as with any open source project, it needs an investment from people to keep driving it forward.

Mallard and Video Language Packs

I’ve recently been talking with Petr Kovar about how to make language packs for videos work well with Mallard. Petr, Jakub Steiner, and others have been working on a video-intensive “Getting Started” document for GNOME. Videos, of course, can take up a lot of disk space very quickly, and the problem is compounded when we localize into dozens of languages, as we do in GNOME.

I suggested making language packs for videos. So, for example, the Czech videos would be in a package called gnome-getting-started-cz. But you can’t expect people to use the software center to install the language pack on their own before viewing some introductory videos. Fortunately, we have a mechanism to install packages directly from a help document, using install action links.

<note>
  <p>Install a language pack to view videos in your language.
  <link action="install:gnome-getting-started-cz" style="button">Install</link></p>
</note>

This works nicely when viewed locally in Yelp, but it doesn’t work so well when the document is built to HTML for the web. We can use Mallard Conditionals to make the note only visible when install action links are available.

<if:if test="action:install" xmlns:if="http://projectmallard.org/if/1.0/">
  <note>
    <p>Install a language pack to view videos in your language.
    <link action="install:gnome-getting-started-cz" style="button">Install</link></p>
  </note>
</if:if>

And while we’re at it, we really don’t want this note showing up when you view the original English source document, so we can refine the conditional with some language tokens:

<if:if test="action:install !lang:C !lang:en" xmlns:if="http://projectmallard.org/if/1.0/">
  <note>
    <p>Install a language pack to view videos in your language.
    <link action="install:gnome-getting-started-cz" style="button">Install</link></p>
  </note>
</if:if>

This is almost right, except that we’ve hard-coded the package name for the Czech language pack. We want to be able to translate the package name in the action attribute. If you use itstool to translate your Mallard document with PO files, it turns out the package name will be in a translatable message, but embedded in markup in a way that translators won’t like:

msgid "<link action=\"install:gnome-getting-started-cz\" style=\"button\">Install</link>"

Worse yet, if you use Okapi to translate your document with XLIFF files, it won’t appear at all. Okapi and itstool are both based on the W3C Internationalization Tag Set (ITS), and this is a case where I think ITS really shines. We can use local overrides and embedded ITS rules to instruct these tools on exactly what to offer for translation.

For convenience, define these two namespace prefixes on the page element:

xmlns:mal="http://projectmallard.org/1.0/" xmlns:its="http://www.w3.org/2005/11/its"

To make segmentation clearer (especially for itstool), mark the link as non-translatable. This makes sure the action attribute doesn’t just get segmented with the rest of the containing paragraph. But we do want to translate the content of the link, so add a span that is translatable:

<link action="install:gnome-getting-started-cz" style="button" its:translate="no"><span its:translate="yes">Install</span></link>

With itstool, you’ll now get the nicer no-markup message in your PO file:

msgid "Install"

But now we want to be able to translate the action attribute. Of course, we can’t add an its:translate attribute to the attribute. XML just doesn’t work that way. So we have to use embedded global rules to mark it as translatable. And while we’re at it, we can also provide a localization note for translators. Put this in the info element of the page:

<its:rules version="1.0">
  <its:translateRule selector="//mal:link/@action" translate="yes"/>
  <its:locNoteRule selector="//mal:link/@action" locNoteType="description">
    <its:locNote>Translate this to install:gnome-getting-started-LL, replacing LL with your locale, only if there is a video translation pack for your locale.</its:locNote>
  </its:locNoteRule>
</its:rules>

You’ll now get this in your PO file:

#. Translate this to install:gnome-getting-started-LL, replacing LL with your
#. locale, only if there is a video translation pack for your locale.
msgid "install:gnome-getting-started-cz"
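If you want to double-check which attribute values a selector like //mal:link/@action will expose, a quick Python sketch with the standard library's ElementTree can list them. This only approximates that one XPath selector (translatable_actions is a hypothetical helper); itstool and Okapi evaluate the ITS rules themselves.

```python
import xml.etree.ElementTree as ET
from typing import List

MAL = "http://projectmallard.org/1.0/"

def translatable_actions(page_xml: str) -> List[str]:
    """Return every mal:link/@action value in the page, in document order."""
    root = ET.fromstring(page_xml)
    return [link.get("action") for link in root.iter(f"{{{MAL}}}link")
            if link.get("action") is not None]
```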

This is the kind of thing that’s possible when you have a dynamic help format, an integrated local help viewer, a run-time conditional processing system, and a translation process based on powerful industry standards. And it’s why I still love XML.

API Docs on Mobile

I’ve blogged before about mobile-friendly Mallard output. The HTML created by Yelp’s universal XSLT automatically adapts to small screen sizes in a number of ways, such as reducing extra borders and padding, dropping the number of columns used for certain link groups, and making links thumb-friendly. None of this is surprising to seasoned web developers, but for some reason we still don’t see a lot of it in technical communications and single-source publishing.

There’s a whole lot we can just do automatically when working with a structured source format like Mallard or DocBook or DITA. Some things we can’t do without knowing the author’s intent, like removing non-essential screenshots. And for that, I’ve blogged before about using Mallard Conditionals to exclude images from mobile.

But what about those pesky code blocks? For decades, old farts have fought to keep lines of code under 80 characters. On common phones now, even 80 characters is too wide. You have more like 30 or 40 before you have to scroll horizontally.

Automatically reformatting code is probably outside the scope of good sense, but when API synopses are created dynamically, as with the API Mallard extension I’ve worked on, we can adjust the rendering fairly easily. Here’s a synopsis in a typical desktop browser:

And here’s the same synopsis with the line breaks and indentation dynamically adjusted for a mobile device through CSS:

Obviously, we can’t do much about a function name that’s just too long. But it’s fairly easy to make a synopsis which is at least somewhat readable on my phone. All of this is built into the tools and requires no extra work from authors.
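The post does this adjustment with CSS over synopsis markup that the API extension generates. Purely as an illustration of the same layout decision, here is a Python function that keeps a synopsis on one line when it fits and otherwise puts each parameter on its own indented line. The name format_synopsis, the width threshold, and the four-space indent are all hypothetical choices, not the extension's code.

```python
from typing import List

def format_synopsis(ret: str, name: str, params: List[str], width: int) -> str:
    """Lay out a C-style function synopsis to fit within `width` columns."""
    one_line = f"{ret} {name} ({', '.join(params)});"
    if len(one_line) <= width:
        return one_line  # wide screens: keep the familiar single line
    indent = " " * 4
    body = (",\n" + indent).join(params)
    # Narrow screens: return type on its own line, one parameter per line.
    return f"{ret}\n{name} (\n{indent}{body});"
```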

Mallard Cheat Sheet

Mallard cheat sheet:

Mobile Mallard and Conditional Processing

Last time, I gave a demo of a Mallard document rendered in a way that adapts to handheld devices like phones. Because Mallard is not a presentational language, most of the formatting can be adjusted automatically, and authors don’t have to worry about anything. But sometimes, you really do want to change the content when viewing on a handheld device.

First, though, let’s look at some specific formatting differences between desktop and mobile. For each of the pages below, click the ‘Desktop’ and ‘Mobile’ links above the iframe to see the difference.

Desktop Help – The purely decorative hover previews are turned off for mobile. (Good luck hovering with your finger anyway.) Also, the three-column grid changes to a two-column grid.
Delete files and folders – The notes and step lists use all the available horizontal space. The automatic links at the bottom fill the whole page width as well.
Sound, video & pictures – The two-column link layout changes to a single column, and the link boxes span the entire page width, making them easier to tap.

But sometimes we add content that’s visually helpful on large screens, but cumbersome at small sizes. Things like screenshots are often nice to have, as long as they don’t get in the way. This is where conditional processing comes into play. I’ve been busy retooling Mallard conditional processing to allow things like this:

<media src="figures/shell-top-bar.png" if:test="!target:mobile"/>

The result is an image that’s only displayed in non-mobile reading environments. Switch between desktop and mobile on these pages to see it in action:

This isn’t limited to images, of course. You can use conditional processing on any block element or list item. What’s more, Mallard conditional processing allows full branching and fallback, giving you an easy way to display one thing for the desktop, and something else for mobile.
