I build a number of sites with Pintail, including projectmallard.org and yelp.io. Some of these sites I build and upload manually. Others are hooked up to continuous deployment. I’ve been using python-github-webhooks on my server, which is a very simple tool to receive GitHub notifications and do stuff in response. What I was doing in response was building sites with Pintail.

The problem with this approach is that GitHub wants endpoints to respond within 30 seconds. And although building Mallard with Pintail is fast, there are things you don’t want to block on. In particular, you don’t want network operations to hold things up. At the very least, building requires updating one git repository, and possibly more. The seemingly simple yelp.io configuration pulls in two more git repositories. (Yes, it’s that easy.)

So I needed a job queue. I don’t want to just background the build tasks, because then I could end up starting a new build before a previous build finished, and down that path lies madness. I looked into using AMQP queues or using a full-blown CI tool like Buildbot. But I wanted something simple that didn’t involve a lot of new software on my servers. (Side note: I’m building a handful of relatively small sites with fairly low traffic. If you’re doing more, go use a tool like Buildbot and ignore the rest of this post.)

What I finally decided to do was to manage a simple build queue with mkfifo. I have a program that creates a FIFO and reads from it indefinitely, triggering builds when it receives data. Slightly stripped down version:

wdir=/var/pintail

rm -f "$wdir/queue"
mkfifo "$wdir/queue"
chmod a+w "$wdir/queue"

while read repo <"$wdir/queue"; do
    if [ "x$repo" = "xyelp.io" ]; then
        git='https://github.com/projectmallard/yelp.io.git'
    # Other sites get elif statements here.
    else
        continue
    fi
    if [ ! -d "$wdir/$repo" ]; then
        (cd "$wdir" && git clone "$git")
    else
        (cd "$wdir/$repo" && git pull -r)
    fi

    outdir="$repo"-$(date +%Y-%m-%d)-$(uuidgen)
    mkdir -p "/var/www/$outdir"
    (cd "$wdir/$repo" &&
        LANG=en_US.utf-8 scl enable python33 -- pintail build -v -o "/var/www/$outdir" &&
        cd "/var/www" &&
        ln -sf "$outdir" "$repo".new &&
        mv -T "$repo".new "$repo"
    ) 2>&1 >> "$wdir/$repo"-log
done   

Now the only thing my hook endpoints actually do is write a line to the FIFO. Importantly, the build process only looks for known strings in the FIFO, and ignores any other input. It doesn’t, for example, execute arbitrary commands placed in the FIFO. So the worst an attacker could do is trigger builds (potentially resulting in a DoS).

This script has one other trick: It uses symlinks to atomically update sites. The actual built site is in a unique directory named with the actual site name, the date, and a uuid. The actual directory pointed to by my httpd config files is a symlink. Overwriting a symlink with mv -T is an atomic operation, so your site is never half-updated or half-broken. This is a trick I learned at a previous employer, where it was very very important that our very very large documentation site was updated exactly as our release announcement went out.

Lately I’ve been working on Pintail, a documentation site generator built on top of Mallard, Yelp, and the various other tools we’ve developed over the years. Pintail grew out of the tool that used to build projectmallard.org from Mallard sources. But it’s grown a lot to be able to handle general documentation sites. I want GNOME to be able to use Pintail for its documentation site. I want other projects to be able to use it too.

One of the more compelling features, and something many documentation site generators don’t handle, is that Pintail can pull in different git repositories for different directories. Small projects can get away with having all their docs in one repository. Large projects like GNOME can’t. Here’s a snippet of what the configuration for help.gnome.org might look like:

[/users/gnome-help/stable/]
git_repository = git://git.gnome.org/gnome-user-docs
git_branch = master
git_directory = gnome-help/C/

Pintail’s native format is Mallard, but you can add in support for other formats pretty easily. There’s Docbook support, for example, and I’d like to add AsciiDoc support using asciidoctor-mallard.

There are two major features I hope to have ready soon. Both of them are available as Summer of Code projects. First, documentation sites obviously need search. But it’s not enough to just search the whole site. You want to be able to search within specific documents, or specific versions of documents. I’ve actually already got some indexing code in Pintail using Elasticsearch as a backend.

Second, Pintail needs to be able to handle localizations. I’ve put a lot of work into documentation internationalization over the years. It’s important, and everything I work on will continue to support it. I have some ideas on how this will work.

If you need to build a documentation site, give Pintail a try. I’m building a few sites with it already, but I’d love to get input from people with different needs.

Yelp has runtime conditional processing for both Mallard and DocBook, so you can show different content to users in different environments. For example:

<p if:test="platform:gnome-classic">We only show this in classic mode.</p>
<p if:test="platform:unity">We only show this in Unity.</p>
<p if:test="platform:fedora">We only show this in Fedora.</p>
<p if:test="platform:fedora-22">We only show this in Fedora 22.</p>

Read more about Yelp’s runtime conditional info and Mallard’s conditional tokens. To my knowledge, no other help system does this kind of automatic runtime conditional processing. After some conversations with Endless folks at GUADEC, I realized we’re still missing some cases. I want to make this better.

I’ve put together a short three-question survey. Please fill it out with information from each different kind of machine you have access to.

If you’re running GNOME Builder with an unpatched vte, you may have noticed the annoying notification control sequence in Builder’s embedded terminal. Christian Hergert told me there’s a vte patch that fixes this, but it hasn’t been merged upstream. He also told me the workaround is to unset PROMPT_COMMAND. I’m too lazy to do this every time I open Builder, so this is what’s now in my .bashrc file:

parent=$(ps -ocommand= -p $PPID | awk -F/ '{print $NF}' | awk '{print $1}')
if [ "x$parent" = "xgnome-builder" ]; then
    unset PROMPT_COMMAND
fi

The Open Help Conference & Sprints is happening again this year. Open Help is the only event focused on open source documentation and support. It features a two-day conference with a mix of presentations, demos, and open discussions. This format has proven very popular, and really helps people find real solutions to improve their documentation. The conference this year is September 26-27. (But don’t miss the amazing reception September 25!)

After the conference, Open Help hosts three days of multiple-team sprints. The sprints are September 28-30 this year. Open Help has hosted doc sprints for teams like GNOME, Mozilla, FreeBSD, Wikipedia, WordPress, OpenMRS, and WebPlatform.org. In many cases, Open Help was a team’s first exposure to having a sprint.

I wrote an article on opensource.com about the Open Help doc sprints, along with five tips on holding a successful sprint.

Documentation is important. This is the fifth year that Open Help will be helping open source teams create better documentation. But we need your help. Please spread the word about Open Help to any communities you’re involved with. Send mail to mailing lists. Tweet. Tell your friends. Get the word out however you can.

Open Help is only great because of the people who come.

When we first designed Mallard, we designed it around creating documents: non-linear collections of pages about a particular subject. Documents are manageable and maintainable, and we’re able to define all of Mallard’s automatic linking within the confines of a document.

If you wanted to publish a set of Mallard documents on the web, you could build each of them individually with a tool like yelp-build, then output some extra navigation pages to help people find the right document. But there was no simple way to create those extra pages. What’s more, you couldn’t link between documents except by using external href links. Mallard’s automatic links are confined to documents.

Enter Pintail. Pintail lets you build entire web sites from Mallard sources. Just lay out your pages in the directory structure you like, and let Pintail build the site for you. Put full Mallard documents in their own directories, then use Mallard to create the extra navigation pages between them. Better still, you can use an extended xref syntax to refer to pages in other directories. Just include the path to the target page with slashes, like so:

<link xref="/about/learn/svg"/>

This isn’t just a simple link. You can use this in topic links and seealso links and anywhere else that Mallard lets you put an xref attribute. Pintail makes Mallard’s automatic linking work across multiple documents.

Pintail is designed to allow other formats to be used, so you could use it to build all your documentation in an environment where not everything is in one format. It already supports Mallard Ducktype as well as XML. But Mallard is the primary format.

One of the really nice features is that it can pull it documents in other git repositories, so you don’t have to keep all your documentation in a single source tree. In fact, the site in your main repository might be little more than glue pages and the pintail.cfg file that specifies where all the actual documentation lives.

Pintail builds the projectmallard.org web site right now, as well as a few other random sites I maintain. I hope it turns out to be useful for heavy Mallard users like GNOME, Ubuntu, and Endless. And I hope it makes Mallard easier for others who are considering using it.

No software is ever finished, but here are some of the top things I plan to add soon:

  •  Page merging: Mallard allows pages to be dropped into a document and seamlessly integrated into the navigation. Sometimes you want to publish a document with pages pulled from other places. For example, GNOME generally wants to publish GNOME Help with the optional Getting Started video pages merged in.
  • Translations: Mallard was designed from day one to be translator-friendly, and itstool ships with ITS rules for Mallard. I just need to hook the pieces together.
  • Search: An extensive documentation site needs configurable search. You often want to restrict search within a single document. Also, some documents (or versions of documents) shouldn’t appear in global search results.

What would you like to see a Mallard site tool do?

Six months ago, I left my life as a freelance documentation consultant and joined Red Hat in the Open Source and Standards group. I mostly loved freelancing, and I wouldn’t have given it up for just any job. Red Hat brought me on to go into the various upstream open source projects that fuel our products and build up their communities and processes for documentation. The job description might as well have been “Pay Shaun to do what Shaun loves doing.”

I could never have predicted the incredibly fun challenges I’d face. I’ve been primarily concerned with oVirt, GlusterFS, and FeedHenry. But I’m also keeping a watchful eye on projects like OpenStack (along with Red Hat’s RDO offering), ManageIQ, Ceph, CentOS, and Fedora. The ecosystem of projects that Red Hat contributes to is vast and always growing, so there’s certainly no shortage of work to be done.

I’ve learned quite a bit about the different documentation workflows being used in the wild. The systems I helped build up for GNOME are fairly heavyweight compared to most open source projects (though certainly not the most heavyweight). Workflows using lightweight formats, GitHub, and continuous deployment are very compelling and help reduce the barrier to entry. On the other hand, they offer little for multiple versions, status tracking, reviews, and translations. People talk a lot about barriers to entry, but I also like to talk about barriers to retention. Sometimes making things for new contributors makes long-term maintenance a burden.

I’ve tried bridging this from both directions: working more rich metadata into existing lightweight processes, and making the editing and deployment story easier in the Mallard+Yelp ecosystem. Of course, I have to prioritize real work for deadlines, but it’s given me interesting new weekend challenges as well.

It’s been an exciting six months with an incredible team. I’m looking forward to the next six months.

Over the last few years, we’ve seen more and more open source projects transition to a Creative Commons license for their documentation. Specifically, most projects tend to use some version of CC-BY-SA. There are some projects that use a permissive code license like Apache or MIT for documentation, and certainly still some that use the GFDL. But for the most part, the trend has been toward CC-BY-SA.

This is a good thing. Creative Commons has been at the forefront of the open culture movement, which has had just as profound of an impact on our lives as the free software and open source movements before it. Using a Creative Commons license means that documentation writers have access to a wealth of CC-licensed images and videos and audio files. We can reuse icons and other imagery when creating network diagrams. We can use background music in our video demonstrations. And because so many projects are moving toward Creative Commons, we can all share each other’s work.

Sharing work is a two-way street if we all use the same license. If somebody uses a non-sharealike license, others can reuse their content, but they can’t reuse content from projects that use sharealike. So there’s a lot of network value to having everybody use CC-BY-SA.

But CC-BY-SA shares one serious flaw with the GFDL: Any code samples contained in the developer documentation is also licensed under the same license. This is true of any license, even permissive licenses like Apache or MIT, but with a copyleft licenses like CC-BY-SA or GFDL, it means the code can only be used in software projects under that same license. Of course, nobody writes code under CC-BY-SA or GFDL, so this presents a big problem.

We want people to be able to reuse code samples. That’s why we provide them. And we want to place as few barriers as possible to reusing them. Any sufficiently small code sample isn’t worth worrying about, but where’s the cutoff? Are the code samples in the Save Window State Howto sufficiently small? I don’t know. I’m not a lawyer. This is something we struggled with in GNOME, and it’s something other projects have realized is a problem as well. It recently came up on the OpenStack documentation mailing list, for example.

You can always put an exception on your license. You have a few choices. You could explicitly license your code samples under a permissive code license, or even CC0. GNOME has a standard license exception that reads “As a special exception, the copyright holders give you permission to copy, modify, and distribute the example code contained in this documentation under the terms of your choosing, without restriction.” This came from an honest-to-goodness lawyer, so I hope it’s OK.

But this still has a problem. GNOME is no longer using a stock Creative Commons license. Neither is anybody providing an exception to put code samples under a permissive code license. This means that two-way sharing is no longer a viable option. Anybody can take GNOME documentation and reuse it, even effectively uplicensing the code samples to CC-BY-SA. And GNOME can take any non-code prose from other CC-BY-SA content. But GNOME cannot reuse code samples from any project that doesn’t carry a compatible exception.

I’ve seen this in enough projects that I think it’s something Creative Commons should address directly. If there were a standard CC-BY-SA-CODE license that included a stock permissive exception for code samples, we could all switch to that and recommence sharing our developer documentation. Who can help make this happen?

Infographic: How Mallard helps cross-stream documentation workflows

One of the projects I’ve been working on lately is Ducktype, a lightweight syntax for Mallard. Mallard has a lot of strengths. Its automatic linking mechanisms make content organization easier. Its focus on independent topics makes content re-use possible. Its revision information and other metadata allow you to do status tracking on large content pools. It has a well-defined extension mechanism that allows you to add new functionality and embed external vocabularies like TTML, SVG, and ITS.

XML is the backbone that makes all of this possible. But XML is also what slows adoption. There’s a growing trend towards using lightweight formats to make it easier to contribute. But while lightweight formats make easy things easy, they tend to fall over when dealing with the issues that XML-based vocabularies are designed to solve.

The idea for a lightweight syntax for Mallard has floated around for a couple years. I even spent some time trying to repurpose an existing lightweight format like reStructuredText or AsciiDoc, but none of them are able to carry the level of semantic information that Mallard needs.

Before going into details, let’s look at a Mallard page written in Ducktype:

= My First Topic
@link[guide >index]
@desc A short description of this page
@revision[date=2014-11-13 status=draft]

This is the first paragraph.
The paragraph continues here, but ends with the blank line below.

[steps]
* This is a steps list, common in Mallard.
* Without the [steps] declaration above, we'd get a normal bullet list.
  Indentation is significant, so this is still in the second item.
* Indentation is so significant that you can actually nest block elements.

  So this is a new paragraph still inside the third item.

  [note]
  And this is a paragraph in a note in the third item.

* You can also nest list items, or literally anything else.

  * This is a basic bullet list.
  * It is in the fourth item of the steps list.

This paragraph is outside the steps list.

One of the most distinguishing features is that, like Python, indentation matters. Indentation is how you stay inside a block element. Ducktype also allows you to do everything you’d do inside a Mallard <info> element, which is crucial to pretty much all of the compelling features of Mallard.

Ducktype is guided by a few design principles, just as Mallard was so many years ago:

  1. It should be possible to do almost anything Mallard XML can do. You can arbitrarily nest block elements. You can have inline markup everywhere you need it, including in code blocks and in other inline markup. You can embed extensions and other vocabularies so that things like Mallard Conditionals and Mallard+TTML are possible. In fact, the only limitation I’ve yet encountered is that you can’t put attributes on page and section titles. This means that Ducktype is capable of serving as a non-XML syntax for virtually any XML vocabulary.
  2. The most commonly used Mallard features should be easy to use. Mallard pages tend to be short with rich content and a fair amount of metadata. Steps lists are common. Semantic inline content is common. Linking mechanisms are, unsurprisingly, extremely common. Credits are common. Revision info is common. Licenses are nearly always done with XInclude.
  3. There should be a minimal number of syntactical constructs. Most lightweight formats have shorthand, special-purpose syntax for everything. This makes it extremely difficult to support extension content without modifying the parser. And for any non-trivial content, it makes it difficult to remember which non-alphanumeric characters you have to escape when.
  4. For extra special bonus points, it should be possible to extend the syntax for special purposes. Lightweight syntaxes are popular in code comments for API documentation, and in API documentation you want shorthand syntax for the things you reference most often. For an object-oriented language, that’s classes and methods. For XSLT, it’s templates and parameters. By not gobbling up all the special characters in the core syntax, we make it possible to add shorthand inline notations by just loading a plugin into the parser.

There’s some discussion on the mallard-list mailing list, starting in August. And there’s a preliminary Ducktype parser up on Gitorious. You can also get it from PyPI with `pip install duck`. If you’re interested in docs, or ducks, or anything of the sort, please join the conversation. I always like getting more input.