I build a number of sites with Pintail, including projectmallard.org and yelp.io. Some of these sites I build and upload manually. Others are hooked up to continuous deployment. I’ve been using python-github-webhooks on my server, which is a very simple tool to receive GitHub notifications and do stuff in response. What I was doing in response was building sites with Pintail.
The problem with this approach is that GitHub wants endpoints to respond within 30 seconds. And although building Mallard with Pintail is fast, there are things you don’t want to block on. In particular, you don’t want network operations to hold things up. At the very least, building requires updating one git repository, and possibly more. The seemingly simple yelp.io configuration pulls in two more git repositories. (Yes, it’s that easy.)
So I needed a job queue. I don’t want to just background the build tasks, because then I could end up starting a new build before a previous build finished, and down that path lies madness. I looked into using AMQP queues or using a full-blown CI tool like Buildbot. But I wanted something simple that didn’t involve a lot of new software on my servers. (Side note: I’m building a handful of relatively small sites with fairly low traffic. If you’re doing more, go use a tool like Buildbot and ignore the rest of this post.)
What I finally decided to do was to manage a simple build queue with
mkfifo. I have a program that creates a FIFO and reads from it indefinitely, triggering builds when it receives data. Slightly stripped down version:
wdir=/var/pintail rm -f "$wdir/queue" mkfifo "$wdir/queue" chmod a+w "$wdir/queue" while read repo <"$wdir/queue"; do if [ "x$repo" = "xyelp.io" ]; then git='https://github.com/projectmallard/yelp.io.git' # Other sites get elif statements here. else continue fi if [ ! -d "$wdir/$repo" ]; then (cd "$wdir" && git clone "$git") else (cd "$wdir/$repo" && git pull -r) fi outdir="$repo"-$(date +%Y-%m-%d)-$(uuidgen) mkdir -p "/var/www/$outdir" (cd "$wdir/$repo" && LANG=en_US.utf-8 scl enable python33 -- pintail build -v -o "/var/www/$outdir" && cd "/var/www" && ln -sf "$outdir" "$repo".new && mv -T "$repo".new "$repo" ) 2>&1 >> "$wdir/$repo"-log done
Now the only thing my hook endpoints actually do is write a line to the FIFO. Importantly, the build process only looks for known strings in the FIFO, and ignores any other input. It doesn’t, for example, execute arbitrary commands placed in the FIFO. So the worst an attacker could do is trigger builds (potentially resulting in a DoS).
This script has one other trick: It uses symlinks to atomically update sites. The actual built site is in a unique directory named with the actual site name, the date, and a uuid. The actual directory pointed to by my httpd config files is a symlink. Overwriting a symlink with
mv -T is an atomic operation, so your site is never half-updated or half-broken. This is a trick I learned at a previous employer, where it was very very important that our very very large documentation site was updated exactly as our release announcement went out.
Lately I’ve been working on Pintail, a documentation site generator built on top of Mallard, Yelp, and the various other tools we’ve developed over the years. Pintail grew out of the tool that used to build projectmallard.org from Mallard sources. But it’s grown a lot to be able to handle general documentation sites. I want GNOME to be able to use Pintail for its documentation site. I want other projects to be able to use it too.
One of the more compelling features, and something many documentation site generators don’t handle, is that Pintail can pull in different git repositories for different directories. Small projects can get away with having all their docs in one repository. Large projects like GNOME can’t. Here’s a snippet of what the configuration for help.gnome.org might look like:
[/users/gnome-help/stable/] git_repository = git://git.gnome.org/gnome-user-docs git_branch = master git_directory = gnome-help/C/
There are two major features I hope to have ready soon. Both of them are available as Summer of Code projects. First, documentation sites obviously need search. But it’s not enough to just search the whole site. You want to be able to search within specific documents, or specific versions of documents. I’ve actually already got some indexing code in Pintail using Elasticsearch as a backend.
Second, Pintail needs to be able to handle localizations. I’ve put a lot of work into documentation internationalization over the years. It’s important, and everything I work on will continue to support it. I have some ideas on how this will work.
If you need to build a documentation site, give Pintail a try. I’m building a few sites with it already, but I’d love to get input from people with different needs.