Six months ago, I left my life as a freelance documentation consultant and joined Red Hat in the Open Source and Standards group. I mostly loved freelancing, and I wouldn’t have given it up for just any job. Red Hat brought me on to go into the various upstream open source projects that fuel our products and build up their communities and processes for documentation. The job description might as well have been “Pay Shaun to do what Shaun loves doing.”

I could never have predicted the incredibly fun challenges I’d face. I’ve been primarily concerned with oVirt, GlusterFS, and FeedHenry. But I’m also keeping a watchful eye on projects like OpenStack (along with Red Hat’s RDO offering), ManageIQ, Ceph, CentOS, and Fedora. The ecosystem of projects that Red Hat contributes to is vast and always growing, so there’s certainly no shortage of work to be done.

I’ve learned quite a bit about the different documentation workflows being used in the wild. The systems I helped build up for GNOME are fairly heavyweight compared to most open source projects (though certainly not the most heavyweight). Workflows using lightweight formats, GitHub, and continuous deployment are very compelling and help reduce the barrier to entry. On the other hand, they offer little for multiple versions, status tracking, reviews, and translations. People talk a lot about barriers to entry, but I also like to talk about barriers to retention. Sometimes making things easy for new contributors makes long-term maintenance a burden.

I’ve tried bridging this gap from both directions: working richer metadata into existing lightweight processes, and making the editing and deployment story easier in the Mallard+Yelp ecosystem. Of course, I have to prioritize real work for deadlines, but it’s given me interesting new weekend challenges as well.

It’s been an exciting six months with an incredible team. I’m looking forward to the next six months.

Over the last few years, we’ve seen more and more open source projects transition to a Creative Commons license for their documentation. Specifically, most projects tend to use some version of CC-BY-SA. There are some projects that use a permissive code license like Apache or MIT for documentation, and certainly still some that use the GFDL. But for the most part, the trend has been toward CC-BY-SA.

This is a good thing. Creative Commons has been at the forefront of the open culture movement, which has had just as profound an impact on our lives as the free software and open source movements before it. Using a Creative Commons license means that documentation writers have access to a wealth of CC-licensed images, videos, and audio files. We can reuse icons and other imagery when creating network diagrams. We can use background music in our video demonstrations. And because so many projects are moving toward Creative Commons, we can all share each other’s work.

Sharing work is a two-way street if we all use the same license. If a project uses a non-sharealike license, others can reuse its content, but it can’t reuse content from projects that use sharealike. So there’s a lot of network value to having everybody use CC-BY-SA.

But CC-BY-SA shares one serious flaw with the GFDL: Any code samples contained in the developer documentation are also licensed under the same license. This is true of any license, even permissive licenses like Apache or MIT, but with a copyleft license like CC-BY-SA or GFDL, it means the code can only be used in software projects under that same license. Of course, nobody writes code under CC-BY-SA or GFDL, so this presents a big problem.

We want people to be able to reuse code samples. That’s why we provide them. And we want to place as few barriers as possible to reusing them. Any sufficiently small code sample isn’t worth worrying about, but where’s the cutoff? Are the code samples in the Save Window State Howto sufficiently small? I don’t know. I’m not a lawyer. This is something we struggled with in GNOME, and it’s something other projects have realized is a problem as well. It recently came up on the OpenStack documentation mailing list, for example.

You can always put an exception on your license. You have a few choices. You could explicitly license your code samples under a permissive code license, or even CC0. GNOME has a standard license exception that reads “As a special exception, the copyright holders give you permission to copy, modify, and distribute the example code contained in this documentation under the terms of your choosing, without restriction.” This came from an honest-to-goodness lawyer, so I hope it’s OK.

But this still has a problem. GNOME is no longer using a stock Creative Commons license. Neither is any other project that adds an exception to put code samples under a permissive code license. This means that two-way sharing is no longer a viable option. Anybody can take GNOME documentation and reuse it, even effectively uplicensing the code samples to CC-BY-SA. And GNOME can take any non-code prose from other CC-BY-SA content. But GNOME cannot reuse code samples from any project that doesn’t carry a compatible exception.

I’ve seen this in enough projects that I think it’s something Creative Commons should address directly. If there were a standard CC-BY-SA-CODE license that included a stock permissive exception for code samples, we could all switch to that and recommence sharing our developer documentation. Who can help make this happen?

Infographic: How Mallard helps cross-stream documentation workflows

One of the projects I’ve been working on lately is Ducktype, a lightweight syntax for Mallard. Mallard has a lot of strengths. Its automatic linking mechanisms make content organization easier. Its focus on independent topics makes content re-use possible. Its revision information and other metadata allow you to do status tracking on large content pools. It has a well-defined extension mechanism that allows you to add new functionality and embed external vocabularies like TTML, SVG, and ITS.

XML is the backbone that makes all of this possible. But XML is also what slows adoption. There’s a growing trend towards using lightweight formats to make it easier to contribute. But while lightweight formats make easy things easy, they tend to fall over when dealing with the issues that XML-based vocabularies are designed to solve.

The idea for a lightweight syntax for Mallard has floated around for a couple years. I even spent some time trying to repurpose an existing lightweight format like reStructuredText or AsciiDoc, but none of them are able to carry the level of semantic information that Mallard needs.

Before going into details, let’s look at a Mallard page written in Ducktype:

= My First Topic
@link[guide >index]
@desc A short description of this page
@revision[date=2014-11-13 status=draft]

This is the first paragraph.
The paragraph continues here, but ends with the blank line below.

[steps]
* This is a steps list, common in Mallard.
* Without the [steps] declaration above, we'd get a normal bullet list.
  Indentation is significant, so this is still in the second item.
* Indentation is so significant that you can actually nest block elements.

  So this is a new paragraph still inside the third item.

  [note]
  And this is a paragraph in a note in the third item.

* You can also nest list items, or literally anything else.

  * This is a basic bullet list.
  * It is in the fourth item of the steps list.

This paragraph is outside the steps list.

One of the most distinguishing features is that, like Python, indentation matters. Indentation is how you stay inside a block element. Ducktype also allows you to do everything you’d do inside a Mallard <info> element, which is crucial to pretty much all of the compelling features of Mallard.
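
To make the indentation rule concrete, here is a minimal Python sketch of how a parser might track nesting with a stack of indentation levels. This is not the actual Ducktype parser: the function name `nest_by_indent` and the tuple-based tree are made up for illustration, and the sketch ignores list markers, block declarations like [steps], and inline markup entirely.

```python
def nest_by_indent(text):
    """Group non-blank lines into a tree based on leading spaces.

    Hypothetical helper, not part of any Ducktype release. Each node is
    a (line, children) tuple; a line indented deeper than the previous
    one becomes its child.
    """
    root = []
    # Stack of (indent, children-list); the top is the innermost open block.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines end paragraphs but do not close blocks
        indent = len(raw) - len(raw.lstrip(" "))
        # Pop out of every block this line is no longer indented inside.
        while indent <= stack[-1][0]:
            stack.pop()
        node = (raw.strip(), [])
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root


sample = """\
* first item
  still inside the first item
* second item

  new paragraph in the second item
outside the list
"""

tree = nest_by_indent(sample)
for line, children in tree:
    print(line, "->", [child for child, _ in children])
```

The point is only that a blank line never ends a block; dropping back to a shallower indentation does.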

Ducktype is guided by a few design principles, just as Mallard was so many years ago:

  1. It should be possible to do almost anything Mallard XML can do. You can arbitrarily nest block elements. You can have inline markup everywhere you need it, including in code blocks and in other inline markup. You can embed extensions and other vocabularies so that things like Mallard Conditionals and Mallard+TTML are possible. In fact, the only limitation I’ve yet encountered is that you can’t put attributes on page and section titles. This means that Ducktype is capable of serving as a non-XML syntax for virtually any XML vocabulary.
  2. The most commonly used Mallard features should be easy to use. Mallard pages tend to be short with rich content and a fair amount of metadata. Steps lists are common. Semantic inline content is common. Linking mechanisms are, unsurprisingly, extremely common. Credits are common. Revision info is common. Licenses are nearly always done with XInclude.
  3. There should be a minimal number of syntactical constructs. Most lightweight formats have shorthand, special-purpose syntax for everything. This makes it extremely difficult to support extension content without modifying the parser. And for any non-trivial content, it makes it difficult to remember which non-alphanumeric characters you have to escape when.
  4. For extra special bonus points, it should be possible to extend the syntax for special purposes. Lightweight syntaxes are popular in code comments for API documentation, and in API documentation you want shorthand syntax for the things you reference most often. For an object-oriented language, that’s classes and methods. For XSLT, it’s templates and parameters. By not gobbling up all the special characters in the core syntax, we make it possible to add shorthand inline notations by just loading a plugin into the parser.
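
The fourth principle is easier to picture with a toy example. The Python sketch below imagines a registry where a plugin claims an unused special character and supplies a function that expands it. Everything in it is an assumption for illustration: `register_shorthand`, `expand_inline`, the `%` and `~` sigils, and the markup they produce are invented here and are not Ducktype API.

```python
import re

# Hypothetical registry: sigil character -> function that turns the
# identifier following the sigil into inline markup.
SHORTHANDS = {}


def register_shorthand(sigil, handler):
    """Let a plugin claim one otherwise-unused special character."""
    if sigil in SHORTHANDS:
        raise ValueError("sigil already taken: " + sigil)
    SHORTHANDS[sigil] = handler


def expand_inline(text):
    """Replace sigil+identifier with whatever the registered handler returns."""
    if not SHORTHANDS:
        return text
    sigils = "".join(re.escape(s) for s in SHORTHANDS)
    pattern = re.compile("([" + sigils + "])" + r"([A-Za-z_]\w*(?:[.\-]\w+)*)")
    return pattern.sub(lambda m: SHORTHANDS[m.group(1)](m.group(2)), text)


# An imagined XSLT documentation plugin: % marks a template, ~ a parameter.
register_shorthand("%", lambda name: '<code style="xslt-template">' + name + "</code>")
register_shorthand("~", lambda name: '<code style="xslt-param">' + name + "</code>")

print(expand_inline("Call %html.page with ~html.basename set."))
```

Because the core syntax leaves `%` and `~` alone, text without a registered plugin passes through untouched.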

There’s some discussion on the mallard-list mailing list, starting in August. And there’s a preliminary Ducktype parser up on Gitorious. You can also get it from PyPI with `pip install mallard-ducktype`. If you’re interested in docs, or ducks, or anything of the sort, please join the conversation. I always like getting more input.

Add this to the list of things I never expected to be doing: opening a grocery store.

At last year’s Open Help Conference, I gave a talk titled Community Lessons From IRL. I told the story of how I got involved in opening a grocery store, and what I’ve learned about community work when the community is your neighbors.

I live in Cincinnati, in a beautiful, historic, walkable neighborhood called Clifton. We pride ourselves on being able to walk to get everything we need. We have a hardware store, a pharmacy, and a florist. We have lots of great restaurants. We had a grocery store, but after generations of serving the people of Clifton, our neighborhood IGA closed its doors nearly four years ago.

The grocery store closing hurt our neighborhood. It hurt our way of life. Other shops saw their business decline. Quite a few even closed their doors. At restaurants and coffee houses and barber shops, all anybody talked about was the grocery store being closed. When will it reopen? Has anybody contacted Trader Joe’s/Whole Foods/Fresh Market? Somebody should do something.

“Somebody should do something” isn’t doing something.

If there’s one thing I’ve learned from over a decade of working in open source, it’s that things only get done when people get up and do them. Talk is cheap, whether it’s in online forums or in the barber shop. So a group of us got up and did something.

Last August, a concerned resident sent out a message that if anybody wanted to take action, she was hosting a gathering at her house. Sometimes just hosting a gathering at your house is all it takes to get the ball rolling. Out of that meeting came a team of people committed to bringing a full-service grocery store back to Clifton as a co-op, owned and controlled by the community.

Thus was born Clifton Market.

Clifton Market display in the window of the vacant IGA building

For the last 14 months, I’ve spent whatever free time I could muster trying to open a grocery store. Along with an ever-growing community of volunteers, I’ve surveyed the neighborhood, sold shares, created a business plan, talked to contractors, negotiated real estate, and learned far more about the grocery industry than I ever expected. In many ways, I’ve been well-served by my experience working with volunteer communities in GNOME and other projects. But a lot of things are different when the project is in your backyard staring you down each day.

Opening a grocery store costs money, and we’ve been working hard on raising the money through shares and owner loans. If you want to support our effort, you can buy a share too.

Passive Voice Day will again be observed Monday, April 28, 2014. This event is participated in by many each year. On Passive Voice Day, the passive voice is preferred in tweets, blogs, and casual conversation. Not only is this found to be amusing, but an opportunity is also created for people to be educated on writing well.

The hashtag #passivevoiceday should be used so that your passive voice sentences can be enjoyed by others.

If it’s not known how the passive voice is used, this information from Grammar Girl should be read.

The GNOME doc team has been having a doc sprint in sunny Norwich, England. I arrived a day late due to crazy snow in Cincinnati, but the team’s been hard at work all week. As usual, there’ve been plenty of feature requests for Mallard, which I’ve been trying to keep track of. I’ve tried to encourage people to join the Mallard mailing list and to file Mallard enhancement proposals. Kat asked for a feature in yelp-tools to check licenses of Mallard page files. I’ve added a “license” subcommand to yelp-check. This is in git master, and I’ll get docs up on the wiki after the next release.
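
For a sense of what a license check involves, here is a rough Python sketch that pulls the license links out of a Mallard page. It is not how yelp-check does it, and the function name `page_licenses` is invented for this example. It only assumes what Mallard itself defines: pages live in the http://projectmallard.org/1.0/ namespace and carry license elements with an href attribute inside info.

```python
import xml.etree.ElementTree as ET

MAL = "{http://projectmallard.org/1.0/}"


def page_licenses(page_xml):
    """Return the href of every license element in a page's info block.

    Illustration only; the real yelp-check subcommand is a separate
    tool and may report licenses differently.
    """
    root = ET.fromstring(page_xml)
    hrefs = []
    for lic in root.findall(MAL + "info/" + MAL + "license"):
        hrefs.append(lic.get("href", "(no href)"))
    return hrefs


example = """\
<page xmlns="http://projectmallard.org/1.0/" id="demo">
  <info>
    <license href="http://creativecommons.org/licenses/by-sa/3.0/">
      <p>Creative Commons Attribution-ShareAlike 3.0</p>
    </license>
  </info>
  <title>Demo</title>
</page>
"""

print(page_licenses(example))
```

A page with no license element gives back an empty list, which is exactly the case a doc team would want flagged.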

I also wrote a new tutorial on Mallard conditionals. It’s hosted with the rest of the Mallard tutorials on projectmallard.org, and I hope we can adapt it to a chapter on conditionals in the Mallard book. In other Mallard news, Ryan’s been pushing me to resume my work on my non-XML Mallard markdown, so that might see some work soon.

We had our semi-annual discussion about the state of our developer docs and what we can do to fix them. As usual, a lot of problems were identified and a lot of ideas were tossed around, but we never seem to revamp things quite to the level I’d like. Maybe this time will be different.

Allan Day joined us for a few days. I enjoyed working with him on redesigning Yelp. Hopefully Yelp 3.12 will look more like a modern GNOME 3 app. Aside from the chrome of Yelp, Allan took a crack at designing a splash page for the desktop help that has a bit more visual appeal. I’ve been working on implementing this for the last two days. Everybody loves screenshots:

Screenshot: yelp-tiled-splash-5

Following our tradition of naming special page styles after our hackfests, this is currently accomplished with the “norwich” style on links elements, along with some uix:thumbs elements on the target pages. Petr’s been pushing me to get thumbnails out of experimental and into Mallard UI proper. Hopefully I can find time to carry through on this after the hackfest.

Thanks to the GNOME Foundation for sponsoring my travel, and to the University of East Anglia for their wonderful hospitality.

Sponsored by the GNOME Foundation

Earlier this week, the W3C released the Internationalization Tag Set (ITS) 2.0 as a recommendation. This is a big leap from ITS 1.0 in terms of functionality, and I’m proud to have played a small part in the development of it. Today, I released ITS Tool 2.0.0 with full support for ITS 2.0. Notably, this release supports:

  • Parameters in selectors, including the ability for users to override parameters on the command line.
  • Preserve Space, a data category that allows you to specify which elements are space-preserving. This was based in part on a similar extension data category from ITS Tool 1.
  • External Resource, a data category that allows you to locate referenced resources like images and videos. This was based in part on a similar extension data category from ITS Tool 1.
  • Locale Filter, a data category that allows you to exclude content from localized copies based on locale. This replaces the considerably more limited dropRule extension from ITS Tool 1.
  • ID Value, a data category that allows you to specify potentially complex IDs for elements.
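
To give a feel for Locale Filter, the Python sketch below decides whether a piece of content applies to a given locale from a comma-separated filter list and a filter type of include or exclude. It is not code from ITS Tool, and the function name `locale_matches` is made up. It also takes a shortcut: it matches by simple prefix, where the ITS 2.0 specification calls for extended filtering of language ranges.

```python
def locale_matches(locale, filter_list, filter_type="include"):
    """Decide whether content applies to `locale`.

    Simplified sketch: uses basic prefix matching ("de" matches "de-AT")
    rather than the extended filtering that ITS 2.0 specifies.
    """
    ranges = [r.strip().lower() for r in filter_list.split(",") if r.strip()]
    tag = locale.lower()
    hit = any(
        r == "*" or tag == r or tag.startswith(r + "-")
        for r in ranges
    )
    # "include" keeps content for matching locales; "exclude" inverts that.
    return hit if filter_type == "include" else not hit


print(locale_matches("de-AT", "de, fr"))          # content kept for Austrian German
print(locale_matches("ja", "de, fr"))             # content dropped for Japanese
print(locale_matches("ja", "de, fr", "exclude"))  # exclude flips the decision
```

An empty list with include matches no locale, and `*` with exclude also matches none, which is how content gets dropped from every localized copy.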

This release also includes a number of features beyond ITS 2.0 support, such as an option to preserve entity references, an option to load external DTDs, and built-in rules for DocBook 5. This is the biggest release of ITS Tool since it was first released. I’d appreciate people trying it out and reporting bugs.

I do have plans for more features, including:

  • An option to follow XIncludes, automatically processing any included files. This is distinct from just merging the XIncludes in that the files are handled individually, as if you had specified them each on the command line. This is really useful in setups that have a large pool of common content that gets XIncluded in different deliverables so that a single PO file can easily reflect all the translatable strings for one deliverable.

  • Support for some sort of readiness data category that can specify whether individual segments are ready for translation. This would put information in the PO file about whether each PO message is finalized yet. This is useful for partial or slushy freezes, where you want to notify translators of finished content, but you can’t commit to freezing all your deliverables.

    We discussed a data category for this in ITS 2.0, but it wasn’t ready in time. Other implementers are interested in this, so we will probably use a shared extension that works in other tools.

  • The ability to specify repeatable elements for multi-lingual documents. In ITS Tool 1.2.0, I added the ability to join multiple translations into a single multi-lingual file, such as those that GNOME now uses for AppData files. Unfortunately, you currently have no control over which elements are repeated with distinct xml:lang attributes. Right now, ITS Tool assumes that whatever elements it uses for segmentation are the repeatable elements. But in files like AppData files, you may want to segment on the p elements, but repeat the description elements.

    I tried to bend the ITS 2.0 Target Pointer data category to support this use case, but ultimately decided it was too different and would needlessly complicate the specification.

  • HTML support. ITS 2.0 officially supports HTML5, and specifies exactly how ITS information is mapped to HTML. It does not, however, require implementations to support HTML5. For now, ITS Tool is just an XML tool, but I’d like to add support for HTML5. The libxml2 HTML parser doesn’t cut it, unfortunately, so I need to find something that does, and preferably something that I can map to libxml2’s data structures so I can use the same behind-the-scenes logic.
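
The XInclude idea from the first item in this list can be sketched in a few lines of Python. This is not ITS Tool code, since the feature is only planned. The names `included_hrefs` and `files_to_process` are invented, and a dictionary stands in for files on disk so the example runs on its own. It shows the difference from merging: each included file ends up in the list once, to be handled as if it had been named on the command line.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def included_hrefs(xml_text):
    """Return the href of every xi:include element in one document."""
    root = ET.fromstring(xml_text)
    return [el.get("href") for el in root.iter(XI_INCLUDE) if el.get("href")]


def files_to_process(start, pool):
    """List `start` plus everything it includes, each file exactly once.

    `pool` maps file names to XML text; it is a stand-in for the
    filesystem in this sketch.
    """
    seen, queue = [], [start]
    while queue:
        name = queue.pop(0)
        if name in seen or name not in pool:
            continue
        seen.append(name)
        queue.extend(included_hrefs(pool[name]))
    return seen


XI_NS = 'xmlns:xi="http://www.w3.org/2001/XInclude"'
pool = {
    "guide.xml": "<book " + XI_NS + '><xi:include href="intro.xml"/>'
                 '<xi:include href="legal.xml"/></book>',
    "intro.xml": "<chapter " + XI_NS + '><xi:include href="legal.xml"/></chapter>',
    "legal.xml": "<legalnotice/>",
}

print(files_to_process("guide.xml", pool))
```

Here legal.xml is included from two places but appears once, so its strings would land in the deliverable's PO file a single time.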

If you’re interested in using ITS Tool for your XML document translation, get in touch. Leave a comment or email me at shaunm at gnome dot org. I’m always happy to help people set up better translation processes.

How do you decide what to write about? How do you organize what you’ve written? Often, your view will be shaped by the technology you’re used to. Mallard users will think of topics and guides. DITA users will turn to tasks and concepts. DocBook users will line up chapters and sections. All of these have their pros and cons, but there are three types of documents you need to stop writing.

README: Read you? I’ll read what I like, thankyouverymuch. What’s in this README file? Instructions for how to use the software? How to install it? Develop plugins for it? Start contributing to it? The answer to all of these is yes, and much more. I just don’t know what’s in there. All I know is that somebody thinks I should read it. Maybe.

TODO: I want to know what I can do with your software today, not tomorrow, and certainly not in your imagined tomorrow. By all means, keep a TODO list. Better yet, use an issue tracker. Software development is hard work, and we all need help keeping track of what we need to do. But don’t put it in your user documentation.

FAQ: I don’t care if my question is frequently asked. There are many ways you could organize information, but an FAQ isn’t a taxonomy. It’s a brain dump of answers to some questions somebody had. Worse, it’s often a brain dump of answers to questions the developers think you should ask. (Did anybody really ask how your project got its witty name?) Identify the valuable information in your FAQ and take the time to work it into the organizational structure of your documentation.

All of these are failures to identify the audience and organize information for them. A writer’s job doesn’t end with writing some words and just putting them somewhere. When writing and organizing, always think of where your reader is likely to look for the information you’re providing.

It has been decided that the third annual Passive Voice Day will be observed on April 26, 2013. Though previous years were observed April 27, it is thought that more participants can be found if Passive Voice Day is observed on a weekday.

Passive Voice Day is observed by people around the world. The absurdity of the English language being tortured is enjoyed by these people. For one day, the passive voice is used exclusively in tweets, blogs, and casual conversation.

The hashtag #passivevoiceday should be used on Twitter and other social media, so that your passive voice sentences can be enjoyed by others.

Is it not known how the passive voice is used? Is a refresher needed? The information provided by Grammar Girl should be read.