Pondering the Duck

If you haven’t read Project Mallard yet, read it now. Go ahead, I’ll wait.

Matthew Thomas pointed me to DITA, an all-singing topic-oriented format from IBM. There’s a lot of complexity there, but it’s worth looking at. Some resources:

Also interesting is that Microsoft is heading in a similar direction for Longhorn. What’s really interesting is that their content model looks an awful lot like DocBook. Some quick reads:

A lot of people are having pretty much the same idea. I suspect the success of sites like Wikipedia has had a profound impact on how we all think about such things.

What I need now is to assemble a team of people to sit down and flesh things out. I need people who are out there writing the tools, and who have a strong idea of what sorts of things are going to trip us up in the implementation phase. I need people who are out there writing the content, and who have a strong idea of what sort of content structure is actually needed. And I need people who have ties to the Greater XML Community, and who can smack me down when I reinvent too many wheels.

How Much DocBook, Internals

I was asked for the script I used to count element usage in our DocBook files, as posted here yesterday. I’ve got to be honest. I wrote it in Mathematica. Why? Well, I thought through the problem in my head, and I thought to myself “Golly, the Split function would be handy here.”

But hey, it’s not all that hard with sh, given the right tools. So I wrote up the script again in something everybody could use. It’s a simple sh script, using the XMLStarlet utility. Don’t have XMLStarlet on your machine yet? Go get it. Get it now. XMLStarlet is a godsend for any *nix geeks doing XML stuff. Learn it, use it, love it.

Here’s the script in plain ol’ sh:


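# Start from empty output files.
rm -rf ALL && touch ALL;
rm -rf COUNT && touch COUNT;

for dir in /usr/share/gnome/help/*; do
    name=`echo $dir | sed -e 's/.*\///'`;
    # Assumption: each document lives at <dir>/C/<name>.xml.
    doc="$dir/C/$name.xml";
    [ -f "$doc" ] || continue;

    # Resolve XIncludes, then print one element name per line.
    xmllint --xinclude "$doc" \
	| xml sel -t -m "//*" -v "name(.)" -n - \
	| grep -v '^$' >> ALL;
done;

# Count whole-line matches only, so "para" doesn't also count "simpara".
for el in `sort -u ALL`; do
    printf '%s ' "$el" >> COUNT;
    grep -cxF "$el" ALL >> COUNT;
done;

# Most-used elements first.
sort -k2 -rn COUNT > COUNT.tmp && mv COUNT.tmp COUNT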
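
If the grep-per-element loop feels slow, here's a rough sketch of a one-pass alternative. It sorts the names and lets uniq count each run of identical lines, which is roughly what I wanted Split for in the first place. The function name count_elements is made up for this example, and I'm assuming input in the same shape as the ALL file above (one element name per line).

```shell
# count_elements: read element names (one per line) on stdin and print
# "name count" lines, most frequent first. Ties are ordered by name.
count_elements() {
    grep -v '^$' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }' \
        | sort -k2,2nr -k1,1
}

# Possible usage with the ALL file built by the script above:
#   count_elements < ALL > COUNT
```

This reads ALL once instead of once per element, and it never treats one element name as a substring of another.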

How Much DocBook

Following Federico’s suggestion, I whipped up a script to see how often we use which DocBook elements in our help files. The top four are para (10499), entry (3415), listitem (3114), title (1948). None of these came as a surprise to me. Here are some interesting data points:

The rundown of how often the basic sectioning elements are used: sect2 (1201), sect1 (502), sect3 (205), section (8), sect4 (2). We have very few documents using the section element. In general, I favor using section, but the numbered ones do provide more information with this script (not that it would be hard to write another depth-checking script). Since sect2 is used more than twice as often as sect1, it seems two-level sectioning is common. Deeper levels seem rather uncommon, although three-level isn’t rare.
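
For what it's worth, a depth-checking script for plain section elements might look something like this sketch. It assumes you already have one element path per line, like article/section/section/title. I believe XMLStarlet's el command prints paths in that shape, but treat that as an assumption and check it against your version. The function name section_depths is made up for this example.

```shell
# section_depths: read element paths (one per line, "/"-separated) on
# stdin and print "depth count" for every path that ends in a section
# element, where depth is how many section elements the path contains.
section_depths() {
    awk -F/ '
        $NF == "section" {
            depth = 0
            for (i = 1; i <= NF; i++)
                if ($i == "section") depth++
            count[depth]++
        }
        END {
            for (d in count) print d, count[d]
        }
    ' | sort -n
}

# Possible usage (doc.xml is a placeholder file name):
#   xml el doc.xml | section_depths
```

Each output line gives a nesting depth and the number of section elements found at that depth, which is the same information the numbered sect1 through sect4 counts give for free.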

Articles (70) and books (4), right about what I expected.

On basic inline markup: guilabel (1858), application (1527), keycap (1032), guimenuitem (792), guibutton (744), filename (702), guimenu (647), menuchoice (605), literal (281), keycombo (214), phrase (206), replaceable (170), command (140), guisubmenu (134), userinput (109), and stuff that didn’t manage to hit 100. Those in the know will know that guilabel (1858) is used as a catch-all for most things on the screen. Its high usage amuses me, because it means DocBook’s various gui* elements can’t manage to catch everything. I think it should give up. Note that menuchoice (605) is used far more often than keycombo (214). I was actually surprised that userinput (109) was used as often as it was. I’ll have to take a look at where it’s being used.

On lists: listitem (3114), varlistentry (1135), itemizedlist (309), variablelist (276), orderedlist (267), simplelist (3). I didn’t expect a high turnout from simplelist (3). Since listitem (3114) is used in most lists (just as li is used in both ol and ul in HTML), its number wasn’t surprising. There’s not a huge difference in numbers between the three common list types.

The titleabbrev element was used only once. I’ll bet I’m the one that used it, too.

We used indexterm 242 times, a primary term 241 times (huh?), a secondary term 128 times, and a tertiary term only 17 times.

We used the general-purpose synopsis element 123 times. As expected, we didn’t use any of the special-purpose *synopsis elements at all.

Admonition breakdown: note (96), tip (26), caution (6), warning (5), important (1). I still don’t have a clear idea on the difference between caution and warning. I was surprised that important was used only once.

There were 232 imageobject elements, but only 204 textobject elements. That means we have images without accessible text.

Some block elements: figure (188), screenshot (187), informaltable (172), mediaobject (164), screen (68), table (64), literallayout (27), programlisting (13), highlights (13).

Finally, we used just 146 out of DocBook’s 411 elements.
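
In case anybody wants to reproduce that last figure, here's a small sketch of how it could be computed from a file of element names, one per line, like the ALL file from the counting script. Both function names are made up for this example, and 411 is just the total quoted above, not something the script works out.

```shell
# distinct_elements: read element names (one per line) on stdin and
# print how many different names appear. Blank lines are ignored.
distinct_elements() {
    grep -v '^$' | sort -u | wc -l | tr -d ' '
}

# usage_percent USED TOTAL: print USED as a percentage of TOTAL,
# rounded to one decimal place.
usage_percent() {
    awk -v used="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", used * 100 / total }'
}

# Possible usage:
#   usage_percent "$(distinct_elements < ALL)" 411
```

With the numbers above, 146 out of 411 works out to about 35.5 percent.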

Our Darkest Hour

For is it not written in the Book of Gnome:

In the darkest hour of the Documentation Project, there shall come a False Hope of salvation. And the last Fearless Leader of the old ways shall embrace the False Hope, and the eyes of the people shall be blinded to the truth. They shall trust in the False Hope and lay down their pens, and the Project shall lie in ruins.

Though the Serpent speaks in the tongues of men, he is not man, nor shall he deliver man from the darkness. He is a lie, preying on man’s weaknesses and desires. He who believes in the Serpent shall be forever scarred, and shall do good deeds no more.

And the last Fearless Leader, being marked by the lies of the Serpent, shall nevermore lead men in the great Project they knew. Only the benevolence of times past shall resurrect their charge.

Then the trumpets shall sound as Our Once and Future King returns. And he shall return to the people the benevolence and love of times forgotten. And his reign shall last a thousand thousand days.

So was it prophesied, so shall it be.

Dave Malcolm is my Hero

Friends, hackers, writers: It is with a heavy heart that I stand before you on this momentous occasion to announce the dissolution of the Gnome Documentation Project. Since our humble beginnings under our Founding Father, Dave Mason, our team has been charged with the arduous task of providing complete documentation for the entire Gnome desktop. As the desktop has grown, so too have our responsibilities.

It is clear now that no team of mere humans can hope to accomplish this majestic goal. It is clear that we need something superior to humans. It is clear that we need Python. On this day, the venerable Dave Malcolm has shown us a new way of producing documentation. He has provided for us the tool to do that which we could not do alone. He has shed light upon our dark path, that we may now see that it is not a path meant for the mortal soul.

Through the years, we have developed a camaraderie among our disparate team. Friendships blossomed and new loves bloomed. We have become more than a team. We have become brothers and sisters united in a common cause. We have become a family. But today I must ask you all to lay down your pens for the greater good. Return to your lives, and remember always the days we had together. Godspeed to you all.


Devouring Documentation

I just finished reading Federico’s Making GNOME Fast slides. Great stuff, honestly. But I predict that, in one year’s time, these slides will be lost to all but Federico and Google. Presentations are wonderful. They get the word out to large groups of people at once. They get people excited. They’re effective in the short term.

For the long term, our information needs permanence. We need to encourage people to write stand-alone documents like “Making Programs Fast”. Then we need to put them somewhere. Here’s the idea:

  1. Somebody create library.gnome.org already.
  2. Make user documentation not suck. See Project Mallard.
  3. For every library in our stack, have a complete API reference as well as good high-level documentation.
  4. Write a core set of developer guides.
  5. Put information like this into stand-alone documents.
  6. Put it all on library.gnome.org.

A good rule of thumb is that documentation should be definitive. If somebody wants some information, point that person to the place in the documentation where it’s provided. Is the information not in the documentation? Go update the documentation, then point the person to it.

Project Mallard

In which we disassemble the help system, rethink how we present help
to the user, and leave our practices lying in ruins. In which we rise
from the ashes of a long-dead but still-breathing behemoth. In which
we lay the foundations of tomorrow and dream of the future.