I was asked for the script I used to count element usage in our DocBook files, as posted here yesterday. I’ve got to be honest. I wrote it in Mathematica. Why? Well, I thought through the problem in my head, and I thought to myself “Golly, the Split function would be handy here.”

But hey, it’s not all that hard with sh, given the right tools. So I rewrote the script in something everybody can use: a simple sh script built on the XMLStarlet utility, plus xmllint from libxml2 to resolve the XIncludes. Don’t have XMLStarlet on your machine yet? Go get it. Get it now. XMLStarlet is a godsend for any *nix geek doing XML stuff. Learn it, use it, love it.

Here’s the script in plain ol’ sh:

#!/bin/sh

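# Start with empty output files.
: > ALL
: > COUNT

# Collect the name of every element in each GNOME help document.
for dir in /usr/share/gnome/help/*; do
    name=`basename "$dir"`
    doc="$dir/C/$name.xml"

    # Skip directories that have no C-locale document.
    [ -f "$doc" ] || continue

    # Resolve XIncludes, then print one element name per line.
    xmllint --xinclude "$doc" \
        | xml sel -t -m "//*" -v "name(.)" -n - \
        | grep -v '^$' >> ALL
done

# Count whole-line, literal matches only (-x -F), so that "para"
# doesn't also pick up "simpara" or "paramdef".
for el in `sort -u ALL`; do
    printf '%s ' "$el" >> COUNT
    grep -cxF "$el" ALL >> COUNT
done

# Sort by count, highest first.
sort -k2 -rn COUNT > COUNT.tmp && mv COUNT.tmp COUNT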
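
If you'd rather not run grep once per element name, the counting half can be collapsed into a sort and uniq -c pipeline. Here's a minimal sketch of that idea. The function name count_names is one I made up for illustration, and it assumes its input has one element name per line, the way ALL does.

```shell
#!/bin/sh

# count_names (hypothetical helper): read one element name per line on
# stdin and print "name count" pairs, highest count first. Ties are
# broken alphabetically by name.
count_names() {
    sort \
        | uniq -c \
        | sort -k1,1nr -k2,2 \
        | awk '{ print $2, $1 }'
}
```

With that defined, `count_names < ALL > COUNT` should produce the same kind of listing as the two loops above. Because uniq -c compares whole lines, it counts exact names only, with no extra grep flags needed.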