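I was asked for the script I used to count element usage in our DocBook files, as posted here yesterday. I’ve got to be honest. I wrote it in Mathematica. Why? Well, I worked through the problem in my head and thought to myself, “Golly, the Split function would be handy here.” (Split breaks a list into runs of identical adjacent elements, so on a sorted list of element names each run is one element and its length is that element’s count.)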
But hey, it’s not all that hard with sh, given the right tools. So I rewrote the script in something everybody can use: a simple sh script that uses xmllint (from libxml2) to resolve XIncludes and the XMLStarlet utility to pull out the element names. Don’t have XMLStarlet on your machine yet? Go get it. Get it now. XMLStarlet is a godsend for any *nix geek doing XML stuff. Learn it, use it, love it.
Here’s the script in plain ol’ sh:
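```sh
#!/bin/sh
# Start with empty scratch files.
rm -f ALL COUNT && touch ALL COUNT

# Write the name of every element in each GNOME help document to ALL,
# one per line. xmllint --xinclude expands XIncludes first, so included
# content is counted too. "xml" is XMLStarlet; some distributions
# install it as "xmlstarlet".
for dir in /usr/share/gnome/help/*; do
    name=`basename "$dir"`
    doc="$dir/C/$name.xml"
    xmllint --xinclude "$doc" \
        | xml sel -t -m "//*" -v "name(.)" -n - \
        | grep -v '^$' >> ALL
done

# Count each distinct element name. -x matches whole lines only, so
# "para" isn't also counted inside "simpara" or "formalpara"; -F treats
# the name as a literal string rather than a regex.
for el in `sort -u ALL`; do
    count=`grep -cxF -- "$el" ALL`
    printf '%s %s\n' "$el" "$count" >> COUNT
done

# Sort by count, most frequent first, and overwrite COUNT with the result.
sort -k2 -rn COUNT > COUNT.tmp && mv COUNT.tmp COUNT
```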
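The counting loop rereads ALL once for every distinct element name. A closer sh analogue of the Split idea is `sort | uniq -c`: sort brings identical names together, and uniq -c collapses each run into a single line with its count, much like Split followed by taking each run’s length. Here’s a minimal sketch, assuming the input file holds one element name per line, as the ALL file from the script does. The function name `count_elements` is hypothetical, just for illustration; the awk step swaps the columns so the output keeps the same “name count” layout as COUNT.

```sh
# count_elements FILE  (hypothetical helper name)
# FILE holds one element name per line, like the ALL file.
# Prints "name count" pairs, most frequent first; ties are
# broken alphabetically so the output order is stable.
count_elements() {
    sort -- "$1" \
        | uniq -c \
        | awk '{ print $2, $1 }' \
        | sort -k2,2rn -k1,1
}
```

Dropped into the script, `count_elements ALL > COUNT` would stand in for both the counting loop and the final sort, and it makes one pass over ALL instead of one grep per element.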