2011-02-02 – The Gist

The nice thing about Blip is that it’s completely open and insanely extensible. It tries very hard to find things like documents and translations wherever it can, but sometimes your content just doesn’t look like what it knows about.

But that’s OK. You can easily add plugins to find and display more information. In fact, everything you see in Blip—every bit of data collected and every page generated—comes from a plugin. Some of them are in the core blip package. Others are provided by the blip-gnome package, which has plugins to scan modules that look like typical Gnome modules.

If you want to deploy Blip for your project, you can create a package full of plugins to deal with the kind of data you actually care about. To show how easy this is, I’ll walk through a plugin I just added for the Evolution Quick Reference Card. This is a PDF document generated from a TeX file that lists common keyboard shortcuts. Obviously, this won’t be picked up by the plugins that handle DocBook, Mallard, and gtk-doc documents.

First, look at the report on the Evolution Quick Reference Card on Blip. Notice that Blip knows the title of the document, when it was modified, and what translations exist.

All of this comes from the evolutionquickref Blip plugin. This is less than 100 actual lines of code. Let’s break it down.

class QuickRefScanner (blip.plugins.modules.sweep.ModuleFileScanner):

In Blip, you create plugins by subclassing known extension points. In this case, there’s a module scanner (which is itself just a plugin). The module scanner reads the SCM history for a module, then walks the tree of files calling each file scanner.

A file scanner contains two methods. The process_file method is called for every file in the module. It’s expected to decide for itself whether it’s interested in the file. After all the files have been walked, the post_process method is called on each file scanner. Sometimes you need to find everything before you can fully process stuff.

    def process_file (self, dirname, basename):
        branch = self.scanner.branch
        is_quickref = False
        if branch.scm_server == 'git://git.gnome.org/' and branch.scm_module == 'evolution':
            if basename == 'Makefile.am':
                if os.path.join (self.scanner.repository.directory, 'help/quickref') == dirname:
                    is_quickref = True
        if not is_quickref:
            return

Here we’re just determining whether we care about the file. Unlike most scanners, we hardcode the SCM server and module to restrict it to Evolution. We then only look at help/quickref/Makefile.am. Why that file? Because we’re going to parse it for language information.

        with blip.db.Timestamp.stamped (filename, self.scanner.repository) as stamp:
            try:
                stamp.check (self.scanner.request.get_tool_option ('timestamps'))
            except:
                data = {'parent' : branch}
                scm_dir, scm_file = os.path.split (rel_ch)
                data['scm_dir'] = os.path.join (scm_dir, 'C')
                doc = blip.db.Branch.select_one (type=u'Document', **data)
                if doc is not None:
                    self.scanner.add_child (doc)
                    self.document = doc
                raise

Blip aggressively timestamps to avoid needless reprocessing. Trust me, in a production environment, you do not want it reprocessing every file from 100+ modules all the time.

Blip provides a convenient timestamp-checking class you can use with Python’s awesome with statement. When you call stamp.check, it checks the mtime of the file against what’s recorded in the database. If the file is newer, or if the database has no timestamp, it just returns and lets your code execute. Otherwise, it throws an exception that’s caught by the with block. If the entire block finishes without exception, a timestamp is added to the database automatically.

(By the way, Blip uses introspection to automatically attach a source function to each timestamp. So this timestamp won’t interfere with timestamps on the same file from another plugin.)

Sometimes you still want to do things when you’re skipping a timestamped file. In this case, we try to find a matching document in the database and add it to the module scanner. Why do we do this? The module scanner deletes child objects (documents, applications, translation domains, etc) from the database unless it’s told about them on each processing run. This is how we prevent stale information from lingering around in Blip. But that means we need to let the module scanner know about the document, even if we’re not going to process it.

            makefile = blip.parsers.get_parsed_file (blip.parsers.automake.Automake,
                                                     self.scanner.branch, filename)

Blip has some built-in parsers for the kinds of files it usually encounters. This one tries to extract some basic information from Makefile.am files. It can’t do everything; after all, an automake file is a template for a shell script. But it does a good job with most data.

            ident = u'/'.join(['/doc', bserver, bmodule, 'quickref', bbranch])
            document = blip.db.Branch.get_or_create (ident, u'Document')
            document.parent = branch
            for key in ('scm_type', 'scm_server', 'scm_module', 'scm_branch', 'scm_path'):
                setattr (document, key, getattr (branch, key))
            document.subtype = u'evolutionquickref'
            document.scm_dir = blip.utils.relative_path (os.path.join (dirname, u'C'),
                                                         self.scanner.repository.directory)
            document.scm_file = u'quickref.tex'
            self.scanner.add_child (document)
            self.document = document

We first create an ident. This is how just about everything is identified in Blip. In many cases, idents match the path of the URL you use to access objects. We use some information from the ident of the parent module. We then get or create an object in the database. If it’s not there already, Blip just makes it for us, and handles some extra administrative work.

After setting some information, we add the document to the module scanner. This is the same thing we did in the except clause above. We have to add it, or it will be deleted.

            translations = []
            for lang in makefile['SUBDIRS'].split():
                if lang == 'C':
                    continue
                lident = u'/l10n/' + lang + document.ident
                translation = blip.db.Branch.get_or_create (lident, u'Translation')
                translations.append (translation)
                for key in ('scm_type', 'scm_server', 'scm_module', 'scm_branch', 'scm_path'):
                    setattr (translation, key, getattr (document, key))
                translation.scm_dir = blip.utils.relative_path (os.path.join (dirname, lang),
                                                                self.scanner.repository.directory)
                translation.scm_file = u'quickref.tex'
                translation.parent = document
            document.set_children (u'Translation', translations)

We now find translations by looking at built subdirectories (except C, because that’s the source). We go through the same business of creating idents and getting or creating database objects.

Then we call set_children on the document. Remember that business of having to add children to the module scanner to keep them from being deleted? This is exactly what the module scanner calls on the module, using the objects that get added to it. This one line prevents us from having stale translations in the database if they’re removed from the module.

We’ll handle everything else in post_process.

        regexp = re.compile ('\\s*\\\\textbf{\\\\Huge{(.*)}}')
        filename = os.path.join (self.scanner.repository.directory,
                                 self.document.scm_dir, self.document.scm_file)
        with blip.db.Timestamp.stamped (filename, self.scanner.repository) as stamp:
            stamp.check (self.scanner.request.get_tool_option ('timestamps'))
            stamp.log ()
            for line in open(filename):
                match = regexp.match (line)
                if match:
                    self.document.name = blip.utils.utf8dec (match.group(1))
                    break

Here we process the quickref.tex file. (That’s what we set scm_file to above.) Again, we do this behind a timestamp so we don’t read a file if it hasn’t changed. We just loop over the lines in the file looking for one that matches our regexp, and use that to extract the title of the reference card. This is a one-off plugin, so we could have just hardcoded the title. But what’s the fun in that?

        rev = blip.db.Revision.get_last_revision (branch=self.document.parent,
                                                  files=[os.path.join (self.document.scm_dir,
                                                                       self.document.scm_file)])
        if rev is not None:
            self.document.mod_datetime = rev.datetime
            self.document.mod_person = rev.person
        self.document.updated = datetime.datetime.utcnow ()

Now we look up the most recent git revision to have touched quickref.tex and record that information. “But Shaun,” you say, “that information is already in the database. Normalize!” I’m not dogmatic about normalization. I put a lot of effort into fast page load times (and there’s still work to be done on some pages). This helps a lot.

We also set the updated field. This just says when Blip last looked at this particular thing. After that (code not reproduced here), we loop over the translations and do the same thing for them.

That’s it. A simple but non-trivial Blip plugin in about 100 lines of code. If you’re feeling adventurous, you could try to compute completion statistics for the translations. They’re not using PO files, but the TeX files are predictable and parseable, so you could put some sort of statistics together.

What kind of data do you have in your modules? As long as it’s parseable, it’s as simple is this to tell Blip about it. Try it yourself. And get in touch; I’m happy to help people build on Blip.

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28

Day: February 2, 2011

A Quick Primer on Blip Plugins