Understanding XInclude

I’m often surprised when people don’t know about XInclude, but I suppose not everybody eats and breathes XML the way I do. XInclude is a way to include other files (or portions of other files) into a single XML file. We actually use them throughout GNOME documentation, though few people realize it. XInclude isn’t tied to any particular XML vocabulary like Mallard or DocBook. It’s an XML feature defined by the W3C, and you can use it in any XML file, as long as your processing tools support it.

If you’ve used SYSTEM entities in XML before, it’s important to understand a key difference. (If you haven’t, skip this paragraph.) SYSTEM entities are a pre-parse text slurp. The text of the included file are inserted, byte-for-byte, at the inclusion point, and the resultant run of characters is then parsed. With XInclude, the included file is parsed, and its infoset is merged into the inclusion point.

Basic XInclude

The simplest use of XInclude is to include the entirety of an external XML file. We use this in many of our Mallard and DocBook documents to include common legal information. In gnome-help, for example, we have a file called legal.xml that looks like this:

<license xmlns="http://projectmallard.org/1.0/">
  <p>Creative Commons Share Alike 3.0</p>

Then, in the info element of every page file, we use this:

<include href="legal.xml" xmlns="http://www.w3.org/2001/XInclude"/>

When the file is parsed, the entirety of the license element from legal.xml is inserted in place of the include element.

Text XInclude

By default, XInclude expects the included file to be well-formed XML. You can tell it to treat the file as text instead. This is useful if you want to show the text contents of a file, such as inside a Mallard code element.

Just add parse=”text” to the XInclude element, like so:

<include href="somefile.txt" parse="text"

I use this on the source pages of the tutorials on projectmallard.org. Look at the Ten Minute Tour Source page, for example. This shows the entire XML source of the Ten Minute Tour inside a Mallard code block.

The XML markup for the Ten Minute Tour Source page uses a text XInclude. The nice thing about this is that you don’t have to worry about escaping characters in the included file. So if you’re writing a lot of code examples with angle brackets, text XIncludes can be a convenient alternative to escaping or using CDATA blocks.

It’s important to note that an XInclude processor does not care what the file extension or reported MIME type of the included file is. The file is either parsed as XML or as text, and this depends solely on the parse attribute.

Parts of Documents

In the first example, we included a single standard element. You might wonder if you can include lots of boilerplate elements. If all the pages in a document share the same authors, you might want to put them all in one file and XInclude them in.

<credit><name>Shaun McCance</name></credit>
<credit><name>Phil Bull</name></credit>
<credit><name>Jim Campbell</name></credit>

If you put this into a file and try to XInclude it, you’ll get an error. Any file you XInclude (with the XML parse type) must be fully well-formed XML. Among other things, that means there must be a single root element. The example above has three root elements. So just wrap them with another element:

<info xmlns="http://projectmallard.org/1.0/">
  <credit><name>Shaun McCance</name></credit>
  <credit><name>Phil Bull</name></credit>
  <credit><name>Jim Campbell</name></credit>

Notice also that you do need the xmlns declaration to use namespaces. This will now parse, and XInclude works. But you’ll have an extra info element nested in the including document’s info element. That’s not right.

You can include only a portion of the included XML using XPointer. XPointer is a W3C syntax for pointing to pieces of documents. There are different schemes you can use with it to select data in different ways, but we’ll just stick to the xpointer() scheme, which uses XPath. The basic syntax looks like this:

<include href="credits.xml" xpointer="xpointer(/info/credit)"

This won’t work, however, because we’re using XML namespaces. You need to declare a namespace prefix and use it in your XPath. To do that, use the xmlns() XPointer scheme:

<include href="credits.xml"

This does exactly what we need. We wrap the credit elements with an info element to make the included file well-formed. (It also gives us a convenient single place to declare namespaces.) Then we select only the credit elements to XInclude with an XPath expression. I won’t go into all the details of what XPath can do, but for simple cases like this, it looks basically like a directory path.

This does require you to keep and distribute an extra file. Instead of doing that, you could keep the information in one of your page files, then XInclude portions of that file in every other page file. Since every Mallard document has an index page, you could do this for index.page:

<page  xmlns="http://projectmallard.org/1.0/">
  <credit><name>Shaun McCance</name></credit>
  <credit><name>Phil Bull</name></credit>
  <credit><name>Jim Campbell</name></credit>
<title>My Index Page</title>

Then in every page except index.page, use this in the info element:

<include href="index.page"

You might wonder if you can use XPointer to include only a portion of a file included with parse=”text”, such as a certain range of lines. XPointer allows extension schemes to be defined. In fact, there are a couple dozen schemes registered with the W3C. One of them allows you to select a string range from an XML file, although there is no registered scheme to select a range from a text file.

You don’t need a registered scheme, though. All you need is for your XML processor to understand the scheme you’re using. Unfortunately, libxml2 only supports the xpointer(), xmlns(), and element() schemes at this time. But if you really need this kind of functionality, you can probably hire an expert to implement it for you.

Creative Commons Attribution 3.0 United States
This work by Shaun McCance is licensed under a Creative Commons Attribution 3.0 United States.