Ducktype parser extensions

When designing Ducktype, I wanted people to be able to extend the syntax, but I wanted extensions to be declared and defined, so we don’t end up with something like the mess of Markdown flavors. So a Ducktype file can start with a @ducktype/ declaration that declares the version of the Ducktype syntax and any extensions in use. For example:

@ducktype/1.0 if/1.0

This declares that we’re using version 1.0 of the Ducktype syntax, that we want an extension called if, and that we want version 1.0 of that extension.
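To make the shape of that declaration concrete, here is a minimal Python sketch that splits it into a syntax version and a set of requested extensions. This is not the real parser's code: the function name parse_declaration and its error handling are assumptions made up for illustration.

```python
def parse_declaration(line):
    """Split a '@ducktype/...' line into (syntax_version, {extension: version}).

    Hypothetical helper for illustration; not part of the Ducktype parser.
    """
    prefix = '@ducktype/'
    if not line.startswith(prefix):
        raise ValueError('not a Ducktype declaration: %r' % line)
    parts = line.split()
    syntax_version = parts[0][len(prefix):]
    extensions = {}
    for part in parts[1:]:
        # Each extension is written as name/version, e.g. if/1.0
        name, sep, version = part.partition('/')
        if not sep or not name or not version:
            raise ValueError('extension needs a name and a version: %r' % part)
        extensions[name] = version
    return syntax_version, extensions


print(parse_declaration('@ducktype/1.0 if/1.0'))
# → ('1.0', {'if': '1.0'})
```

Requiring a version on every extension is the point of the design: a file always says exactly which syntax it expects.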

Up until last week, extensions were just theoretical. I’ve now added two extension points to the Ducktype parser, and I plan to add three or four more. Both of these are exercised in the _test extension, which is fairly well commented so you can learn from it.

Let’s look at the extensions we have, plus the ones I plan to add.

Block line parser

This extension is implemented. It allows extensions to handle really any sort of line in block context, adding any sort of new syntax. Extensions only get access to lines after headings, comments, fences, and a few other things are handled. This is a limitation, but it’s one that makes writing extensions much easier.

Let’s look at an actual example that uses this extension: Mallard Conditionals. You can use Mallard Conditionals in Ducktype just fine without any syntax extension. Just declare the namespace and use the elements like any other block element:

@ducktype/1.0
@namespace if http://projectmallard.org/if/1.0/

= Conditional Example

[if:if test=target:html]
  This is a paragraph only shown in HTML.

But with the if/1.0 Ducktype syntax extension, we can skip the namespace declaration and use a shorthand for tests:

@ducktype/1.0 if/1.0

= Conditional Example

? target:html
  This is a paragraph only shown in HTML.
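To spell out the equivalence between the two forms, here is a small Python sketch that rewrites a shorthand test line into the longhand block declaration. It is only an illustration of the mapping, not how the if extension is written: the real extension works with the parser rather than rewriting text, the name rewrite_test_line is made up, and the sketch assumes the test expression contains no spaces.

```python
def rewrite_test_line(line):
    """Return the longhand declaration for a '? test' line, or None.

    Hypothetical helper; assumes the test has no spaces, so it can be
    written as an unquoted attribute value.
    """
    stripped = line.lstrip(' ')
    indent = line[:len(line) - len(stripped)]
    if not stripped.startswith('? '):
        return None  # not a shorthand test line; leave it to the core parser
    test = stripped[2:].strip()
    if not test:
        return None
    # Keep the original indentation so nesting is unchanged.
    return '%s[if:if test=%s]' % (indent, test)


print(rewrite_test_line('? target:html'))
# → [if:if test=target:html]
```

Returning None for anything it doesn't recognize mirrors the idea that an extension should only claim the lines it actually understands.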

We even have special syntax for branching with if:choose elements:

@ducktype/1.0 if/1.0

= Conditional Branching Example

??
  ? platform:fedora
    This paragraph is only shown on Fedora.
  ? platform:ubuntu
    This paragraph is only shown on Ubuntu.
  ??
    This paragraph is shown on any other operating system.

(As of right now, you actually have to use if/experimental instead of if/1.0. But that extension is pretty solid, so I’ll change it to if/1.0 along with the 1.0 release of the parser.)

Directive handler

Ducktype files can have parser directives at the top. We’ve just seen the @namespace parser directive to declare a namespace. There is an implemented extension point for extensions to handle parser directives, but not yet a real-world extension that uses it.

Extensions only get to handle directives with a prefix matching the extension name. For example, the _test extension only gets to see directives that look like @_test:foo.
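The prefix rule is easy to picture as a lookup table keyed by extension name. Here is a minimal Python sketch of that idea. The class DirectiveDispatcher and the handler signature are assumptions for illustration, not the parser's actual extension API; the _test extension in the source tree is the place to see the real thing.

```python
class DirectiveDispatcher:
    """Route '@prefix:name value' directives to the extension named prefix.

    Hypothetical illustration; not the Ducktype parser's real interface.
    """

    def __init__(self):
        self._handlers = {}

    def register(self, extension_name, handler):
        self._handlers[extension_name] = handler

    def dispatch(self, line):
        # Returns the handler's result, or None if no extension claims the line.
        if not line.startswith('@'):
            return None
        head, _, value = line[1:].partition(' ')
        prefix, sep, directive = head.partition(':')
        if not sep or prefix not in self._handlers:
            return None  # core directive, or an extension that isn't loaded
        return self._handlers[prefix](directive, value.strip())


dispatcher = DirectiveDispatcher()
dispatcher.register('_test', lambda name, value: ('handled', name, value))

print(dispatcher.dispatch('@_test:foo bar'))
# → ('handled', 'foo', 'bar')
print(dispatcher.dispatch('@namespace if http://projectmallard.org/if/1.0/'))
# → None
```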

Block element handler

This extension is not yet implemented. I want extensions to be able to handle standard-looking block declarations with a prefix. For example, I want the _test extension to be able to do something with a block declaration that looks like this:

[_test:foo]

In principle, you could handle this with the current block line parser extension point, but you’d have to handle parsing the block declaration by yourself, and it might span multiple lines. That’s not ideal.

Importantly, I want both block line parsers and block element handlers to be able to register themselves to handle future lines, so they can have special syntax in following lines. Here is how an extension for CSV-formatted tables might look:

@ducktype/1.0 csv/1.0

= CSV Table Example

[csv:table frame=all rules=rows]
one, two, three
eins, zwei, drei
uno, dos, tres
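Once an extension has claimed those following lines, turning them into rows is the easy part. Here is a rough Python sketch using the standard csv module. The function name parse_csv_rows is made up, and how the rows would then become Mallard table elements is left out, since that extension point doesn't exist yet.

```python
import csv


def parse_csv_rows(lines):
    """Turn the lines that follow a [csv:table] declaration into rows of cells.

    Hypothetical helper; skipinitialspace drops the space after each comma.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    return [row for row in reader if row]


rows = parse_csv_rows([
    'one, two, three',
    'eins, zwei, drei',
    'uno, dos, tres',
])
print(rows)
# → [['one', 'two', 'three'], ['eins', 'zwei', 'drei'], ['uno', 'dos', 'tres']]
```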

Inline element handler

This extension is not yet implemented. Similar to block element handlers, I want extensions to be able to handle standard-looking inline markup. For example, I want the _test extension to be able to do something with inline markup that looks like this:

$_test:foo(here is the content)

For example, a gnome extension could make links to GitLab issue reports easier:

$gnome:issue(yelp#138)
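As a sketch of what such a handler might do with that content, here is a small Python function that expands it into a URL. Everything here is hypothetical: no gnome extension exists, the function name issue_url is made up, and the URL layout (projects under the GNOME group on gitlab.gnome.org) is my assumption about where the link should point.

```python
import re

# project#number, e.g. yelp#138
ISSUE_RE = re.compile(r'^(?P<project>[A-Za-z0-9_.-]+)#(?P<number>[0-9]+)$')


def issue_url(content, base='https://gitlab.gnome.org/GNOME'):
    """Expand 'project#number' into an issue URL.

    Hypothetical helper; the base URL and path layout are assumptions.
    """
    match = ISSUE_RE.match(content.strip())
    if match is None:
        raise ValueError('expected project#number, got %r' % content)
    return '%s/%s/-/issues/%s' % (
        base, match.group('project'), match.group('number'))


print(issue_url('yelp#138'))
# → https://gitlab.gnome.org/GNOME/yelp/-/issues/138
```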

Inline text parser

This extension is not yet implemented. I also want extensions to be able to handle arbitrary inline markup, things that don’t even look like regular Ducktype markup. This is what you would need to create Markdown-like inline markup like *emphasis* and `monospace`.

This extension might have to come in two flavors: before standard parsing and after. And it may be tricky because you want each extension to get a crack at whatever text content was output by other extensions, except you probably also want extensions to be able to block further parsing in some cases.
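One way to picture the chaining problem is a pipeline where each parser sees the text that earlier parsers left open, and can mark a piece as final to block further parsing. The Python sketch below is only a thought experiment under that assumption. None of these names exist in the parser, the output is plain strings rather than real nodes, and it does no escaping.

```python
import re


def run_inline_parsers(text, parsers):
    """Apply each parser, in order, to every piece not yet marked final.

    Each parser takes a string and returns a list of (piece, final) tuples.
    """
    pieces = [(text, False)]
    for parser in parsers:
        next_pieces = []
        for piece, final in pieces:
            if final:
                next_pieces.append((piece, True))  # blocked from further parsing
            else:
                next_pieces.extend(parser(piece))
        pieces = next_pieces
    return pieces


def monospace(text):
    # Odd-numbered split parts are the contents of `...` spans; mark them final.
    out = []
    for i, part in enumerate(re.split(r'`([^`]*)`', text)):
        if i % 2 == 1:
            out.append(('<code>%s</code>' % part, True))
        elif part:
            out.append((part, False))
    return out


def emphasis(text):
    # Leaves its output open, so later parsers may still process it.
    return [(re.sub(r'\*([^*]+)\*', r'<em>\1</em>', text), False)]


pieces = run_inline_parsers('a *b* `*c*`', [monospace, emphasis])
print(''.join(piece for piece, final in pieces))
# → a <em>b</em> <code>*c*</code>
```

Because the monospace parser runs first and marks its span final, the asterisks inside the backticks survive untouched, which is the "block further parsing" case.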

All in all, I’m really happy with the Ducktype syntax and parser, and how easy it’s been to write extension points so far.

Ducktype: A Lightweight Syntax for Mallard

One of the projects I’ve been working on lately is Ducktype, a lightweight syntax for Mallard. Mallard has a lot of strengths. Its automatic linking mechanisms make content organization easier. Its focus on independent topics makes content re-use possible. Its revision information and other metadata allow you to do status tracking on large content pools. It has a well-defined extension mechanism that allows you to add new functionality and embed external vocabularies like TTML, SVG, and ITS.

XML is the backbone that makes all of this possible. But XML is also what slows adoption. There’s a growing trend towards using lightweight formats to make it easier to contribute. But while lightweight formats make easy things easy, they tend to fall over when dealing with the issues that XML-based vocabularies are designed to solve.

The idea for a lightweight syntax for Mallard has floated around for a couple years. I even spent some time trying to repurpose an existing lightweight format like reStructuredText or AsciiDoc, but none of them are able to carry the level of semantic information that Mallard needs.

Before going into details, let’s look at a Mallard page written in Ducktype:

= My First Topic
@link[guide >index]
@desc A short description of this page
@revision[date=2014-11-13 status=draft]

This is the first paragraph.
The paragraph continues here, but ends with the blank line below.

[steps]
* This is a steps list, common in Mallard.
* Without the [steps] declaration above, we'd get a normal bullet list.
  Indentation is significant, so this is still in the second item.
* Indentation is so significant that you can actually nest block elements.

  So this is a new paragraph still inside the third item.

  [note]
  And this is a paragraph in a note in the third item.

* You can also nest list items, or literally anything else.

  * This is a basic bullet list.
  * It is in the fourth item of the steps list.

This paragraph is outside the steps list.

One of the most distinguishing features is that, like Python, indentation matters. Indentation is how you stay inside a block element. Ducktype also allows you to do everything you’d do inside a Mallard <info> element, which is crucial to pretty much all of the compelling features of Mallard.

Ducktype is guided by a few design principles, just as Mallard was so many years ago:

  1. It should be possible to do almost anything Mallard XML can do. You can arbitrarily nest block elements. You can have inline markup everywhere you need it, including in code blocks and in other inline markup. You can embed extensions and other vocabularies so that things like Mallard Conditionals and Mallard+TTML are possible. In fact, the only limitation I’ve yet encountered is that you can’t put attributes on page and section titles. This means that Ducktype is capable of serving as a non-XML syntax for virtually any XML vocabulary.
  2. The most commonly used Mallard features should be easy to use. Mallard pages tend to be short with rich content and a fair amount of metadata. Steps lists are common. Semantic inline content is common. Linking mechanisms are, unsurprisingly, extremely common. Credits are common. Revision info is common. Licenses are nearly always done with XInclude.
  3. There should be a minimal number of syntactical constructs. Most lightweight formats have shorthand, special-purpose syntax for everything. This makes it extremely difficult to support extension content without modifying the parser. And for any non-trivial content, it makes it difficult to remember which non-alphanumeric characters you have to escape when.
  4. For extra special bonus points, it should be possible to extend the syntax for special purposes. Lightweight syntaxes are popular in code comments for API documentation, and in API documentation you want shorthand syntax for the things you reference most often. For an object-oriented language, that’s classes and methods. For XSLT, it’s templates and parameters. By not gobbling up all the special characters in the core syntax, we make it possible to add shorthand inline notations by just loading a plugin into the parser.

There’s some discussion on the mallard-list mailing list, starting in August. And there’s a preliminary Ducktype parser up on Gitorious. You can also get it from PyPI with `pip install mallard-ducktype`. If you’re interested in docs, or ducks, or anything of the sort, please join the conversation. I always like getting more input.

This work by Shaun McCance is licensed under a Creative Commons Attribution 3.0 United States License.