2019-05-06 – The Gist

I spent a lot of time making Ducktype into a lightweight syntax that I would really enjoy using. I had a list of design goals, and I feel like I hit them pretty well. In this post, I want to outline some of the things I hope you’ll love about the syntax.

1. Ducktype has a spec

I know not everybody nerds out over reading a spec the way I do. But whether or not you like reading them, specs are important for setting expectations and ensuring interoperable implementations. Probably my biggest gripe with Markdown is that there are just so many different flavors. Couple that with the multiple wiki flavors I regularly deal with, and my days become a game of guess-and-check.

Even in languages that haven’t seen flavor proliferation, you can still run into issues with new versions of the language. In a sense, every version of a parser that adds features creates a new flavor. Without a way to specify the version or do any sort of introspection or conditionals, you can run into situations where what you type is silently formatted very differently on different systems.

In Ducktype, you can declare the language version and any extensions (more on those later) right at the top of the file.

@ducktype/1.0
= Header

This is a paragraph.

2. Ducktype keeps semantics

I’ve been a strong proponent of semantic markup for a long time, since I started working with LaTeX over 20 years ago. Presentational markup has short-term gains in ease-of-use, but the long-term benefits of semantic markup are clear. Not only can you more easily adapt presentation over time, but you can do a lot with semantic markup other than display it. Semantic markup is real data.

Different lightweight languages support semantic markup to different degrees. Markdown is entirely presentational, and if you want to do anything even remotely sophisticated, you have to break out to HTML+CSS. AsciiDoc and reStructuredText both give you some semantic markup at the block level, and give you the tools to create inline semantic markup, but out of the box they still encourage doing semantic-free bold, italic, and monospace.

My goal was to completely capture the semantics of Mallard with less markup, not to just create yet another syntax. Ducktype does that. What I found along the way is that it can fairly nicely capture the semantics of other formats like DocBook too, but that’s a story for another day.

Ducktype uses a square bracket syntax to introduce semantic block elements, inspired by the group syntax in key files. So if you want a note, you type:

[note]
This text is an implicit paragraph in the note.

If you want a plain old bullet list, you type it just like in any other lightweight syntax:

* First item
* Second item
* Third item

But if you want a semantic steps list, you add the block declaration:

[steps]
* First step
* Second step
* Third step

Ducktype keeps the semantics in inline markup too. Rather than try to introduce special characters for each semantic inline element, Ducktype just lets you use the element name, but with a syntax that’s much less verbose than XML. For example:

Click $gui(File).
Call the $code(frobnicate) function.
Name the file $file(index.duck).

Both AsciiDoc and reStructuredText let you do something similar, although sometimes you have to cook up the elements yourself. With Ducktype, you just automatically get everything Mallard can do. You can even include attributes:

Press the $key[xref=superkey](Super) key.
Call the $code[style=function](frobnicate) function.

3. Ducktype keeps metadata

In designing Ducktype, it was absolutely critical that we have a consistent way to provide all metadata. Not only is metadata important for the maintenance of large document sets, but linking metadata is how documents get structure in Mallard. Without metadata, you can’t really write more than a single page.

You can do very good page-level metadata in both AsciiDoc and reStructuredText, but for Mallard we need to do metadata at the section and even block levels as well. Markdown, once again, is the poor format here with no support for any metadata at all. Some tools support a YAML header in Markdown, which would be pretty great if it were any sort of standard.

Ducktype uses a single, consistent syntax for metadata, always referencing the element name, and coming after the page header, section header, or block declaration that it’s providing info for.

= My First Topic
@link[type=guide xref=index]
@desc This is the first topic page I ever wrote.

== My First Section
@keywords first section, initial section

Do you like my first topic page with a section?

We can even do this with block elements:

[listing]
  @credit
    @name Shaun McCance
  . A code listing by Shaun
  [code]
    here_is_some_code()

There’s a lot going on in that little example, from nesting to shorthand block titles. The important part is that we can provide metadata for the code listing.

(Side note: I thought a lot about a shorthand for credits, and I have ideas on an extension to provide one. But in the end, I decided that the core syntax should have less magic. Explicit is better than implicit.)

4. Ducktype allows nesting

I do quite a bit of semi-manual format conversion fairly frequently. It’s not something I enjoy, and it’s not the best use of my time, but it just keep having to be done, and I’m kind of good at it. If you do that a lot, you’ll often run into cases where something just can’t be represented in another format, and that usually comes down to nesting. Can you put lists inside table cells? Can you put tables inside list items? Can lists nest? Can list items have multiple paragraphs?

Both reStructuredText and (shockingly) most flavors of Markdown allow some amount of nesting using indentation. This sometimes breaks down when tables or code blocks are involved, but it’s not terrible. This is the one place where AsciiDoc is my least favorite. In fact, this is my single least favorite thing about AsciiDoc. It uses a plus on its own line to indicate a continuation. It’s hard to visualize, and it has severely limited functionality.

Ducktype very explicitly allows nesting with indentation, whether you’re nesting lists, tables, code blocks, notes, or anything else. It allows you to skip some indentation in some common cases, but you can always add indentation to stay in a block. Ducktype specifies the exact nesting behavior.

[note style=important]
  This is very important note about these three things:

  * The first thing is hard to explain.

    It takes two paragraphs.

  * The second thing can really be broken in two:

    * First subthing
    * Second subthing

  * The third thing involves some code:

    [code]
      here_is_some_code()

You can use any number of spaces you like. I prefer two, in part because it fits in with the shorthand syntax for things like lists. Using indentation and block declarations, there is absolutely no limit on what or how deeply you can nest.

5. Ducktype tables are sane

I have never met a lightweight table syntax I liked. Trying to represent real tables with some sort of pseudo-ascii-art works ok for very simple tables, but it falls apart real fast with anything remotely non-trivial. Add to this the extra syntax required for for all the non-trivial stuff, and a substantial portion of your parser is devoted to tables. As a user, I have a hard time keeping all those workarounds in my head.

As I said above, the ability to nest things was very important, so most of these tables syntaxes were just a non-starter. How do you add extra blocks in a table cell when the whole table row is expected to fit on one line? In the end, I decided not to have a table syntax. Or more accurately, to treat tables as a sort of list.

Without any special treatment, tables, rows, and columns can already be written in Ducktype using the block declarations you’ve already seen:

[table]
  [tr]
    [td]
      One
    [td]
      Two
  [tr]
    [td]
      Three
    [td]
      Four

There’s no magic happening here. This is just standard block nesting. But it’s pretty verbose, not as verbose as XML, but still. Can’t we make it a bit easier, like we do for lists? Well yes, we make it easier exactly like we do for lists.

[table]
[tr]
* One
* Two
[tr]
* Three
* Four

There are a few things going on here. Most importantly, the asterisk is shorthand for a table cell, not a list item. What that shorthand does depends on context. But also, we’ve removed quite a lot of indentation. Earlier I wrote that there are some special rules that allow you to trim indentation in certain common cases. This is one of them.

Using a vertical table syntax means that you can do the same kind of nesting you can do with other elements. Here’s a list in a table cell:

[table]
[tr]
* Here's a list:
  * First
  * Second
* Another table cell

Ducktype isn’t the first format to allow a vertical table syntax. MediaWiki allows tables to be written vertically, and it works really well. But Ducktype is the only one to my knowledge not to introduce any new syntactical constructs at all. I have a hard time keeping all those dots and dashes in my head. Using a single, consistent syntax makes it easier to remember, and it reduces the number of special rules you need.

6. Ducktype supports extensions

I’ve mentioned extensions a couple of times already in this post, but I haven’t elaborated on them. In addition to being able to use Mallard extensions (which you can do without any extra syntax), I wanted a way to add and experiment with syntax without shoving everything in the core. And, importantly, I wanted pages to have to declare what extensions they use, so we don’t have a guess-and-check mess of incompatible flavors. I live in a world where files are frequently cherry-picked and pushed through different tool chains. Being explicit is important to me.

So in Ducktype, you declare your extensions at the top, just like you do with the version of Ducktype you’re using:

@ducktype/1.0 if/experimental
= Page with Experimental Extension

What kinds of things can an extension do? Well, according to the spec, extensions can do literally anything. Realistically, what extensions can do depends on the extension points in the implementation. The reference implementation has two rough classes of extensions. There are the methods in the ParserExtension class, which let you plug in at the line level to affect how things are parsed. And there’s the NodeFactory, which lets you affect how parsed things are interpreted.

All of the ParserExtension methods are implemented and pretty well commented in the _test extension. One interesting use of this is the experimental csv extension. Remember how tables are always written vertically with a list-like syntax. That’s very powerful, but sometimes you really do just want something super simple. The csv extension lets you do simple comma-separated tables, like this:

@ducktype/1.0 csv/experimental
= CSV Table Example
[csv:table]
one,two,three
four,five,six
seven,eight,nine

That’s in about 40 lines of extension code. Another example of line parser extensions is a special syntax for conditionals, which I blogged about before.

The NodeFactory extension point, on the other hand, lets you control what the various bits of special syntax do. If you want to want to use Ducktype to write DocBook, you would use DocBook element names in explicit block declarations and inline notation, but what about the elements that get implicitly created for paragraphs, headers, list items, etc? That’s what NodeFactory does.

Et cetera

That’s six things to love about Ducktype. I’ve been really happy with Mallard’s unique approach to structuring docs. I hope Ducktype can make the power of Mallard available without all the extra typing that comes along with XML. If you want to learn more, get in touch.

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Day: May 6, 2019

Things to Love About Ducktype