On command-line argument parsing

The command-line tools that are part of GnuTLS (such as certtool and p11tool) had been using GNU AutoGen to handle command-line arguments. AutoGen (not to be confused with the autogen.sh script commonly used in Autotools-based projects) does a great job in that regard, as it produces both the command-line parsing code and the documentation from a single source file. On the other hand, integrating the AutoGen infrastructure into a project can be tricky in many ways; for example, it requires its own runtime library (libopts), whose interface compatibility is not well maintained. Therefore, we decided to switch to a simpler solution, and we recently completed the migration. As I spent way too much time on this, I thought it might make sense to summarize the process in case anyone finds themselves in a similar situation.

The first thing we did was to define the requirements and review the existing alternatives. The requirements turned out to be:

The tool produces code and documentation from the same source, i.e., we do not need to repeat ourselves by writing separate documentation for the commands
The generated code has little to no run-time dependencies
The tool itself doesn’t have exotic (build-)dependencies

We soon realized that surprisingly few candidates meet those requirements. help2man, which is widely used by GNU tools, generates documentation from the program's --help output, but it only supports manual pages (no Texinfo, HTML, or PDF output); other tools we looked at, such as GNU Gengetopt, gaa, and argtable, do not support documentation generation at all.

The other consideration was how to implement the switch in a non-disruptive manner. Our initial attempt combined a help2man-like approach with documentation format conversion using Pandoc. This seemed promising overall, but the hurdle was that the AutoGen option definitions are written in AutoGen's own language: before proceeding with this approach, we needed a way to convert the definitions into actual option parsing code!

We split this task into two phases: first, parse the AutoGen definitions and convert them into an easier-to-use format such as JSON or YAML; then, process that output to generate the code. For the first phase, I came across pest.rs, a PEG (parsing expression grammar) based parser generator for Rust with an elegantly designed programming interface. With it, I was able to write a converter from the AutoGen definitions to JSON.
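The actual converter was written in Rust with pest, and its grammar is not reproduced here. As a rough illustration of the same idea, the following is a minimal sketch in Python that swaps pest for a hand-written recursive-descent parser and handles only an assumed subset of the definitions syntax: the `autogen definitions <template>;` header, `name = value;` attributes with bare words or double-quoted strings, nested `name = { ... };` blocks, and valueless `name;` attributes. Comments, here-strings, `#` directives and other AutoGen features are not supported, and the output layout (`template`, `definitions`, blocks always collected into lists) is this sketch's own choice, not the schema used by the separate project.

```python
import json
import re

# A token is a double-quoted string (backslash escapes allowed), one of the
# punctuation characters { } = ;, or a bare word such as `arg-type` or `42`.
TOKEN_RE = re.compile(r'\s*(?:("(?:[^"\\]|\\.)*")|([{}=;])|([A-Za-z0-9_.\-]+))')
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


class ParseError(ValueError):
    """Raised when the input falls outside the supported subset."""


def _unescape(match):
    return ESCAPES.get(match.group(1), match.group(1))


def tokenize(text):
    """Split definitions text into (kind, value) pairs."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ParseError(f"unexpected input at offset {pos}")
        string, punct, word = m.groups()
        if string is not None:
            # Drop the surrounding quotes and resolve simple escapes.
            tokens.append(("str", re.sub(r"\\(.)", _unescape, string[1:-1])))
        elif punct is not None:
            tokens.append(("punct", punct))
        else:
            tokens.append(("word", word))
        pos = m.end()
    return tokens


def peek(tokens, i, expected):
    if i >= len(tokens):
        raise ParseError(f"unexpected end of input, expected {expected}")
    return tokens[i]


def add_entry(result, name, value, force_list=False):
    """Store a value; blocks and repeated names are collected into lists."""
    existing = result.get(name)
    if existing is None:
        result[name] = [value] if force_list else value
    elif isinstance(existing, list):
        existing.append(value)
    else:
        result[name] = [existing, value]


def parse_entries(tokens, i, nested):
    """Parse `name = value;`, `name = { ... };` and `name;` entries."""
    result = {}
    while i < len(tokens):
        kind, value = tokens[i]
        if (kind, value) == ("punct", "}"):
            if not nested:
                raise ParseError("unbalanced '}'")
            return result, i + 1
        if kind != "word":
            raise ParseError(f"expected a name, got {value!r}")
        name, token = value, peek(tokens, i + 1, "'=' or ';'")
        if token == ("punct", ";"):
            add_entry(result, name, True)  # valueless attribute: True is an assumption
            i += 2
            continue
        if token != ("punct", "="):
            raise ParseError(f"expected '=' after {name!r}")
        kind, value = peek(tokens, i + 2, "a value")
        if (kind, value) == ("punct", "{"):
            block, i = parse_entries(tokens, i + 3, nested=True)
            add_entry(result, name, block, force_list=True)
        elif kind in ("str", "word"):
            add_entry(result, name, value)
            i += 3
        else:
            raise ParseError(f"unexpected {value!r} after {name!r} =")
        # Every entry, including a block, ends with ';'.
        if peek(tokens, i, "';'") != ("punct", ";"):
            raise ParseError(f"missing ';' after {name!r}")
        i += 1
    if nested:
        raise ParseError("missing '}'")
    return result, i


def autogen_to_json(text):
    """Convert a small subset of AutoGen definitions into a JSON string."""
    tokens = tokenize(text)
    header = [v.lower() if k == "word" else None for k, v in tokens[:2]]
    if (len(tokens) < 4 or header != ["autogen", "definitions"]
            or tokens[2][0] != "word" or tokens[3] != ("punct", ";")):
        raise ParseError("missing 'autogen definitions <template>;' header")
    body, _ = parse_entries(tokens, 4, nested=False)
    return json.dumps({"template": tokens[2][1], "definitions": body}, indent=2)
```

Because the conversion is a one-shot step, a sketch like this only has to cope with the constructs that actually appear in the existing definition files; anything else fails loudly with a `ParseError` rather than being silently misread.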
The generated JSON files are then processed by Python scripts to generate the code and documentation. Because the first phase is a one-shot conversion, Rust is not needed at build time; only the Python scripts and their dependencies need to be integrated into the project.
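To make the second phase more concrete, here is a minimal sketch of such a generator. It assumes a hypothetical JSON layout loosely modeled on AutoGen attribute names: a `prog-name` string and a `flag` list whose entries have `name`, an optional one-character `value` (the short option), an optional `arg-type`, and a `descrip`. The real schema and generated code in GnuTLS may look quite different; targeting `getopt_long()` and emitting a Pandoc-style Markdown definition list are purely illustrative choices.

```python
def c_string(s):
    """Quote a Python string as a C string literal (minimal escaping)."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def generate_c(defs):
    """Emit a getopt_long() option table and the matching short-option string."""
    lines = ["#include <getopt.h>", "#include <stddef.h>", "",
             "static const struct option long_options[] = {"]
    short_opts = ""
    for index, flag in enumerate(defs["flag"]):
        has_arg = "arg-type" in flag
        if "value" in flag:
            val = f"'{flag['value']}'"
            short_opts += flag["value"] + (":" if has_arg else "")
        else:
            # Long-only option: pick a return value outside the char range.
            val = str(0x100 + index)
        arg = "required_argument" if has_arg else "no_argument"
        lines.append(f"    {{ {c_string(flag['name'])}, {arg}, NULL, {val} }},")
    lines.append("    { NULL, 0, NULL, 0 }")
    lines.append("};")
    lines.append(f"static const char short_options[] = {c_string(short_opts)};")
    return "\n".join(lines) + "\n"


def generate_markdown(defs):
    """Emit an options section as a Markdown definition list."""
    out = [f"# {defs['prog-name']}", "", "## Options", ""]
    for flag in defs["flag"]:
        names = f"--{flag['name']}"
        if "value" in flag:
            names = f"-{flag['value']}, " + names
        if "arg-type" in flag:
            names += "=" + flag["arg-type"].upper()
        out.append(f"`{names}`")
        out.append(f":   {flag.get('descrip', '')}")
        out.append("")
    return "\n".join(out)
```

A build rule could call both functions on each JSON file and write the results to a C source file and a Markdown file, which Pandoc can then turn into a manual page or Texinfo. Since the sketch uses only the Python standard library, it stays in line with the requirement of avoiding exotic build dependencies.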

The scripts and the JSON schema are now hosted as a separate project, which might be useful for other projects.