Introducing cldoc: a clang based documentation generator for C and C++

I would like to introduce my latest project, which I have spent some time developing over the last few weeks. cldoc is a clang based documentation generator for C and C++. I started this project because I was not satisfied with the current state of documentation generators available for C++, and I thought it would be a fun little project (turns out it was fun, but not that little). What I was looking for was a generator that does not require any configuration, or require me to write a whole lot of special directives and instructions in my code. It should also be robust against complex C++ projects and not require me to tell it by hand how to interpret certain parts because its parser is too limited (hence using clang). Finally, I wanted modern output, nice coverage reporting, integrated searching, and simple deployment and integration.

I think cldoc addresses most (if not all) of these requirements in its current stage. Even though it’s still in the early stages of development, it’s pretty complete and seems to work well for medium-sized projects. I’m sure it’s not without bugs, but those can be fixed!

Features:

  • Uses clang to robustly parse even the most complex C++ projects without additional effort from the user.
  • Requires zero configuration.
  • Uses markdown for documentation formatting.
  • Generates an xml description of the API which can be reused for other purposes.
  • Uses a simple format for documenting your code.
  • Supports cross-referencing in documentation.
  • Generates a single file, javascript based web application to render the documentation.
  • Integrates seamlessly with your existing website.
  • Lightning fast client-side searching using a pregenerated search index.
  • Generates a formatted documentation coverage report and integrates it in the website.

I also wanted to detail here some of the development, since I always like to take the opportunity to use “modern” technology when starting a new project. The cldoc utility itself is a pretty plain and simple python application. Nothing particularly interesting about that. It uses the libclang python bindings to parse, scan and extract all necessary information. It then generates a set of xml pages describing fully the namespaces, types, methods, functions etc that were scanned. It also does some fancy cross-referencing and generates a suffix-array based search index in a json file. The search index is fetched on the client and traversed locally for very fast searching of documentation.
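
To give an idea of what a suffix-array based index looks like, here is a minimal sketch in python. It is not the actual cldoc code: the function name `build_search_index` and the json layout (`symbols`, `suffixes`) are made up for illustration, and the symbol names are just examples.

```python
import json


def build_search_index(symbols):
    """Build a suffix-array index over lowercased symbol names.

    Each entry is [symbol_number, offset] and stands for the suffix of
    that symbol starting at that offset. Entries are sorted by the
    text of the suffix, which is what makes binary search possible.
    (Hypothetical layout, for illustration only.)
    """
    names = [s.lower() for s in symbols]
    entries = [
        (i, off)
        for i, name in enumerate(names)
        for off in range(len(name))
    ]
    entries.sort(key=lambda e: names[e[0]][e[1]:])
    return {"symbols": list(symbols), "suffixes": [list(e) for e in entries]}


# Serialize the index so it can be served as a static json file.
index = build_search_index(["Bicycle", "Bicycle::pedal", "Wheel"])
payload = json.dumps(index)
```

Because every suffix of every symbol is in the sorted list, a substring query turns into a prefix lookup on that list, and the client only needs this one json file to search locally.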

I had some more fun with the web application though. What I wanted was something with the following features:

  1. Easy to deploy
  2. Easy to integrate in an existing website
  3. Doing things on the client as much as possible
  4. Render the website from the generated xml
  5. Still allow exporting a static website instead of a dynamic one

As a result, the generated html file is really just a stub containing two containers. All the rest is done in javascript. For this project I went with coffeescript. I had played with it before, and it certainly has its quirks. The fact is, though, that it makes me a lot more productive than writing javascript directly. Generating html is still a pain, but it’s shorter to write. Besides that I obviously use jQuery (can’t do anything without it, really), and showdown (a javascript markdown parser).

What the webapp basically does is fetch the xml page using AJAX and format/render it on the page. It then uses html5 history to implement navigating the documentation without actually going to another page. Navigating around simply fetches more xml pages and renders them. It caches the fetched xml and rendered html so things end up being pretty fast. One other neat thing is the implementation of local search. When a search is initiated, the search database is fetched from the server. This database is basically a suffix array encoded in json. Currently it’s quite basic: it only indexes symbols. Then, an html5 web worker is started which does the actual searching, so that the browser is not blocked during this process. For medium-sized projects, I don’t think there is any real need for this, but at least now it will scale pretty well. Search results are then communicated back to the main javascript thread and rendered.
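
The lookup itself boils down to a binary search over the sorted suffixes. Below is a rough sketch of that idea, written in python rather than the coffeescript that runs in the web worker. The names `build_index` and `search` and the index layout are assumptions for illustration, not cldoc’s actual format.

```python
def build_index(symbols):
    """Tiny stand-in for the pregenerated index (hypothetical layout)."""
    names = [s.lower() for s in symbols]
    suffixes = sorted(
        ((i, off) for i, n in enumerate(names) for off in range(len(n))),
        key=lambda e: names[e[0]][e[1]:],
    )
    return {"symbols": list(symbols), "suffixes": suffixes}


def search(index, query):
    """Return the symbols that contain `query`, ignoring case."""
    names = [s.lower() for s in index["symbols"]]
    suffixes = index["suffixes"]
    q = query.lower()
    if not q:
        return []

    # Binary search for the first suffix whose leading characters
    # are not smaller than the query.
    lo, hi = 0, len(suffixes)
    while lo < hi:
        mid = (lo + hi) // 2
        i, off = suffixes[mid]
        if names[i][off:off + len(q)] < q:
            lo = mid + 1
        else:
            hi = mid

    # Matching suffixes are contiguous from that position onwards.
    hits = set()
    while lo < len(suffixes):
        i, off = suffixes[lo]
        if not names[i].startswith(q, off):
            break
        hits.add(i)
        lo += 1
    return [index["symbols"][i] for i in sorted(hits)]


index = build_index(["Bicycle", "Bicycle::pedal", "Wheel"])
results = search(index, "cycle")
```

The binary search part costs a logarithmic number of comparisons in the size of the index, so most of the time goes into collecting and rendering the hits. That is also why moving the work off the main thread only starts to matter for big projects.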

One nice side effect of this implementation is that it becomes very easy to integrate the documentation with an existing website in an unobtrusive way. An existing website can simply add two containers to its html and load the js app. The rest of the website will still function normally. The only real issue is that if the original website also uses html5 history, the two will currently conflict. This is something to resolve in the near future. Another “issue” is that it’s very hard to separate css styles completely, especially if the website was not designed for that from the start. The cldoc stylesheet will only apply styles to the cldoc containers, but of course, the containing website will still have styles which can cascade into the cldoc containers. Anyway, not my problem I guess :)

With regard to the last point of my little list above, the whole website can also be generated statically quite easily. This is not implemented at this time, but the same javascript web app can be run on node.js without too much effort. Each page can then be rendered using the exact same app (using jsDOM) and then exported as an html file.

Well, that’s it for now. Please have a look at it and let me know what you think. The project is completely hosted on github. Website: http://jessevdk.github.com/cldoc, source: https://github.com/jessevdk/cldoc.

20 Responses to Introducing cldoc: a clang based documentation generator for C and C++

  1. Matthias says:

    The example page does not load for me except for two empty frames. But apart from that, the idea sounds compelling.

  2. Spidey says:

    I’ll try this tomorrow at work. We have a big product in C, approximately 800 KLOC.

    How does it work with guards, defines and preprocessor stuff?

    • Jesse van den Kieboom says:

      Basically, cldoc sees symbols as they are after clang’s preprocessing stage. Unfortunately clang doesn’t expose defines and macros right now, so you can’t document them at the moment. You’ll need to run cldoc with the same parameters as you use for compilation. I’ll be very interested in hearing your feedback on a large project (I’ve only used it on an 8 KLOC project myself until now).

  3. Alex Reinhart says:

    What would be neat is to hook the XML data up to Sphinx, which generates beautiful documentation sites:

    http://sphinx-doc.org/

    Sphinx is commonly used for Python documentation because it has all sorts of handy features for syntax highlighting, searching, and so on. It has an extension to automatically document Python code by using the docstrings, but presumably you could use cldoc to provide information to Sphinx.

    • Jesse van den Kieboom says:

      Yes, I’ve looked into sphinx. It would be interesting to see if we can easily transform the XML to something that sphinx will grok.

  4. Nathan says:

    Why does it seem needlessly complex? Why XML? Why a builtin js app? Why not just spit out markdown docco style and let the user choose how to export it (HTML, etc).

    • Jesse van den Kieboom says:

      It uses XML so you can reuse that information for other purposes (think of generating bindings). The js app is useful because this way it can be effortlessly embedded in an existing website. Doing it this way also still allows you to export a static html website (I explained this in the post). If you want to write a generator which generates just markdown, then be my guest (look at cldoc/generators/xml.py)!

  5. Tristan says:

    This looks awesome :-). Doxygen is the de facto standard when it comes to documenting C++, but it can be a little “warty” sometimes (as can gtk-doc), for example around #ifdefs. A doc generator that uses a real compiler parser (and the same compiler flags) seems like a fantastic idea!

    I like the idea of using markdown, but given that Doxygen is what almost everyone is using at the moment, have you given any thought to a compatibility mode?

    Also, I’m not sure about the idea that there is no special marker for a cldoc comment (e.g. /// or /**). Sometimes you want to add comments that you certainly don’t want appearing in the online documentation!

    Lastly, it would be great if you could expand the bicycle example a bit to show off more of what the documentation could look like for a “large” C++ project: constructors, reference arguments, const and pure virtual methods, overloaded operators, template parameters, STL containers etc etc… unfortunately C++ is huge!

    Keep up the good work though, I think this could be a really fantastic project :-)

    • Jesse van den Kieboom says:

      I thought about a compatibility mode, but one of the motivations is to make the whole thing simpler. Doxygen’s parser for comments is huge and complex. I could obviously support only a subset. As for the comment markers, I’ll have to see. For me it’s working out great with normal comments. It seems to me that a private comment that shouldn’t show up in the documentation is the exceptional case, and that is what would need some special syntax.

      I have a more complete working example with all that you mention, but that project has not been published yet. I threw together the small bicycle example as a starter. I’ll be expanding it as it goes.

  6. Andreas Pokorny says:

    How does it compare to doxygen?
    We had some issues with doxygen since we use some preprocessor magic to create functor classes (supporting old compilers).

    Can you control which macros are evaluated and expanded and which are not?

    • Jesse van den Kieboom says:

      cldoc is just going to get symbols after the preprocessing stage. So you can define -D stuff when invoking cldoc, and it will see the symbols that are visible. Support for actually documenting macros is not great though (clang doesn’t currently expose information about macros). For example, when you define a macro which defines some method, cldoc will see the symbol but you can’t document it in-code right now (you can still use an external file to document it though).

  7. Stefan Sauer says:

    This is neat. Whenever I was hacking on gtk-doc I was thinking of 1) using a clang based parser and 2) rendering directly to html. The source->xml->html pipeline is suboptimal. xslt rendering (using libxslt) is single threaded and non-incremental; that is, it scales badly.
    For the markup I was considering asciidoc.

  8. Germán Diago says:

    Hello jessevdk,

    Very nice project. I think it has a lot of potential. Things I really like for now:

    1. The documentation layout looks pretty clear (much nicer than doxygen’s default).
    2. Basing the parser on clang means that it understands C++ correctly, and always will.
    3. Uses markdown, very good choice, indeed.
    4. Documentation coverage report is a very good idea :)

    I will leave here the way I organize the documentation in doxygen in my project and what I had to customize, in the hope that you can tell me what can and cannot be done, and to give feedback on how I’m using doxygen and what my needs would be.

    1. I use TDD for the project, I have 2 groups for tests. Unit tests and functional tests, in their own pages.
    2. I customize how the doxygen parser handles the BOOST_AUTO_TEST_CASE macro, so that each test is output as a normal function in the documentation for the tests.
    3. I have groups of member functions, not with the default name “Methods”. I have groups like “Getters”, “Setters” inside class documentation.
    4. Related functions that are not members: I make them appear in the documentation for the class, because this is where they belong.

    With cldoc:

    1. Can I have custom commands?
    2. Can I document that a function throws?
    3. Can I group non-member functions and make the output in the same documentation page as the class itself?
    4. Can I group methods inside a class with a different name that is not “Methods”? It looks too generic when a class has already a few methods and it’s better to subgroup.

    Thanks for this good tool. Keep up the work, it looks promising :)

    • Jesse van den Kieboom says:

      Thanks for the feedback, it’s much appreciated. First, I would like to say that the whole purpose of cldoc is to try to come back to a documentation system which is simple, and uniform over different projects. This means that instead of allowing a lot of knobs and tweaks, I would like to create a consistent and well formatted output. This would then make it much easier to read documentation of different projects since they are all consistently formatted. This is also the reason that at this moment there aren’t a lot of directives that allow you to reorganize the documentation. That said, I of course recognize that complex projects have complex needs. Adding more knobs will be an iterative process.

      So that said, let’s look at your questions and how cldoc tries to answer them.

      1. I’m not really sure what a custom command is. Obviously, you can take cldoc and change the code, but right now you can’t extend it or write plugins in any nice way. I’m also not planning support for this.

      2. You can write any documentation you want, but there is no special directive for throwing exceptions (if that’s what you’re asking). Right now, throw annotations on functions are not exposed in the documentation, but it’s something that I would like to add.

      3. There isn’t any directive for this. The only thing cldoc does that is a bit related to this is that it will group plain C functions which receive a struct pointer as their first argument under that struct.

      4. The only way to group things currently is by means of moving symbols in categories. This is not well suited for what you want. Some reasons against having custom groups:

      4.1 Makes it more difficult to locate a method for a user. Maybe you think a group is something logical, but for the user it might not be. Just sorting methods on alphabet will make it easier to find it.
      4.2 Decreases cross project documentation consistency.
      4.3 Adds more specialized syntax/directives.

      That said, I of course see the advantages also (although grouping in getters/setters doesn’t seem like the best example). Sometimes there are a lot of methods and they can be cleanly categorized based on certain tasks.

      In short, cldoc doesn’t yet have the features that you ask for. I want cldoc to stay as simple as possible, but not simpler :) Please file issues on github for your feature requests so they can be tracked and discussed in more detail.

  9. Syam Krishnan says:

    As Germán Diago said, support for Doxygen style comments would be great for existing projects.

    At work, we need to generate printed/printable documents describing the code. This is not an easy thing to do with Doxygen, as the default styles (with tex output) are quite bad and I haven’t figured out an easy way to change them. I even tried to meddle with the XML output to generate an ODF document out of it.

    Does cldoc support this use-case? While browsable HTML is great, a printable PDF document is ultimately what we’re looking for.

  10. Germán Diago says:

    “Syam Krishnan Says:

    As Germán Diago said, support for Doxygen style comments would be great for existing projects.”

    I think you misunderstood me. Where did I say so? This would make the tool fairly complicated. Maybe from one of my comments you took that, but I didn’t mean that at any time.

  11. Germán Diago says:

    Hello again Jesse,

    I don’t know how to quote comments in this blog, if you know how, let me know, please. I copy/paste and reply below.

    Overall I agree with the goals of consistency in documentation among projects, since later you know where to expect to find things. When I make my comments, keep in mind I’m always talking about C++.

    > 1. I’m not really sure what a custom command is. Obviously,
    > you can take cldoc and change the code, but right now you
    > can’t extend it or write plugins in any nice way. I’m also not
    > planning support for this.
    A custom command is an alias through which you can support a set of commands. You can take a look here for getting an idea of how it works: http://www.stack.nl/~dimitri/doxygen/manual/custcmd.html
    I don’t consider this feature a must have, and after thinking about consistency and simplicity across projects as you said, maybe it’s even a bad idea.

    > 3. There isn’t any directive for this. The only thing cldoc does
    > that is a bit related to this is that it will group plain C
    > functions which receive a struct pointer as their first
    > argument under that struct.
    My opinion is that this is a must in some way or other. Imagine you have a package with some classes. When you take a look at a class, you want to know about the related free functions. This is ok if you have some functions in the same package, since you can easily cross-reference the class to the related functions and vice versa. I mean, you can add a link in the class page.
    The problem I see with this solution is if I later develop another library with more extensions to these classes. Then there is no way to add to the class section the new functions without regenerating the documentation for the original package. So I would propose this infinitely scalable solution :) :

    a. A function is related to one class when the class is named in one of its arguments. So a free function can belong to several classes.
    b. A section with related functions classified by class is added to the documentation. Something like “free functions related to class X” (look for a better name if you want, I didn’t give much thought to this). Maybe it could be split in overloaded operators and other functions.
    c. When the free functions are in the same package (generated at the same time as the documentation for class X), then reference class X from the page “functions related to class X”, but not vice versa, since external extension libraries won’t be able to do it. Otherwise, just name the class but don’t reference it with a link, since that’s not possible without configuration on where to find the documentation, and one of the features is zero configuration.

    Properties of the solution:
    – consistent among projects, since related functions are defined by cldoc, not by the user.
    – You can browse related functions to one class easily, and you will always find them in the same section.
    – Still requires zero configuration :)
    – I can add as many packages as I want and the documentation will be consistently found always in the same places for the new packages.

    The disadvantage is that you cannot reference the class X from an external package. You can do this with a search path of some kind, but if you want to keep it simple and with zero configuration, I don’t see the link even necessary, since the user will know the class by the name easily.

    > 4. The only way to group things currently is by means of
    > moving symbols in categories. This is not well suited for what
    > you want. Some reasons against having custom groups:

    > 4.1 Makes it more difficult to locate a method for a user.
    > Maybe you think a group is something logical, but for the user
    > it might not be. Just sorting methods on alphabet will make it easier to find it.
    True, I agree finally :). I’m even going to change my doxygen documentation.

    > 4.2 Decreases cross project documentation consistency.
    Yes. Since this is a goal I see useful, actually, I propose the solution I gave you in point 3.
    > 4.3 Adds more specialized syntax/directives.
    True also, after thinking more carefully.

    Ok, so let’s look at how to make the most consistent tool for documentation among projects, one that is not a pain to set up :)

    I will try to take some time to make a proposal on github on how to group related functions (roughly described in point 3), but I cannot promise anything at this time.

    Regards

  12. Syam Krishnan says:

    @Germán Diago
    “I think you misunderstood me. Where did I say so? This would make the tool fairly complicated. Maybe from one of my comments you took that, but I didn’t mean that at any time.”

    Oops.. My bad! I was actually referring to the post by Tristan.
