flexibility

One more follow-up to the versioned data ideas. I thought about how I could version some other information, like document structure and index entries and the like. It worked quite easily with the last design I came up with – I can just create another table for each type of data, and use the ‘entryrevid’ as the key. It means I have to write out a full set of associated data each time I write a version of the file, but the redundancy means I can look it up faster, so it is worth it, and it just simplifies everything.
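To make that concrete, here is a rough sketch of the layout I have in mind (illustrative names only, not the real schema) – each type of derived data gets its own table of records, every record keyed on the revision id of the entry it was generated from:

/* Rough sketch only - illustrative names, not the real schema.
 * Each type of derived data gets its own table, with every record
 * keyed on the revision id of the entry it was generated from. */

typedef unsigned long entryrevid_t;   /* key shared by all the tables */

struct structure_rec {                /* one row of the structure table */
    entryrevid_t rev;                 /* revision this set belongs to */
    int level;                        /* section nesting depth */
    char title[128];                  /* section heading */
};

struct index_rec {                    /* one row of the index table */
    entryrevid_t rev;
    char term[64];                    /* the index term */
    long offset;                      /* where it occurs in the content */
};

/* Saving revision N writes a complete set of structure_rec and
 * index_rec rows for N; nothing older is touched, so a lookup is
 * just "fetch all rows with rev == N". */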

So after that, I went back to have a look at the content markup. I felt relatively happy with my original idea of using texinfo markup, so I’ve stuck with that.

I rewrote my texinfo parser to use flex. I haven’t used flex much – and not for a long time – so it took me a while to work out how to do some of the more ‘advanced’ things while still getting flex to do as much of the work as I could. Although I will need a dynamic symbol table if I am to implement macros, for now all symbols are hard-coded directly into the flex file, which means less code to write, and more speed as an added bonus. I’m not sure I’ll bother with macros, actually – I haven’t seen any documents which use them, and there are implementation difficulties in a sparsely stored document. The texinfo grammar isn’t documented very well either, and it is a little inconsistent in places, so I’ve had to run test files through texinfo to find out what I should do. I had been using texi2pdf, but it is a lot less strict than makeinfo – and makeinfo’s strictness is handy, since it simplifies my work (I don’t have to worry about random whitespace and the like). Although I don’t really need to be completely compatible with makeinfo, I probably should be for what I do implement.
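For illustration, the hard-coded approach comes out something like this – a cut-down sketch rather than the actual lexer, with each command spelled out as a literal pattern instead of being looked up in a table:

%{
/* Cut-down sketch: every texinfo command is a literal flex pattern,
 * so no symbol table is needed. Not the real lexer. */
#include <stdio.h>
enum { T_EOF, T_CODE, T_EMPH, T_NODE, T_TEXT };
%}
%option noyywrap
%%
"@code{"    { return T_CODE; }
"@emph{"    { return T_EMPH; }
^"@node "   { return T_NODE; }
"}"         |
[^@}]+      { return T_TEXT; }
.           { return T_TEXT; }
%%
int main(void)
{
    int t;
    while ((t = yylex()) != T_EOF)
        printf("token %d: '%s'\n", t, yytext);
    return 0;
}

Adding a command means touching the lexer, but everything gets compiled into the one state machine, which is where the speed comes from.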

I’m looking at the parser to do two things – convert the format for presentation (i.e. convert to css-friendly html), and extract structure, footnote, and index information. Both will be used for presentation in various places, and the latter will also be used for indexing and linking nodes. I have enough working to do both right now, but I might keep tweaking the code a bit more before starting to integrate it – while tweaking is still interesting, anyway. I can’t use makeinfo because it isn’t available on the target system, and I don’t think it works with document fragments anyway; besides, I need the structure information, and it is easier to parse texinfo than its html output.
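Roughly, both outputs fall out of the same pass over the token stream – something like this sketch (invented token names, with stderr standing in for the real tables):

/* Sketch: drive both outputs from one pass over the tokens.
 * Presentation html goes to one stream as text arrives, while
 * structure and index entries are collected on the side. */
#include <stdio.h>

enum toktype { TOK_TEXT, TOK_CODE, TOK_CINDEX, TOK_SECTION };

struct token { enum toktype type; const char *text; };

static void handle(FILE *html, const struct token *t)
{
    switch (t->type) {
    case TOK_TEXT:                 /* plain text: presentation only */
        fputs(t->text, html);
        break;
    case TOK_CODE:                 /* @code{}: presentation only */
        fprintf(html, "<span class=\"code\">%s</span>", t->text);
        break;
    case TOK_CINDEX:               /* @cindex: extraction only */
        fprintf(stderr, "index entry: %s\n", t->text);
        break;
    case TOK_SECTION:              /* @section: both at once */
        fprintf(html, "<h2>%s</h2>\n", t->text);
        fprintf(stderr, "structure: %s\n", t->text);
        break;
    }
}

int main(void)
{
    const struct token doc[] = {
        { TOK_SECTION, "Introduction" },
        { TOK_TEXT,    "See " },
        { TOK_CODE,    "yylex" },
        { TOK_CINDEX,  "lexer" },
    };
    unsigned i;

    for (i = 0; i < sizeof doc / sizeof doc[0]; i++)
        handle(stdout, &doc[i]);
    return 0;
}

In practice the extracted records would land in the per-revision tables above rather than on stderr.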

I’m still not sure how I’m going to handle linking multiple nodes together into a sequence. Should the @node line be entered explicitly, where it can be easily seen and edited, or should it be managed internally by the application, with the user interface providing ways of linking the pages up? Hmm, I guess the former should do for now.
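For reference, the explicit form is just texinfo’s standard node line – the node’s name followed by its next, previous, and up links – so a hand-linked sequence would carry lines like:

@node Chapter Two, Chapter Three, Chapter One, Top
@chapter Chapter Two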

It’s sort of interesting to see how I’ve been progressing while working on unfamiliar ground. I find the code goes through cycles of functionality and stability, but also of ‘coherence’ (code quality?). They are not distinct phases as such, and there is always some overlap, but starting from a blank slate the phases are something like:

  1. Discovery — You don’t know much, if anything, about the tool/idea you’re working on. You write tiny test files and cross-reference with the manual or specification. You begin to grok the basic underlying theory of operation for the tool/idea.

  2. Expansion — You fill out the implementation, adding functionality very quickly. You rapidly create a tool which does 90% of the job with some fraction of the functionality. You don’t take too much care over the implementation details, as you’re just getting something running; the code gets a bit messy, but not too messy, and the messiness is isolated and controlled.

  3. Consolidation — You start to notice patterns in your code: loops or functionality repeated more than once. You consolidate these, thinking about how the consolidation might help in the future – but not too much, since you don’t know what you’re going to do yet. The code structure slides into a more coherent pattern, and the code quality increases.

  4. Growth — Now you find the additional features you were putting off, or didn’t look into earlier, can be implemented much more easily now that you’ve consolidated the code. You grow the functionality, perhaps in a significant step, whilst keeping the code fairly clean.

  5. Breakage — You break a few things. But not too many, and they’re fairly easy to fix. Mostly just syntax errors. Sometimes you over-consolidate and break functionality and have to back out a little bit.

  6. Perfection — You clean up all the little things and get it all working nicely again. You start to notice shortcomings or bugs in the logic now that the bugs in the code are cleaned up. ‘Perfection’ here is entirely relative – it is as perfect as you care to make it at the time (and as you lose interest, that standard is likely to wane).

  7. Expansion … the cycle repeats again.

Actually, what often happens is that after the first one or two iterations you can throw most of the code away and start again. You quickly discover your mistakes when things start to bog down too much or get too messy. But since you have gained a deeper understanding of the problem, you proceed much more quickly the next time and avoid making similar architectural mistakes along the way. The proverbial ‘throw it all away and rewrite’ – but done at such an early stage the cost is minimal, and it was a learning exercise/prototype anyway. Depending on the problem, you’re only talking about a few hours or a day or two of effort at this point. And a larger problem probably contains smaller problems within it that this applies to.

I think plenty of ‘RAD’ projects only really get to stage 2, and if they last that long they jump to a rewrite stage and repeat. These projects tend to include lots of functionality very quickly — and often it does the job — but somewhere down the line they’re either headed for death by spaghetti or a ‘total rewrite’. Of course, ‘doing the job’ is definitely good enough in many cases, but it isn’t the same as having a quality engineered product, and comparing the ‘productivity’ of RAD projects to that of more traditionally engineered ones is meaningless.

Step 3 is really the key to keeping the code quality up. If you skip that step you will save some time in the short term, but you will always pay for it later. If you do it often enough, the problem stays contained at each stage and the cost is quite minimal anyway. It is particularly important with object oriented code (where it is called ‘refactoring’, apparently), as the gains – and the drains – from quality code re-use are higher. If you just keep cutting and pasting objects, adding random methods and so on, you will end up with bloated and hard-to-maintain code very quickly, and it becomes harder to distil common functionality into re-usable, shareable sets. With a procedural language you’re only considering this at the level of a single procedure; with an OO language you have a whole cohesive set of functionality to worry about.

What’s great about software written in your spare time is that you can ‘throw away and rewrite’ as much as you want, so long as there is some value in doing so – and that is quite subjective, so the value only has to have meaning to you. A list (in rough order of desirability) might include:

  • Applying new knowledge – ‘I can do it better!’

    You’ve learnt enough from the previous attempt, and want to apply that knowledge to improve the functionality or quality of the code. Improving the functionality and code quality at the same time can be a very satisfying intellectual endeavour.

  • Bored, dead-ended, trying something else – ‘I’m bored with this.’

    Maybe you are going down a dead-end path. This is related to applying new knowledge, but instead of having learnt something, you find you’re not learning anything, and perhaps approaching it from another angle will be more rewarding or interesting.

  • Intellectual curiosity – ‘Time for a wank!’

    For no particular reason you just feel like doing it in a completely different way: a different language, a different platform, a different meta-programming tool. It’s a rainy day, you’ve got nothing better to do, so you go off on a tangent and blat out some code. Maybe it becomes the new trunk, maybe it gets thrown away, maybe you learn something from it, or maybe you don’t. It might only be one tiny bit of functionality you’re investigating, or the whole architecture.

  • Improving quality to reduce future maintenance – ‘Time to clean it up.’

    If this is just for its own sake, and not for any of the other reasons, it will probably be very low on your list. This is maintenance work, and although it may save you pain down the track, the immediate payoff is low. Perhaps you are meticulously pedantic and get a kick out of the momentary perfection of the rewrite, or perhaps it’s just that time of the month and you feel you should.

This is pretty well how I write all code anyway, although at work I’m not going to try different approaches just for the hell of it so often, and once you go into maintenance mode things are a bit different. Still, the same ideas work at various granularities throughout any code base at almost any time. Software development is generally an iterative process — if you already knew exactly what to write before you started, you (or someone) would have had to go through the same process anyway, just without the support of a compiler and live testing. Very rarely does the final result match what you started with, and you have to keep an eye out for ways to improve it.
