proprietary file formats

My housemate wanted to edit her CV, but of course it is in Microsoft’s shitty binary format. Loading it into OpenOffice.org produces about as many formatting issues as it did the last time I tried something similar years ago … maybe the OpenOffice.org devs aren’t focusing on that kind of thing (I thought it was a focus, at least at some point). I’m not blaming them – it’s MS, or more directly TAFE, for wasting limited educational resources on such rubbish.

Google Docs was actually a little better – it lost more formatting, but it did so in a more consistent and visually pleasing manner. Still, not much use in this case either. The file hasn’t really been formatted properly (it doesn’t use styles or margins properly) – but what can you do? That is the mode of operation WYSIWYG editors enforce.

Microsoft re-announcing that they’re going to support ODF has no positive meaning either. Even if they ever do what they say, there will still be enough incompatibilities that they will simply set the ‘standard’ wherever the specification’s language is ambiguous, or just break it on purpose. You can quite legitimately claim to support a standard, yet still have ‘bugs’ which make your product the one everyone else has to follow or work with. Just look at Internet Explorer, or Outlook.

At work we had a shitty Office 2003 XML loader for Excel files, but I got sick of its API and wrote a CSV loader instead. Oh joy, it’s so nice to be able to edit Excel ‘files’ in Emacs, and now they load almost instantly, with fewer lines of code and much less memory, instead of having to run-time compile an XML deserialiser and load the whole thing into memory as objects that get translated into arrays of strings, most of which get thrown away. It has fewer features, but they’re not features I actually need.
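For illustration, here is a minimal sketch of the kind of CSV loader described above. It is not the loader from work (the post doesn’t say what language that was in); Python’s standard csv module is used as an assumption, and the helper names iter_rows, iter_columns and open_csv are hypothetical. It streams rows one at a time and keeps only the columns you ask for, so nothing you don’t need is ever materialised.

```python
import csv
import io
from typing import Iterator, List, Sequence, TextIO


def iter_rows(stream: TextIO) -> Iterator[List[str]]:
    """Yield each CSV record as a list of strings, one row at a time.

    csv.reader handles quoted fields, embedded commas and doubled quotes,
    which is what spreadsheet CSV exports usually contain.
    """
    yield from csv.reader(stream)


def iter_columns(stream: TextIO, wanted: Sequence[str]) -> Iterator[List[str]]:
    """Yield only the named columns, treating the first row as the header.

    Only the current row is held in memory, so unused cells never become
    long-lived objects.
    """
    rows = iter_rows(stream)
    header = next(rows, None)
    if header is None:  # empty file: nothing to yield
        return
    # list.index raises ValueError if a wanted column is missing,
    # a reasonable "fail loudly" default for a sketch.
    positions = [header.index(name) for name in wanted]
    for row in rows:
        # Short rows (trailing empty cells dropped) are padded with "".
        yield [row[i] if i < len(row) else "" for i in positions]


def open_csv(path: str, encoding: str = "utf-8-sig") -> TextIO:
    # newline="" is what the csv module documentation asks for, so quoted
    # fields containing line breaks are read correctly. The encoding is an
    # assumption: "utf-8-sig" strips a UTF-8 byte-order mark, but Excel may
    # export in the system code page instead.
    return open(path, newline="", encoding=encoding)


if __name__ == "__main__":
    sample = 'name,qty,note\nwidget,3,"red, large"\ngadget,10,"says ""hi"""\n'
    print(list(iter_rows(io.StringIO(sample))))
    print(list(iter_columns(io.StringIO(sample), ["name", "note"])))
```

Because the rows come out as plain lists of strings, the file itself stays editable in any text editor, which is most of the appeal.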

While I’m here – I’m a bit sick of the ‘XML solves the world’s woes’ rubbish; if anything, the whole OOXML debacle should prove otherwise. XML is not really a panacea for anything. It’s just a convenient, if somewhat complicated, file format for data interchange. There are far simpler and far more efficient binary formats for the same job, which would probably make a lot more sense when you don’t need a text editor to edit the data (XML-RPC, SOAP, anyone?).

I also find it rather disturbing that people talk about using ‘the DOM’ as the internal API for applications that work with XML as a ‘good thing’. It isn’t. It is a terrible API for internal data manipulation. XML should stick to what it is good at (hmm, is it even?): data interchange. An external format. Letting an external file format directly become your internal data representation is just as bad as doing it the other way around. It might be convenient for simple one-off applications, but all you’re doing is replacing native language features with amorphous, language-neutral ones that will never be as easy to write, maintain or scale.
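To make the ‘external format at the edges’ point concrete, here is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module. The XML element names (jobs, job, title, employer, years) and the Job type are hypothetical, made up for the example. The XML is converted into native objects once on load and back again once on save; everything in between works on ordinary typed fields rather than on DOM nodes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    """Hypothetical native in-memory representation used by the program."""
    title: str
    employer: str
    years: int


def jobs_from_xml(text: str) -> List[Job]:
    """Convert the external XML format into native objects, once, on load."""
    root = ET.fromstring(text)
    return [
        Job(
            title=elem.findtext("title", default=""),
            employer=elem.findtext("employer", default=""),
            # findtext returns None (missing) or "" (empty); treat both as 0.
            years=int(elem.findtext("years") or 0),
        )
        for elem in root.iter("job")
    ]


def jobs_to_xml(jobs: List[Job]) -> str:
    """Convert native objects back into the external format, only on save."""
    root = ET.Element("jobs")
    for job in jobs:
        elem = ET.SubElement(root, "job")
        ET.SubElement(elem, "title").text = job.title
        ET.SubElement(elem, "employer").text = job.employer
        ET.SubElement(elem, "years").text = str(job.years)
    return ET.tostring(root, encoding="unicode")


def total_years(jobs: List[Job]) -> int:
    # Internal logic uses plain language features, not tree traversal.
    return sum(job.years for job in jobs)
```

The design choice is simply that only jobs_from_xml and jobs_to_xml know the XML exists; swapping in a different external format later would touch those two functions and nothing else.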

5 thoughts on “proprietary file formats”

  1. You forget the business case. You can patent anything that ends … “with XML” ;).

  2. Firstly, the binary formats are published now as a result of an antitrust settlement.

    Secondly, I’ll trade a small amount of efficiency for the huge benefit of a mature, well-tested, widely interpretable data modelling format any day. The fact that you don’t have to worry about writing a new parser every time you add a new feature is enough to seal the deal on its own. And if your ad hoc format is flexible enough that you don’t, it’s probably no more efficient than XML anyway.

    Give XML a chance!

  3. You are right in saying “XML is not really a panacea”, but what is really helpful about using XML for document formats is that anyone can see what it looks like. That pushes developers with at least a little pride in their code to create a document structure that is reasonably easy to understand (not all of them, as you can see from the OOXML structure). With binary formats, nobody can check whether what you thought was a good file structure really is one without understanding your code or its documentation. I can’t imagine the Web today with a more efficient binary format.

  4. I’ve had more success with the latest OO.o 3.0 beta. Maybe worth giving that a try :-)

  5. A good CV is only a page long anyway (or two at most) – it would probably have been quicker just to redo it from scratch in OO.o / Google Docs / AbiWord / whatever.
