DB Item Versioning, Again

After thinking about it a bit more I made some changes to my versioning model. Since it was fairly simple I thought I’d add the ability to make lightweight data copies and to separate the meta-data from the data, and as a side-effect I simplified some of the queries and probably made them more efficient to boot.

With a hangover for most of Saturday it was pretty slow going but I managed to work something out.

I added an extra level of indirection between the entity, revision and its content. I also removed the revision from the data, so it is simply a key-value pair, which maps nicely and more efficiently to Berkeley DB, so I can remove any data marshalling for what will always be the largest data block. It let me remove some of the indexes as well.

The revision and entry tables stay the same:

create table rev (
  id serial not null primary key,
  branchid int not null references rev(id),
  name text,
  author text,
  ctime text,
  constraint rev_name_uc unique (name)
);

create index rev_branchid_idx on rev(branchid);

create table entry (
 id serial not null primary key,
 revid int not null references rev(id),
 classid int
);

But the data table is simpler, and I added a meta-data table, which just contains a name for now:

create table data (
  id serial primary key,
  content text
);

create table meta (
  id serial primary key,
  name text not null
  -- other stuff goes here
);

Then there is a new table which maps revisions of a given entry to its content:

create table entryrev (
  id serial primary key,
  entryid int references entry(id),
  revid int references rev(id),
  metaid int references meta(id),
  dataid int references data(id)
);

create index entryrev_entryid on entryrev(entryid);

Since I did this using Berkeley DB, I also created an indirect index of ‘name’ to ‘entryrev’, so name-based lookups are very simple. This is easy to maintain since data is never deleted. Something similar could be done in SQL by adding a redundant name column to the entryrev table and indexing on that, or you could just use a join and index on metaid.
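
As a rough sketch of the join-based variant (the index name and the entry name ‘front-page’ are just made-up examples, not part of the schema above):

-- hypothetical index so the join on metaid is cheap
create index entryrev_metaid_idx on entryrev(metaid);

-- look up every revision of the entry named 'front-page',
-- following the indirection through meta and data
select er.entryid, er.revid, d.content
  from meta m
  join entryrev er on er.metaid = m.id
  join data d on d.id = er.dataid
 where m.name = 'front-page';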

All the branch-matching calculations are the same, but they only have to be done once now and will return keys to all the data. So the basic query returns more information, and doesn’t have to be repeated for each set of data (if you want to version more than one). Now to perform a ‘branch merge’ (for a new item or one that doesn’t require merging) you just create a new entryrev with a new revision on the target branch and the metaid, entryid, and dataid from the source branch – i.e. create a lightweight copy.
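
Roughly, and assuming the SQL schema above rather than the Berkeley DB version – the ids here are made up, and finding the source row is the job of the branch-matching query:

-- suppose the branch-matching query found entryrev 42 on the source branch,
-- and revision 101 has just been created on the target branch (example ids only)
insert into entryrev (entryid, revid, metaid, dataid)
  select entryid, 101, metaid, dataid
    from entryrev
   where id = 42;

No data or meta rows are copied – the new entryrev just points at the existing ones.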

DB Item Versioning

I’ve been consumed by thinking about database versioning. Why, I wonder? What a silly thing to be consumed about. Is there just nothing to watch on TV? Hmm, no there isn’t. Still – I’m starting to get a bit sick of it … but before I drop it never to return I thought I’d better put brain to keyboard and dump out some of what I worked out.

I just wrote an article covering the basics of how to version records in a database. The basic algorithm and data structures, how to perform some of the typical things you want to perform with versioned data-sets, enhancement ideas and so on. Maybe someone will find it useful.

a long wet weekend

Well it wasn’t that wet, but it was close enough and it was cold and dreary. I spent an awful lot of the weekend playing with an idea I had last week. Another piece of software to run a blog/wiki/etc – i.e. a ‘cms’. I don’t really need one, and the world doesn’t need another one, I just want to update some of my skills and play with some web stuff again (as much as I vowed never to do any web coding again after a php experience, wpf is worse and has driven me to these depths) and exercise a few other ideas I’ve had lately.

And I want to do it with raw c and no frameworks, just to get back to basics again for a while. I’m only using libdb, and gperf for a static hash table. It is also a cgi programme so I can (hopefully) run it on my isp’s web site (that is another reason I can’t use frameworks). Being a fork/exec cgi means I can be lazy with resources too – implicit garbage collection by means of calling exit. At least for small strings, and that simplifies the code a bit more. I’m also avoiding xml. I don’t really like xml, and I don’t think it aids productivity in most cases – to me it is a technology that lets proprietary products ‘open up’ a little controlled access into their internal logic (thinking xslt mainly here). But it is not a natural way to code for most developers and is extremely resource intensive (at least the c implementations are – java/.net is so resource intensive anyway it’s hard to tell). Anyway – I don’t need it, so I won’t use it (unless I start looking at dynamic asynchronous javascript down the track).

So after doing the basic stuff of a simple cgi argument parser, and refreshing some ‘form’ knowledge, I started looking into presentation, and programme flow. I have not looked at css before – so there’s plenty to look at there, and it looks like it’ll be flexible enough to not need anything like xslt for presentation.

But how to add all the basic support HTML around the content? First I thought of templates parsed at run-time. Something like server side includes but with a little more flexibility. But I want a minimum of processing at run-time, and then there’s all the hassle of template-to-function linkage, showing pages recursively and all that guff. I thought I’d just pre-compile the templates, a bit like the way xaml works, but without all the crap. Compile into tables which are executed at run time? No hang on, why not just into a function-per-template and a string of function calls? Hmm, that suddenly made the problem tiny. A very simple ‘compiler’ that converts the templates into a sequence of function calls – output string, call function, output string, etc. Let the compiler handle the linkage and storage issues, and I just have to handle a little run-time context. And the added advantage is that run-time is very fast and the memory use is minimal – and shareable – since the page is read directly in its executable form. Problem solvered.

I didn’t really want to go the php route and have embedded logic which can end up becoming a huge mess, so you can only call functions and each page outputs in its entirety and the control logic is in the c source. After all I’m not writing a framework here, just a solution to a specific problem, so there’s no need to add more smarts than is required. I wrote a little makefile magic to compile these templates, and some bourne-shell to generate the ‘/* AUTOMATICALLY GENERATED */’ header files, and now I have a nice ‘void page_front_page(void)’ I can call from any other page or c code to output whatever is in ‘front-page.html’.

I had to get up to speed on libdb again – I looked at it for evolution just before I left, and I wanted everything to be transaction protected, use sequences, multiple secondary indices, and possibly queues in the future. Didn’t take long – libdb has excellent documentation. One issue is how to store the data: since libdb only gives you a key-value pair, you need to pack the rows yourself. I decided for now on a simple tagged format – extremely simple (and fast of course) to parse on loading and no need to have it human readable. And being tagged means it can change in the future without breaking everything (that was a mistake I made with camel – trying to compress the data too much, the order and number of items were important). Initially I was going to store the records as files and just store the meta-data in the table, but I decided to put everything in the database, at least for the moment. Conceptually, each record is a little like an ldap record, a field followed by a value with the capability of having multi-valued fields. And everything is binary-safe.

That let me add a few more pieces – now I could throw together a front page – 2 small template files, the main page which calls the function to output the content, and one to display the summary of the post. An editor. A bit of messy logic in the c file to track the lifetime of the editing session, plus one editor page template to communicate with the user. Simple basic stuff – and that’s half of a blog done. Oh I threw together an rss feed with a little bit more work as well.

And then I had enough time to look at the user interface to the content – the ‘wiki’ language. Rather than re-inventing another weird wiki language, I thought I’d try using texinfo (and only texinfo – no html/special stuff). This has been an idea swirling around in my head for a while now. What if you could edit a complete texinfo document, where each @node is a separate page, and the node links (next/prev/up) are used to automagically bind all the pages into a coherent whole? One problem I find with wikis is that the nodes mostly sit by themselves, and the linking process is ad-hoc (usually it is not done since it isn’t strictly necessary – pages stand alone as articles) and it is difficult if not impossible to convert part or all of a wiki into a printed or all-in-one-page document. Add in the cross referencing and indexing features in texinfo and you could really have a powerful system. Well that’s the idea anyway. For now I’ve got only a small part of texinfo implemented and all it really does is formatting and external links. I’m still not sure how to handle node/navigation – the wiki/texinfo way is to use a node name – but if you change that I want it to go and update any links too – well I have a transactional database with multiple indices, so it should be doable. And I probably need some sort of namespace mechanism to manage multiple separate documents. I can’t run any texinfo tools because they might not be available and it’s too much of a security issue anyway, so I need to do the processing internally (I need to do that anyway for any automatic manipulation).

But I dunno – the texinfo thing is a big task, so I’ll see how it goes – for now it provides a reasonable markup syntax for many types of technical documents. There are plenty of other issues like user management and concurrent access too, should I bother looking into them. Hmm, versioning – that might be something quite interesting to look into too – and something else I’ve been thinking about recently.

I also ordered a new thinkpad. There’s nothing particularly wrong with my old one – T40 – 5 years old and still in good shape (motherboard/keyboard replaced after 3 years on warranty, and I dropped it a week later, but just a tiny bit of the case broke off). I just felt like spending money and I thought I’d try an X series this time since they were on special last week, and came with a dock/dvd burner thrown in at a reasonable price. I’ll miss the touchpad and the keyboard light, but the x300 is too expensive.

gaming the system

I’m sure it isn’t just me that has noticed Google isn’t really as useful as it used to be any more. First there were the empty ‘wrapper’ sites that got onto the adwords box – you know, the ones that seemed to have ‘all about foo’ for every ‘foo’ search, but when you clicked on them just had the output of a search engine in them. Adwords are easy to ignore but sometimes you do actually want to find companies selling stuff. They were occasionally in the main result area too. They seem to come up a little less often now – or maybe I’m just searching for different things.

Then we have cloaking, i.e. the web site serves different content to a search engine than it does to users. So when you do a search you get a nice summary of what looks like what you want, but you click on it and all you get is a payment gateway. It is particularly prominent when looking for technical articles. See Summary of Academic Publishers Cloaking Discussion for some more information on this. It sucks big time.

Just as an example, let’s try something simple, oh I dunno, ‘efficient algorithm for sorting numbers external’ – a typical type of search for a software engineer.

8 links down we have (I’m not putting the link in html on purpose):

  A method for improving the efficiency of external sorting ...
    more efficient external sorting algorithms,based on a variety of
    distribution ... number of nodes), and an identical number of
    branches go from each node, ...
    www.springerlink.com/index/V3L0179J1801278L.pdf -

Ok, this isn’t really that useful looking, but this is just an example, and let’s just take it as being what you’re after. A pdf and everything, let’s go look … oh no, it’s just a payment gateway. $US32 for a paper … Hmm, that seems a little steep. Particularly if you look at the publishing date (go on, have a look, it might surprise you). I wonder how much of that the author gets, if he’s still alive.

Sometimes google scholar helps (but not in this particular case), given the title and author(s) you can often find free or draft versions of papers, but this is still a pain in the arse – why are these sites showing up at all in the main index when they are cloaking their information and intentionally gaming the system? I’m finding that searching for good quality coding and technical information is getting harder and harder, and google being complicit in this cloaking (see the linked article above, or search for ‘springerlink sucks’) just makes me angry at them (and frankly, who cares about the other search engines – they’re irrelevant).

And finally – take those away and searching for many types of information is just a lot harder than it used to be. I guess ‘the web’ has grown, and it’s mostly grown full of rubbish. I had yet another problem with Ubuntu yesterday – now I find 8.04 has major issues with USB mass storage devices on my laptop. Devices will drop out causing corruption, or refuse to work at all, both being totally unusable at best. It took a lot of searching for the right terms to use to find something about the problem – and that was a lonely post on a forum. I guess we’re just unlucky with this together. Certain very popular terms like ubuntu, debian, fedora, linux are now so common it’s lowering the signal-to-noise ratio significantly for any searches containing those terms. And so many sites cross-link with each other that using linkage to weight results is becoming less useful (not that it was always super-great – I remember how advogato used to figure on the front page of just about any search for people who had an account on it).

I’m not sure about google news either. Today there were at least 4 stories on the iphone on the Australian front page – 3 in tech (i.e. all of them) and 1 in business. In the tech section by itself – the top 4 stories, with roadrunner (the fastest supercomputer in the world) pushed down to 5 or 6 (personally I think that is more tech-worthy, iphone belongs on the fashion or business pages if you ask me). Ok, the iphone is full of buzz, but one grouped story should surely suffice (google’s news selection is a bit strange sometimes, but normally it is at least a little better at grouping the same press release).

‘fedora’ responses

Well nobody seems to comment on ‘good stories’ – maybe I should rant more often? Anyway, it seems I have a reputation in the GNOME world as being an arsehole, so why not. Threats about employers reading the blog? Yeah nice one.

I realised I was going to offend the author of packagekit – but seriously, this is not ready for release as the main package ui on any distro. If what it did it did well it might be ok, but it has some time to go. Maybe he needs some offending to get his arse into gear and make it happen and prove me wrong? It’s out there in the wild now as the primary update interface on a public release of a popular distribution – he’s gotta expect criticism, and he’s gotta expect at least some rants – it’s not like I’m mailing the guy or spamming his blog’s comments – it’s my blog. If the app is busy, it needs to make that obvious, not just sit around for tens of seconds appearing to do what you asked, and then come up with a blank list for no apparent reason. I have the fastest net connection I can buy but it was a bit busy at the time – still, why does packagekit not cache any meta-data, at least for the current session? How come it takes 100MB to run yum – if that’s all it’s doing – I thought surely it was doing more than that? Installing 1 package at a time is not good enough – computers are designed to run batch processes automatically, why force me to handle it? Why is every operation serialised when many don’t need to be? The machine was fine when running yum by itself (even with the ‘busy’ network), so it isn’t the cpu/memory or network (it wasn’t swapping even running the update thing). You can complain that I need a faster box or net – but I don’t need either for running anything else I want to run.

I’ll just say one more thing – you can’t have it both ways – if it wasn’t ready for prime time you should have asked Fedora not to use it – or by getting it in the distribution you get the exposure and fame and flashing lights – but have to expect the exposure to generate a range of opinions, from positive to negative. It isn’t personal (how could it be, we don’t know each other) – although it is impossible not to take it that way I know.

Ubuntu is mostly fine – but after giving it a pretty good run I think it’s just not for me. It’s too focused on newbies or windowsies, not ‘veteran’ linux users – yes, that is their target of course, but attracting developers wouldn’t hurt them either. Venting my personal frustration in my own blog should be ‘allowed’, and I don’t think I need to ask anyone for permission. Debian is known for making strange decisions by an overly politicised process by strong-willed individuals – my opinions there are nothing new (and that is all I meant by ‘*BSD like’ btw).

And yes Synaptic is quite nice. The only thing I don’t really like about it is that it’s quite slow at searching and listing packages. I can’t remember what I used on suse (10), but from (a somewhat unreliable) memory I thought it was faster/nicer to use.

Umm, if the network cable comes loose, I’d presume the network would come back online all by itself when it got plugged back in – just like it always has? I certainly shouldn’t be logged in and running a crapplet (NB: I didn’t invent the word) for it to reconnect for example, or run any command as root. Do you run a desktop on a web server just to configure its network? Can’t their cables also get knocked out?

I installed ‘everything’ (desktop, web server, developer) but of the 3 applications I use daily on any computer – one wasn’t installed. Of course I’m going to complain about some fluff that was installed instead and that directly affects my user experience – hey at least the man pages and info files are there. And are you saying packagekit is not installed by default in most configurations? I’d never heard of it – how could I go looking through dozens of packages to remove it? I imagine any ‘desktop’ would include the other stuff too. I’ve been there and done that – trying to tune my system exactly how I wanted before I installed it. All I did was end up with a broken system and wasting even more time fixing it at both ends. I rarely even run the disk partitioner anymore either, since I’ve had more than one failed install by trying to get things how I wanted them.

As for mono – I don’t hate mono. Mono is ok, technically quite a feat too – I think the effort could have gone elsewhere personally (hint: that means it’s opinion), but although it might not be obvious, I do know where Miguel is coming from and I ‘get’ what he is trying to do, and it is a good thing that someone is doing it, and that they have plenty of financial backing to pursue it, and the grand vision and enthusiasm for it. Ok, initially I was lumbered with evolution, bonobo and e-tree and thought he was just starting another ill-fated quick-results project he’d leave for someone else to finish! But that was the very early days – I worked on a mono plugin for Evolution after all – but nobody seemed to want it so I gave up. However he had to expect political backlash from many people given he was effectively cloning MS technology … and much later on the novell-ms deal didn’t help – no matter what the (secret) realities are of it, it’s done, and it will always be hanging over the project for some people – WHICH IS A TERRIBLE SHAME because so much time and effort has gone into it and there are plenty of good ideas and technology there. Politics in and around technical projects totally sucks, but that’s the reality. Time will tell anyway – and tends to iron out issues like this by itself.

For me, currently there are no apps I need that use it (f-spot is a really good app but I don’t need it), and I want to avoid the temptation to write .net code at home (odd yes, but indeed true). And yes I do have personal misgivings about any MS technology in general, and specifically any on my Linux box, but that was just the icing on the cake. What I hate is .NET itself. I use it at work. On a Windows box. That’s a lot to hate. At least with mono there’s the potential that one day they’ll be able to address the main memory issue with any vm based system – of having multiple virtual machines running for separate applications. Would putting them all in 1 environment work? They do it for Java for enterprise/backend applications – how come it isn’t used for desktop applications as well? If the whole desktop and most of the applications ran from a single vm it would probably do rather well – probably better than ‘n’ C applications which have to initialise each application toolkit separately and which share only read-only c library data and code between them (and an order of magnitude better than ‘n’ C language engines loading (and compiling) ‘n’ language ‘scripts’ which load ‘m’ language ‘libraries’ and sharing only the memory-mapped C library parts between them). (where by ‘C’ I mean pre-compiled memory-mappable code/data).

I do rather dislike python mind you. Somewhat how I dislike visual basic. I don’t really hold strong opinions about any other languages but those two. Well, javascript isn’t that high on my list either, and tcl has some issues too. On reflection maybe the reason is simple – the generally bad experience (IN MY EXPERIENCE – it’s called opinion, not everyone agrees with that opinion, but it’s mine, and I hold it) from vb/python apps. And well, BASIC sucks.

BTW with Fedora, my install was still left with init running at level 3 (vi fixed that). I’m not sure I ever told it I didn’t want X running (I installed a desktop system after all). Maybe it had something to do with a text-mode install, but I can’t remember being given the option to turn off/on X (apart from setting it up).

comments

Didn’t notice the comment approval thing in this CMS … so I just approved all the comments.

BTW i’ve been assured tomboy wasn’t political. I guess it was made using some other decision process I don’t understand.

Update
I unapproved the SPAM. I guess I might have to tone things down a bit – from the comments some people actually read this drivel, which is both good and bad to know. I’ll try to go back to coding and leave the pythonites and debianites alone.

fedora

I gave Fedora another go last night. No, not on the old laptop I used to develop Evolution on, but a desktop machine I rarely use (it has a dvd burner) – it has enough memory this time.

A few quirks. The installation was nice – a limited number of questions to begin with, then it went off and did the rest for me. This is how it should be – installing an operating system isn’t watching an interactive movie, why would I want to sit there watching a slide-show and answering more questions as it goes along? I used the defaults but ‘installed everything’, without selecting individual packages. The default partitioning looked a bit weird – but whatever, if that’s what they reckon – at least it doesn’t have /spare (it’s an EDS thing). Although when it rebooted all I was presented with was a console control panel with no X. I configured that there, and quit – and it gave me a login prompt. Well no matter, log in as root, create a user account, then ‘init 5’ and we’re cooking with gas – oh it is nice to have the init system work ‘normally’ again.

Ahh a GNOME desktop. And no emacs. I thought I clicked on ‘software development’!? Just some weird ‘developer help’ application (and given it seems to use its own help format and doesn’t handle info or man pages – rather useless help at that) and glade got installed with that option. Well I guess gcc must be there anyway.

Tomboy notes. Hmm, what a weird choice to install and turn on by default – to be honest I’ve never used any sticky note application – it has few of the benefits of real sticky notes, none of the benefits of a physical notebook, and limitations a text file doesn’t have. Anyway, definitely not worth having a whole vm running just for that. The only reason it ever got there was political – notes apps have been around forever but nobody thought they were worth including by default until mono came along. The only other mono thing is f-spot, which always seemed like a potentially cool application (yes I even have the t-shirt!) – but then again I’ve never used it – in the early days I could never get it to run reliably and it was very slow and extremely memory intensive, and now I’d probably just use my ps3 to look at pictures (or more likely just let the bits rot in the rain of neutrinos from deep space).

Another ‘update manager’. Oh no, I can see problems coming already. It warns me I need to install a gazillion updates. I ignore it for now (although the gigantic attention-demanding billboard over half my screen is a little hard to ignore). Let’s see how to install packages. Hmm, Add and Remove Software. Well, I guess it’s familiar to those windows users out there. And then things start to look not so good. Very very slow. I click on ‘XFCE’ and it goes off searching … and searching … and waiting. I check top – hmm, a few processes sucking lots of cpu. Has it just crashed? Oh, no – here we go, it’s finished. No packages. Hmm, that doesn’t seem right. I try a few more and have no luck – all empty. So I go back to the update manager … start it up.

Oooh. Slow. Has it crashed again? It’s sucking cpu like there’s no tomorrow, and nothing appears to be happening – I give it the benefit of the doubt and go back to the TV. Hmm, after 15 minutes – no apparent progress. Ahh, by left-clicking rather than right-clicking on the updater icon I get a status window – such as it is. It just looks stuck – for some time, then it slowly lurches forward, and for the next 45 minutes or so inches its way to the end of the line. Oh well, maybe it was busy on the net (not sure why the cpu was so busy though), and who needs to run updates all the time anyway.

Back to ‘add and remove programs’ – by this time I’d searched and discovered this was a new thing called ‘PackageKit’. Ahaah. Anyway, wow. Slow. I mean, not just a bit slow – this is remarkably unusably slow in a really embarrassing way. When the list of packages finally arrives (I was still looking for xfce here), I get the option to click on it and wait for it to load the 1 line of package info as well. Or I can click on the tabs for the other bits of info, and wait even longer – although the more I click on the longer I wait since each is queued up and invoked sequentially. So there appears to be no ‘meta-package’ (in debian speak) to install xfce, well let’s try and install the session and panel – maybe that’ll let me change desktops. Oh dear. Dear oh dear. Now it goes off installing one package at a time (with no real progress indication) and queues up every other job in the meantime. Then it reloads the whole list again in record-breaking bullet-time, and lets me go through that unpleasant experience all over again. Hang on, doesn’t Fedora have yum? It isn’t perfect, and was never particularly fast, but it did a lot more a lot faster than this piece of junk.

yum rediscovered – quite a bit tastier than PK. I ran a gnome-terminal so I could run an xterm (oh the irony), and got to work.

yum remove mono

Gone! I get enough .NET at work. And other reasons I needn’t bore you with. Hmm, it seemed to spend an awful long time running gconf-tool during the de-install. I hope gconf isn’t becoming a dreaded global registry … something for another time perhaps.

yum remove PackageKit

Oh oh, it’s gone. I’m finally starting to enjoy Linux again. And the shitty update button is gone too – which seemed to go off and check for updates every minute or so for a good second or more of CPU TIME! I can’t see why updates need to be checked more than daily really – and certainly not such a heavy process.

I still wasn’t sure how to change my desktop – things in /etc/defaults seem to go changing all the time, and I’m sure there was a tool for it. Ahh switcher. yum install switcher. Hmm, not too much documentation. Actually, none. It doesn’t even tell you what options are available. switcher xfce – you need to yum groupinstall xfce. Ok easy enough. Took less than 5 minutes – I imagine it would have taken over an hour in PackageKit, with no guide as to which packages to install either. Done. Switched, done. Logout.

Ahaah! Now the desktop login option is back at least. Although it’s still set to GNOME. Ok, log in to XFCE now. What is that damn network monitor crapplet doing running on this fixed-network workstation? And how come there is no option to quit it like every other crapplet I don’t want?

yum remove whatever-it-was. Gone. And at least it’s just gone by itself too – Ubuntu seems to want to de-install init every time you try to remove almost anything.

I did a ps | grep for python. Ahh more useless shit to fuck-off (it was getting late – I had had just about enough of it by now). Whatever they were – gone and gone. The printer thing will have to wait, since from memory it’s part of CUPS – but I’ll check when I have time again and care to.

Hang on, what else is wrong – why does the damn file manager have to open up the window when I put a cdrom in the drive? And steal your focus? How annoying is that – typing away USING the computer – you know – MULTITASKING and the computer absolutely demands your time to look at some CD you put in the drive. After a little hunting I found the options. Off, don’t play damn cd’s or movies automatically either. At least it’s a world of improvement over Winblows which wants you to confirm that you want to open it AS WELL (and get this – how to open it!), after searching for some auto-play application and copying the TAGA LIPA ARE! virus to your internet explorer again.

Ok, it seems to be working ok now – although I wonder if I can get the ‘legacy’ nvidia drivers working for my ‘ancient’ card (OpenGL – Blender). Maybe I should look at putting this on my laptop too.

BUT Get rid of PackageKit – it’s an utter embarrassment – extremely limited features, terrible usability and SLOW and bloated (gee, red-carpet blew this shit away – 5 years ago, even with all of its earlier bugs and issues). An ‘update icon’ should be tiny and unobtrusive, use very little cpu and poll the server MUCH less often using a lighter protocol – and it’s just a desktop applet – worthless for a multi-user or headless machine anyway. Why install the network manager applet on a fixed workstation? How about a mobile profile? Why can’t it be removed in xfce? Fewer python crapplets overall would be a good idea. And mono ones – oh dear. Hint (for both python and mono): There’s a reason java applets never took off on the web.

debian madness

Sigh. It’s just about the last straw for me and debian-based systems.

I wanted to read about the texinfo format, so naturally I ran ‘info texinfo’, and all I got was a man page which told me to run ‘info texinfo’ to get the full documentation. Oh funny – GNU is full of recursive jokes but this is just silly. Ahh well, off to try and install it. Hmm, no texinfo-doc package, huh? Just some non-free ‘info files’ which didn’t seem related to texinfo at all. Oh what? They ARE the documentation. WTF is going on?

Ho hum. Ok, so I’m a bit late to the party – 2 and a bit years late – but really, these debian guys have lost the plot a bit – they’ve always had a ‘*BSD crowd’ feel about them and things like this just reinforce that impression. Maybe that’s why I’ve had so much trouble finding the documentation for just about everything I looked for.

Complete documentation is a core strength of Unix and by extension the GNU system. It is one of its main benefits over other so-called ‘operating systems’ like Windows. Without readily accessible man pages how can you learn to use the system? To write software? I learnt perl from the rather excellent man pages that at least used to exist – why buy a book which will be out of date quickly and is hard to search?

Documentation IS part of an application. To not install it by default is bad enough, to not install it on purpose for quite petty reasons is utterly atrocious.

proprietary file formats

My house mate wanted to edit her CV, but of course it is in microsoft’s shitty binary format. Loading it into open office has about as many formatting issues as it did the last time I tried something similar years ago … maybe open office devs aren’t focusing on those kinds of things (I thought it was a focus at least at some point). I’m not blaming them – it’s ms, or more directly, TAFE for wasting limited educational resources on such rubbish.

Google docs was actually a little better – it lost more formatting, but it did it in a more consistent and visually pleasing manner. Still, not much use for this case either. The file hasn’t really been formatted properly (not using styles/margins properly) – but what can you do – that is the mode of operation wysiwyg editors enforce.

Microsoft re-announcing that they’re going to support ODF has no positive meaning either. Even if they ever do what they say, they will still have enough incompatibilities that they will just set the ‘standard’ on any ambiguous language in the specification, or simply break it on purpose. You can quite legitimately claim to support a standard, yet still have ‘bugs’ which make your product the one the others have to follow or work with. Just look at internet explorer, or outlook.

At work we had a shitty office 2003 xml loader for excel files, but I got sick of its api and wrote a csv loader instead. Oh joy, so nice to be able to edit excel ‘files’ in emacs, and now they load almost instantly, using fewer lines of code, and much less memory. Rather than having to run-time compile an xml de-serialiser and load the whole thing into memory into objects that get translated into arrays of strings, which mostly get thrown away. So it has fewer features, but they’re not features I actually need.

While I’m here – I’m a bit sick of the ‘xml solves the world’s woes’ rubbish – if anything the whole ooxml debacle should prove otherwise. XML is not really a panacea for anything, it’s just a convenient if a bit complicated file format for data interchange. There are far simpler and far more efficient binary formats for the same thing too, which would probably make an awful lot more sense when you don’t need a text editor to edit them (xmlrpc, soap, anyone?). I find it rather disturbing that people talk about using ‘the dom’ as an internal api for applications which work with XML as a ‘good thing’ too. It isn’t. It is a terrible api for internal data manipulation. XML should stick to what it is good at (hmm, is it even?), data interchange. An external format. Letting external file formats directly become your internal data representation is just as bad as doing it the other way around. It might be convenient for one-off simple applications, but all you’re doing is replacing native language features with amorphous language neutral ones which will never be as easy to write or maintain or scale.

bash crash dash

Had a strange episode on my laptop last night. Not sure if it’s the update I ran a few hours before, or flakey hardware, but all of a sudden I couldn’t start any new shells, and the cpu was locked so hard the machine turned into a hair dryer (which is why I wanted to run a shell – to run top). I used the xfce process manager to kill a busted bash process which was locking the cpu (although unlike windows, the machine was still responsive enough, I didn’t want to burn the thing out).

Turns out that /bin/bash somehow got corrupted. Hmm, tricky. Bit hard to fix something like that if you can’t log in – so much for gui tools. Hmm, what to do. I thought of emacs and shell mode, and guessed the way to specify the shell to run. After trying all the obvious things like ash, tcsh, and csh and discovering none of those shells were installed, I took a punt on the oddly named but completely unfamiliar dash. Hurrah, finally I could run a shell! So I changed my login shell to dash and started poking around. Hmm, apt-get --reinstall install bash perhaps? Oh no, it needs bash to run the pre/post install scripts – it just always failed. Poked around man pages in dpkg and apt-get. Not much luck, no obvious way to disable the scripts. But at least I found where the package was cached. Hmm, dpkg-reconfigure? Oh helpful. Package bash is not working or only partially installed. No shit sherlock. Bear with me here, I really don’t grok debian’s package system – all I remember from the Evolution days is that they made up their own packages and versions which just gave the developers extra headaches. Ahaah, dpkg-deb -x bash* /tmp/foo. We have a binary. cp bin/bash /bin/bash, re-fix my shell, and finally it works.

Bit of a panic there – it’s been so long since I’ve had to administer a box at that level (apart from disabling all the crap that runs at login), and quite frankly there’s a reason for that. Hmm, so maybe there’s some hardware issue with my machine – firefox has started crashing a lot too (firefox 2 that is, 3 was way too much to use), or maybe it’s just related to the 8.04 install or an update.