cats – Not Zed

I gave Fedora another go last night. No, not on the old laptop I used to develop Evolution on, but a desktop machine I rarely use (it has a dvd burner) – it has enough memory this time.

A few quirks. The installation was nice – a limited number of questions to begin with, then it went off and did the rest for me. This is how it should be – installing an operating system isn’t watching an interactive movie, why would I want to sit there watching a slide-show and answering more question as it goes along? I used the defaults but ‘installed everything’, without selecting individual packages. The default partitioning looked a bit weird – but whatever, if that’s what they reckon – at least it doesn’t have /spare (it’s an EDS thing). Although when it rebooted all I was presented with was a console control panel with no X. I configured that there, and quit – and it gave me a login prompt. Well no matter, log in as root, create a user account, then ‘init 5’ and we’re cooking with gas – oh it is nice to have the init system work ‘normally’ again.

Ahh a GNOME desktop. And no emacs. I thought I clicked on ‘software development’!? Just some weird ‘developer help’ application (and given it seems to use its own help format and doesn’t handle info or man pages – rather useless help at that) and glade got installed with that option. Well I guess gcc must be there anyway.

Tomboy notes. Hmm, what a weird choice to install and turn on by default – to be honest I’ve never used any sticky note application – it has few of the benefits of real sticky notes, none of the benefits of a physical notebook, and limitations a text file doesn’t have. Anyway, definitely not worth having a whole vm running just for that. The only reason it ever got there was political – notes apps have been around forever but nobody thought they were worth including by default until mono came along. The only other mono thing is f-spot, which always seemed like a potentially cool application (yes I even have the t-shirt!) – but then again i’ve never used it – in the early days I could never get it to run reliably and it was very slow and extremely memory intensive, and now i’d probably just use my ps3 to look at pictures (or more likely just let the bits rot in the rain of neutrinos from deep space).

Another ‘update manager’. Oh now, I can see problems coming already. It warns me I need to install a gazillion updates. I ignore it for now (although the gigantic attention demanding billboard over half my screen is a little hard to ignore). Lets see how to install packages. Hmm, Add and Remove Software. Well, I guess it’s familiar to those windows users out there. And then things start to look not so good. Very very slow. I click on ‘XFCE’ and it goes off searching … and searching … and waiting. I check top – hmm, a few processes sucking lots of cpu. Has it just crashed? Oh, no – here we go, it’s finished. No packages. Hmm, that doesn’t seem right. I try a few more and have no luck – all empty. So I go back to the update manager … start it up.

Oooh. Slow. Has it crashed again? It’s sucking cpu like there’s no tomorrow, and nothing appears to be happening – I give it the benefit of the doubt and go back to the TV. Hmm, after 15 minutes – no apparent progress. Ahh, by left-clicking rather then right-clicking on the updater icon I get a status window – such as it is. It just looks stuck – for some time, then it slowly lurches forward, and for the next 45 minutes or so inches it’s way to the end of the line. Oh well, maybe it was busy on the net (not sure why the cpu was so busy though), and who needs to run updates all the time anyway.

Back to ‘add and remove programs’ – by this time I’d searched and discovered this was a new thing called ‘PackageKit’. Ahaah. Anyway, wow. Slow. I mean, not just a bit slow – this is remarkably unusably slow in a really embarrassing way. When the list of packages finally arrive (I was still looking for xfce here), I get the option to click on it and wait for it to load the 1 line of package info as well. Or I can click on the tabs for the other bits of info, and wait even longer – although the more I click on the longer I wait since each is queued up and invoked sequentially. So there appears no ‘meta-package’ (in debian speak) to install xfce, well lets try and install the session and panel – maybe that’ll let me change desktops. Oh dear. Dear oh dear. Now it goes off installing one package at a time (with no real progress indication) and queues up every other job in the meantime. Then it reloads the whole list again in record breaking bullet-time, and lets me go through that unpleasant experience all over again. Hang on, doesn’t Fedora have yum? It isn’t perfect, and was never particularly fast, but it did a lot more a lot faster than this piece of junk.

yum rediscovered – quite a bit tastier than PK. I ran a gnome-terminal so I could run an xterm (oh the irony), and got to work.

yum remove mono

Gone! I get enough .NET at work. And other reasons I needn’t bore you with. Hmm, it seemed to spend an awful long time running gconf-tool during the de-install. I hope gconf isn’t becoming a dreaded global registry … something for another time perhaps.

yum remove PackageKit

Oh oh, it’s gone. I’m finally starting to enjoy Linux again. And the shitty update button is gone too – which seemed to go off and check for updates every minute or so for a good second or more of CPU TIME! I can’t see why updates need to be checked more than daily really – and certainly not such a heavy process.

I still wasn’t sure how to change my desktop – things in /etc/defaults seem to go changing all the time, and i’m sure there was a tool for it. Ahh switcher. yum install switcher. Hmm, not too much documentation. Actually, none. It doesn’t even tell you what options are available. switcher xfce … you need to yum groupinstall xfce. Ok easy enough. Took less than 5 minutes – I imagine it would have taken over an hour in PackageKit, with no guide as to which packages to install either. Done. Switched, done. Logout.

Ahaah! Now the desktop login option is back at least. Although it’s still set to GNOME. Ok, log in to XFCE now. What is that damn network monitor crapplet doing running on this fixed-network workstation? And how come there is no option to quit it like every other crapplet I don’t want?

yum remove whatever-it-was Gone. And at least it’s just gone by itself too – Ubuntu seems to want to de-install init every time you try to remove almost anything.

I did a ps | grep for python. Ahh more useless shit to fuck-off (it was getting late – I had had just about enough of it by now). Whatever they were – gone and gone. The printer thing will have to wait, since from memory it’s part of CUPS – but I’ll check when I have time again and care to.

Hang on, what else is wrong – why does the damn file manager have to open up the window when I put a cdrom in the drive? And steal your focus? How annoying is that – typing away USING the computer – you know – MULTITASKING and the computer absolutely damands your time to look at some CD you put in the drive. After a little hunting I found the options. Off, don’t play damn cd’s or movies automatically either. At least it’s a world of improvement over Winblows which wants you to confirm that you want to open it AS WELL (and get this – how to open it!), after searching for some auto-play application and copying the TAGA LIPA ARE! Virus to your internet explorer again.

Ok, it seems to be working ok now – although I wonder if I can get the ‘legacy’ nivdia drivers working for my ‘ancient’ card (OpenGL – Blender). Maybe I should look at putting this on my laptop too.

BUT Get rid of PackageKit – it’s an utter embarrassment – extremely limited features, terrible usability and SLOW and bloated (gee, red-carpet blew this shit away – 5 years ago, even with all of its earlier bugs and issues). An ‘update icon’ should be tiny and unobtrusive, use very little cpu and poll the server MUCH less often using a lighter protocol – and it’s just a desktop applet – worthless for a multi-user or headless machine anyway. Why install the network manager applet on a fixed workstation? How about a mobile profile? Why can’t it be removed in xfce? Fewer python crapplets overall would be a good idea. And mono ones – oh dear. Hint (for both python and mono): There’s a reason java applets never took off on the web.

Been busy. Had to go away for a week for work. Very strange experience, not sure I want to repeat that again any time soon. Oh well.

I had started looking into writing a vorbis decoder before that major interruption. Something a bit more challenging than normal. About all i’ve gathered so far is that they’ve made some strange decisions in the stream and the huffman coding tables. e.g. a canonical huffman code can be stored very efficiently and reconstructed trivially – but instead they came up with some first-fit mechanism which adds unnecessary complexity and overhead. Still, plenty to learn about and play with. I guess I may get back into it when my mind decides to look into it again.

Meanwhile i’ve been sidetracked at home trying to work out how to improve the way we’re storing trees in a database at work. Relational databases don’t really do trees all that well directly, but with a bit of work you get an ok solution. There is some direct support in Oracle and the SQL standard apparently, but we’re using PostgreSQL, so you have to do things a bit more manually. There are a few ways to do it, but only a couple which are suitable for us – inserts need to be interactively-fast, not just lookups.

Currently we use an auxiliary table which holds the parent-child relationships and so with a simple join you can easily get all children or all ancestors of a given node. The only problem is that it requires some house-keeping using triggers, and also adds some additional restrictions on insertion order and the like (which creates issues with editing functions). About all I could think of is to remove the auxiliary table and do the same job using stored procedures on the fly. Not as simple as you’d think …

Given a structure such as:

create table tree (
 id serial primary key,
 parentid int4 not null references tree(id) on delete cascade on update cascade,
 childcount int4 not null default 0,
 title text
);

create type parent_link as (depth int, id int);

(note that this still requires triggers to maintain the childcount – for a ui app the effort is worth it, and it allows us to make a major optimisation later on).

Here’s a function to find the ancestors. It uses parent_link so it can also return the depth – which is very handy in many algorithms.

CREATE or replace FUNCTION find_ancestors(intab text, inid integer) RETURNS setof parent_link AS $$
DECLARE
    tid integer;
    pid integer;
    res integer[];
    count integer := 1;
    i integer := 0;
    r parent_link;
BEGIN
    tid := inid;
    execute 'select parentid from ' || quote_ident(intab) || ' where id = ' || quote_literal(tid) into pid;
    res[0] := tid;
    while pid <> tid loop
         res[count] := pid;
         count := count + 1;
         tid := pid;
         execute 'select parentid from ' || quote_ident(intab) || ' where id = ' || quote_literal(tid) into pid;
    end loop;

    while count > 0 loop
       count := count - 1;
       r.depth = i;
       i := i + 1;
       r.id = res[count];
       return next r;
    end loop;

END;
$$
 LANGUAGE plpgsql;

It also returns the values in sorted order – not sure that is any use for SQL as it may not guarantee the order once you join without an explicit sort, but it doesn’t really cost any extra processing since we need to keep the whole stack around to calculate the depths. I also tried a recursive algorithm, but it made no real difference.

It is used just like a table:

 select * from find_ancestors('tree', 4);

Ok, now for a utility function – normally this isn’t needed unless you want it for sparse or one-off lookups. Nothing special here, apart from not being recursive.

CREATE or replace
  function tree_node_depth(inid integer, tab text)
  returns integer as $$
DECLARE
  wid int := -1;
  pid int := inid;
  depth int := -1;
  data record;
BEGIN
  while pid <> wid loop
    wid := pid;
    execute 'select parentid from ' || quote_ident(tab) || ' where id = ' || quote_literal(wid) into data;
    pid = data.parentid;
    depth := depth + 1;
  end loop;
  return depth;
END;
$$
 LANGUAGE plpgsql stable;

Now comes the tricky one. How to list all of the children of a tree without needing large amounts of memory or temporary tables. The obvious is just to use a recursive function – but for the life of me I couldn’t get a recursive function which could return a row iteratively (i.e. using return next). I could easily get it to generate a temporary table, but that isn’t what I wanted. My first idea was to use a queue of outstanding rows to examine for children. It worked just fine, but requires lots of memory – in worst case nearly as many elements in the queue as rows in the table (although in reality it is probably a small fraction of the memory used by the application/other end of the query, so it would probably be just fine for all practical purposes for our application).

So I came up with something that just seems a bit hacky. I maintain a stack of cursors, and use some assignment tricks to get around the issue that pl/pgSQL doesn’t seem to like you using cursors from an array (you can copy them, but you can’t use them directly?). I dunno if it’s ‘proper coding’, but it works …

CREATE or replace
  FUNCTION find_children(inid integer, tab text)
  RETURNS setof parent_link AS $$
DECLARE
    r parent_link;
    data record;
    children refcursor;
    stack refcursor[];
    depth int := 0;
    depthoffset int := 0;
BEGIN

    depthoffset := tree_node_depth(inid, tab);

    -- first node
    r.depth = depthoffset;
    r.id = inid;
    return next r;

    open children for execute 'select id,parentid,childcount from ' || quote_ident(tab) || ' tree where parentid <> id and parentid = ' || inid;
    stack[depth] := children;

    while depth >= 0 loop
      children = stack[depth];
      fetch children into data;

      if not found then
        close children;
        stack[depth] = null;
        depth := depth - 1;
      else

        r.depth = depth + 1 + depthoffset;
        r.id = data.id;
        return next r;

        if data.childcount > 0 then
           depth := depth + 1;
           children = null;
           open children for execute 'select id,parentid,childcount from ' || quote_ident(tab) || ' tree where parentid <> id and parentid = ' || data.id;
           stack[depth] = children;
        end if;
      end if;

    end loop;
END;
$$
 STABLE
 LANGUAGE plpgsql;

Note that since we have childcount maintained elsewhere, we can optimise the leaf search – saving many sub-queries looking for linked children we would have to execute otherwise.

Now I can’t really tell if this is going to run very quickly, or if it saves much memory – I guess it should – a limited stack of cursors vs an extra or temporary table or just storing the id’s in an array. I don’t have enough data or the energy to set it up a test case, and every time I run a query in pgadmin I get wildly different timings. Gut feeling is that with decent indexes queries should be in a similar order of magnitude to the auxiliary table, but without most of the overhead of keeping it intact – speeding up and simplifying insertions and updates. It might also benefit from reducing cache load on redundant data and increasing locality of reference for pages i’m probably going to load anyway. And finally, although it means there is a black-box for the query planner, thus reducing potential optimisations – there may not many opportunities for it do much anyway.

Hmmm, you know, all these hassles to use a relational database and shoe-horn structured information into it. I’m just not sure it’s worth it. Yes it is very handy to have an sql window to run queries and test ideas – but most of the work could be done simpler and with less developer time using code and some sort of persistent object store. I guess it depends on whether we move to a multi-user model and take a short-cut by using a database to do it rather than implementing a proper n-tier architecture.

Tag: cats

fedora

squelch