DB Item Versioning, Again

After thinking about it a bit more I made some changes to my versioning model. Since it was fairly simple, I thought I’d add the ability to make lightweight data copies and separate the meta-data from the data; as a side-effect I simplified some of the queries and probably made them more efficient to boot.

With a hangover for most of Saturday it was pretty slow going but I managed to work something out.

I added an extra level of indirection between the entity, the revision, and its content. I also removed the revision from the data table, so it is now a simple key-value pair, which maps nicely and more efficiently onto Berkeley DB: I can skip any data marshalling for what will always be the largest data block. It let me remove some of the indexes as well.

The revision and entry tables stay the same:

create table rev (
  id serial not null primary key,
  branchid int not null references rev(id),
  name text,
  author text,
  ctime text,
  constraint rev_name_uc unique (name)
);

create index rev_branchid_idx on rev(branchid);

create table entry (
  id serial not null primary key,
  revid int not null references rev(id),
  classid int
);

But the data table is simpler, and I added a meta-data table, which just contains a name for now:

create table data (
  id serial primary key,
  content text
);

create table meta (
  id serial primary key,
  name text not null
  -- other stuff goes here
);

Then there is a new table which maps revisions of a given entry to its content:

create table entryrev (
  id serial primary key,
  entryid int references entry(id),
  revid int references rev(id),
  metaid int references meta(id),
  dataid int references data(id)
);

create index entryrev_entryid on entryrev(entryid);
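With the indirection in place, a lookup reads like this. As a sketch (the :entryid and :revid parameters are placeholders, not part of the schema), fetching the name and content for one revision of one entry is a pair of joins through entryrev:

select m.name, d.content
from entryrev er
join meta m on m.id = er.metaid
join data d on d.id = er.dataid
where er.entryid = :entryid
  and er.revid = :revid;

The data and meta rows are only ever reached through entryrev, which is what makes sharing them between revisions safe.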

Since I did this using Berkeley DB, I also created an indirect index from ‘name’ to ‘entryrev’, so name-based lookups are very simple. This is easy to maintain since data is never deleted. Something similar could be done in SQL by adding a redundant name column to the entryrev table and indexing on that, or by using a join and indexing on metaid.
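The join-based variant of that name lookup might look like this (the meta_name_idx and entryrev_metaid_idx index names are my own, and :name is a placeholder):

create index meta_name_idx on meta(name);
create index entryrev_metaid_idx on entryrev(metaid);

select er.id, er.entryid, er.revid, er.dataid
from entryrev er
join meta m on m.id = er.metaid
where m.name = :name;

The redundant-column approach trades a little denormalisation for one fewer join; since rows are never deleted or renamed in place, neither copy can drift out of date.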

All the branch-matching calculations are the same, but they only have to be done once now and return keys to all of the data. So the basic query returns more information, and doesn’t have to be repeated for each set of data (if you want to version more than one). Now, to perform a ‘branch merge’ (for a new item, or one that doesn’t require merging), just create a new entryrev with a new revision on the target branch and the metaid, entryid, and dataid from the source branch, i.e. create a lightweight copy.
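That merge step can be sketched as a single insert-select (assuming :newrevid is a revision already created on the target branch, and :src is the id of the source entryrev row; both are placeholders):

insert into entryrev (entryid, revid, metaid, dataid)
select entryid, :newrevid, metaid, dataid
from entryrev
where id = :src;

No data or meta rows are copied, only the small mapping row, which is what keeps the copy lightweight regardless of how large the content is.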
