New Default Branch Format in Bzr

One of the new features in the soon-to-be-released bzr 0.8 is the new “knit” storage format.

When comparing the size of the repository data for jhbuild with “knit” and “metadir” formats (metadir is just the old storage format with repository, branch and checkout bookkeeping separated), I see the following:

                  metadir   knit
Size              9.9MB     5.5MB
Number of files   1267      307

The reason for the smaller number of files is that information about all revisions in the repository is now stored together rather than in separate files. So the file count comes out at a constant plus 2 times the number of tracked files (a knit index file plus the knit data file). For comparison, the CVS repository I imported this from was 4.4MB, and comprised 143 files.
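As a rough check on those figures, here is a small Python sketch of the arithmetic. The size of the constant overhead is not stated above, so the sketch only derives an upper bound on the number of tracked files; the function names are made up for illustration.

```python
def knit_file_count(tracked_files, overhead):
    """Constant overhead plus an index file and a data file per tracked file."""
    return overhead + 2 * tracked_files


def max_tracked_files(total_files):
    """Upper bound on tracked files, reached if the constant overhead were zero."""
    return total_files // 2


def reduction(old, new):
    """Fractional saving when going from old to new."""
    return (old - new) / old


# Figures from the table above.
print(f"size saving: {reduction(9.9, 5.5):.0%}")         # 44%
print(f"file count saving: {reduction(1267, 307):.0%}")  # 76%
print(f"tracked files: at most {max_tracked_files(307)}")  # 153
```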

As well as reducing storage requirements, the new knit repository format is designed to reduce network traffic. With the current weave repository format, the weave file for each file touched by a commit gets rewritten to include the contents of the new revision. In contrast to this, the information about the new revision can simply be appended to the knit data file and the knit index file updated to match. This means publishing a branch to a server via sftp mainly involves append operations, resulting in a nice speed up.
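To make the append-only idea concrete, here is a toy Python model. It is not bzr's actual on-disk format: the "id offset length" index layout and the file names are assumptions made up for this sketch. The point is only that adding a revision never rewrites existing bytes.

```python
import os
import tempfile


def append_revision(data_path, index_path, revision_id, text):
    """Append one revision's text to the data file and note where it landed.

    Toy model only: the real knit format stores deltas and more metadata.
    """
    payload = text.encode("utf-8")
    with open(data_path, "ab") as data:
        data.seek(0, os.SEEK_END)  # be explicit about writing at the end
        offset = data.tell()
        data.write(payload)
    with open(index_path, "a", encoding="utf-8") as index:
        index.write(f"{revision_id} {offset} {len(payload)}\n")
    return offset, len(payload)


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "example.data")    # hypothetical name
    index_path = os.path.join(tmp, "example.index")  # hypothetical name
    print(append_revision(data_path, index_path, "rev-1", "line one\n"))
    print(append_revision(data_path, index_path, "rev-2", "line two\n"))
```

With a weave-style layout, the second call would have had to rewrite the whole file rather than add nine bytes to the end of it.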

Similarly when pulling new changes from a published branch, bzr only needs to download a knit index to find out which sections of the knit data are missing locally. It can then ask for just the changed sections (by an HTTP range request or a partial read with sftp), rather than downloading the entire contents of the changed weaves.
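The pull side can be sketched the same way. This Python example assumes a simplified index of "id offset length" lines (not the real knit index format) and shows how the missing sections could be turned into a single HTTP Range header value. It does not perform any network access.

```python
def parse_index(lines):
    """Map revision id -> (offset, length) from 'id offset length' lines."""
    entries = {}
    for line in lines:
        revision_id, offset, length = line.split()
        entries[revision_id] = (int(offset), int(length))
    return entries


def missing_ranges(remote_index, local_ids):
    """Half-open byte ranges of records we lack, with adjacent ranges merged."""
    spans = sorted(span for rev, span in remote_index.items()
                   if rev not in local_ids)
    merged = []
    for offset, length in spans:
        if merged and merged[-1][1] == offset:
            # This record starts where the previous one ended: extend it.
            merged[-1] = (merged[-1][0], offset + length)
        else:
            merged.append((offset, offset + length))
    return merged


def range_header(ranges):
    """Value for an HTTP Range header; HTTP end positions are inclusive."""
    return "bytes=" + ",".join(f"{start}-{end - 1}" for start, end in ranges)


remote = parse_index(["rev-1 0 100", "rev-2 100 50", "rev-3 150 25", "rev-4 175 10"])
print(range_header(missing_ranges(remote, {"rev-1", "rev-3"})))
# bytes=100-149,175-184
```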

Overall, this should make bzr 0.8 a lot more usable than 0.7 for various network operations.

Repositories in Bzr

One of the new features coming up in the next release of bzr is support for shared repositories. This provides a way to reduce the disk space needed to store multiple related branches. To understand how repositories work, it helps to know a bit about how branches are stored by bzr.

[bzr repository diagram]

There are three concepts that make up a bzr branch:

  1. A checkout or working tree. These are the source files you are working with. It represents the state of the source code at some recorded revision plus any local changes you’ve made. In the diagram on the right, it is represented as the red node.
  2. The branch, consisting of a linear sequence of revisions. This is represented by the blue nodes in the diagram. Note that there may be multiple paths from the first revision to the current revision due to branching and merging. The branch revision history indicates the path that was taken by this particular branch.
  3. The repository, being a store of the text of all the revisions in the ancestry of the branch, plus metadata about those revisions. This essentially stores information about every node and edge in the diagram.

In previous versions of bzr, this information was not clearly separated. However with the new default branch format in bzr 0.8 they are separated, and a particular directory need not contain all three parts, which is what makes the space savings and performance improvements possible.

One of the biggest space savings is achieved by sharing the repository data between branches. If a particular branch does not contain any repository information, bzr will check each parent directory in turn until it finds a repository. If a collection of branches share some of their history, then the single shared repository will be significantly smaller than the space used if each branch had its own repository data.
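The lookup can be pictured with a few lines of Python. This is only a sketch of the idea, not bzr's implementation, and the ".bzr/repository" marker directory is an assumption about the on-disk layout.

```python
from pathlib import Path


def find_repository(branch_dir, marker=".bzr/repository"):
    """Return the nearest directory at or above branch_dir holding a repository.

    The marker path is an assumed layout, used here for illustration only.
    """
    start = Path(branch_dir).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None  # reached the filesystem root without finding one
```

A branch at repo/project/feature with no repository data of its own would resolve to repo, as long as repo is the closest ancestor that holds one.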

Another way to reduce disk usage is to create branches without checkouts. This is useful when publishing a branch, since people pulling or merging from that branch don’t use the checkout files.

Finally, it is possible to create a checkout which does not contain branch or repository data, instead containing a pointer to where that data is located. This is quite useful when combined with a central shared repository.

So how big is this space saving? When I converted JHBuild to bzr, the repository data came to 10MB, the branch data to 100KB and a checkout to 1.4MB.

So to publish a second branch without the use of shared repositories means another 10MB of storage (a bit more if I include a checkout at the published location). If I use shared repositories, the cost of the second branch is 100KB plus an amount proportional to the size of the changes I make on that branch. So for many projects, the cost of publishing another branch is lost in the noise.
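The same comparison as a small Python calculation, using the JHBuild figures above. The changes_kb parameter (the extra history unique to each additional branch) is a placeholder, and checkouts are left out.

```python
REPO_KB = 10 * 1024   # repository data, from the figures above
BRANCH_KB = 100       # per-branch bookkeeping, from the figures above


def published_size_kb(branches, shared, changes_kb=0):
    """Total storage for a number of published branches, without checkouts.

    changes_kb is the history unique to each branch after the first.
    """
    extra_history = (branches - 1) * changes_kb
    if shared:
        return REPO_KB + branches * BRANCH_KB + extra_history
    return branches * (REPO_KB + BRANCH_KB) + extra_history


print(published_size_kb(2, shared=False))  # 20680
print(published_size_kb(2, shared=True))   # 10440
```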

intltool and po/LINGUAS

Rodney: my suggestions for intltool were not intended as an attack. I just don’t really see much benefit in intltool providing its own po/ file.

The primary difference between the intltool po/ and the version provided by gettext or glib is that it calls intltool-update rather than xgettext to update the PO template, so that strings get correctly extracted from file types like desktop entries, Bonobo component registration files, or various other XML files.

The current method intltool uses to get intltool-update called (providing its own po/ file) is a lot better than the previous method (maintaining patches for the po/ files from various versions of gettext and then deciding which one to apply). However, it can make it difficult to take advantage of new gettext features (the po/LINGUAS file being the most recent example). If it were possible for intltool-update to be called without any modification to the po/ file that gettext installs, then this sort of problem wouldn’t occur.

The standard po/ uses the makefile variable $(XGETTEXT) as the program to extract translations for the PO template. If intltool had a program (or a mode for one of the existing programs) that was command line argument compatible with xgettext, then all that would be necessary would be to redefine $(XGETTEXT) to the appropriate value. Since $(XGETTEXT) is set through a simple autoconf substitution, this should be very easy to do from intltool’s M4 autoconf macro.


One issue that was mentioned as a Gnome Goal was to switch packages to use a po/LINGUAS file.

The idea makes sense — translators only need to edit a simple text file to add a new translation to an application, rather than having to modify the file without breaking things. Unfortunately, the suggested way of supporting this is a pretty big hack. A better long term solution would be to use the upstream gettext macros and po/ infrastructure.

For a Gnome module that doesn’t use intltool, the following steps should work.

  1. Make sure the module is being built with Automake 1.8 or 1.9. If it isn’t, upgrade to 1.9.
  2. Create an m4 subdirectory in your project if it doesn’t exist, add it in CVS and then create and add a m4/.cvsignore file (there are a number of files that will get created here by gettext that you don’t want to check into CVS).
  3. Mark the m4 subdirectory as the macro dir in the file:

    And make sure that the macro dir gets checked if the makefile reruns aclocal:

    AC_SUBST([ACLOCAL_AMFLAGS], ["-I $ac_macro_dir \${ACLOCAL_FLAGS}"])
  4. If you aren’t using the gnome-common script, you will also need to make sure that aclocal is called with “-I m4”. If you are using the gnome-common script, then this will happen automatically.
  5. Remove the AM_GLIB_GNU_GETTEXT call from and replace it with:
  6. If you aren’t using the gnome-common script, change the call to glib-gettextize to autopoint, and make sure it gets run before aclocal (again, unneeded if you are using the gnome-common script).
  7. Now rerun so that autopoint gets run. This should result in a number of files getting created under m4, and some new files under po.
  8. Copy po/Makevars.template to po/Makevars and customise the variables. You might want to set DOMAIN to $(GETTEXT_PACKAGE) rather than $(PACKAGE). Add this new file in CVS.
  9. Update po/LINGUAS from the ALL_LINGUAS variable in, and then remove the ALL_LINGUAS definition. Add po/LINGUAS to CVS.
  10. Finally update m4/.cvsignore and po/.cvsignore to ignore the new generated files.
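Step 9 is mechanical enough to script. The Python sketch below assumes ALL_LINGUAS is set with a double-quoted shell-style assignment and that one language code per line is an acceptable layout for po/LINGUAS. Both function names are made up for illustration, and script_text is the content of whichever file currently defines the variable.

```python
import re


def extract_all_linguas(script_text):
    """Pull the language codes out of an ALL_LINGUAS="..." assignment."""
    match = re.search(r'ALL_LINGUAS="([^"]*)"', script_text)
    if match is None:
        return []
    return match.group(1).split()  # codes may be spread over several lines


def render_linguas(languages):
    """Build po/LINGUAS content: a comment line, then one code per line."""
    header = "# Please keep this list sorted alphabetically.\n"
    return header + "".join(f"{lang}\n" for lang in sorted(set(languages)))


example = 'ALL_LINGUAS="fr de\n  en_GB"\n'
print(render_linguas(extract_all_linguas(example)), end="")
```

The result still needs to be written to po/LINGUAS and added to CVS by hand, and the ALL_LINGUAS definition removed as described above.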

As I said at the start, this change is only appropriate for apps not using intltool, since intltool overwrites the po/ file with an incompatible version.

To get things working with intltool, I believe it would make most sense to modify intltool as follows:

  • Make intltool provide some commands that are command line argument compatible with xgettext and msgmerge.
  • Make IT_PROG_INTLTOOL alter XGETTEXT and MSGMERGE with the appropriate intltool functions.
  • Don’t overwrite po/
  • If additional makefile rules are needed in the po subdirectory, install a po/Rules-intltool file containing them. The gettext M4 macros will include them into the resulting Makefile.


I’ve been testing out Ekiga recently, and so far the experience has been a bit hit and miss.

  • Firewall traversal has been unreliable. Some numbers (like the SIPPhone echo test) work great. In some cases, no traffic has gotten through (where both parties were behind Linux firewalls). In other cases, voice gets through in one direction but not the other. Robert Collins has some instructions on setting up siproxd which might solve all this though, so I’ll have to try that.
  • The default display for the main window is a URI entry box and a dial pad. It would make much more sense to display the user’s list of contacts here instead (which are currently in a separate window). I rarely enter phone numbers on my mobile phone, instead using the address book. I expect that most VoIP users would be the same, provided that using the address book is convenient.
  • Related to the previous point: the registration service seems to know who is online and who is not. It would be nice if this information could be displayed next to the contacts.
  • Ekiga supports multiple sound cards. It was a simple matter of selecting “Logitech USB Headset” as the input and output device on the audio devices page of the preferences to get it to use my headset. Now I hear the ring on my desktop’s speakers, but can use the headset for calls.
  • It is cool that Ekiga supports video calls, but I have no video camera on my computer. Even though I disabled video support in the preferences, there are still a lot of knobs and whistles in the UI related to video.

Even though there are still a few warts, Ekiga shows a lot of promise. As more organisations provide SIP gateways (such as the UWA gateway), this software will become more important as a way of avoiding expensive phone charges as well as a way of talking to friends/colleagues.

Firefox Ligature Bug Followup

Thought I’d post a followup on my previous post since it generated a bit of interest. First a quick summary:

  • It is not an Ubuntu Dapper specific bug. With the appropriate combination of fonts and pango versions, it will exhibit itself on other Pango-enabled Firefox builds (it was verified on the Fedora build too).
  • It is not a DejaVu bug, although it is one of the few fonts to exhibit the problem. The simple fact is that not many fonts provide ligature glyphs and include the required OpenType tables for them to be used.
  • It isn’t a Pango bug. The ligatures are handled correctly in normal GTK applications on Dapper. The bug only occurs with Pango >= 1.12, but that is because older versions did not make use of the OpenType tables in the “basic” shaper (used for Latin-script languages like English).
  • The bug only occurs in the Pango backend, but then the non-Pango renderer doesn’t even support ligatures. Furthermore, there are a number of languages that can’t be displayed correctly with the non-Pango renderer so it is not very appealing.

The Firefox bug is only triggered in the slow, manual glyph positioning code path of the text renderer. This only gets invoked if you have non-default letter or word spacing (such as justified text). In this mode, the width of the normal glyph of the first character in the ligature seems to be used for positioning, which results in the overlapping text.
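A toy model of that positioning logic, in Python. This is not Firefox’s code, and the glyph widths are invented. It only shows why advancing by the first character’s width instead of the ligature’s own width makes the following glyph land on top of the ligature.

```python
def layout(glyphs, extra_spacing, use_ligature_advance):
    """Return (name, x) positions for a run of glyphs placed one by one.

    Each glyph is (name, own_advance, first_char_advance). For ordinary
    glyphs the two advances are equal; for a ligature the second value is
    the width of the normal glyph of its first character.
    """
    x = 0
    positions = []
    for name, own_advance, first_char_advance in glyphs:
        positions.append((name, x))
        advance = own_advance if use_ligature_advance else first_char_advance
        x += advance + extra_spacing
    return positions


# Invented widths: the "ffi" ligature is 14 units wide, a lone "f" is 4.
run = [("o", 6, 6), ("ffi", 14, 4), ("c", 5, 5)]
print(layout(run, extra_spacing=1, use_ligature_advance=True))
# [('o', 0), ('ffi', 7), ('c', 22)]
print(layout(run, extra_spacing=1, use_ligature_advance=False))
# [('o', 0), ('ffi', 7), ('c', 12)]
```

In the second case the “c” starts at 12 while the ligature drawn at 7 extends to 21, so the two overlap.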

It seems that the bug may be fixed in the Firefox 1.6 series, but if that fix can’t be backported easily in time for Dapper, it might be easier to switch to a different default font that doesn’t contain the ligatures (such as Bitstream Vera). That would certainly reduce the chance of the bug occurring.

Annoying Firefox Bug

Ran into an annoying Firefox bug after upgrading to Ubuntu Dapper. It seems to affect rendering of ligatures.

At this point, I am not sure if it is an Ubuntu-specific bug. The conditions I currently know of that trigger the bug are:

  • Firefox 1.5 (I am using the 1.5.dfsg+ package).
  • Pango rendering enabled (the default for Ubuntu).
  • The web page must use a font that contains ligatures and use those ligatures. Since “DejaVu Sans” includes ligatures and is the default “sans serif” font in Dapper, this is true for a lot of websites.
  • The text must be justified (e.g. use the “text-align: justify” CSS rule).

If you view a site where these conditions are met with an affected Firefox build, you will see the bug: ligature glyphs will be used to render character sequences like “ffi”, but only the advance of the first character’s normal glyph is used before drawing the next glyph. This results in overlapping glyphs.

It also results in a weird effect when selecting text, since the ligatures get broken apart if the selection begins or ends in the middle of the ligature, causing the text to jump around.

I wonder whether this bug affects the Firefox packages in any other distributions, or is an Ubuntu-only problem?