Archive for the ‘Glick’ Category

Adventures in Docker land

Tuesday, October 15th, 2013

Connoisseurs of this blog know that I have an interest in application deployment systems, having created three different application bunding system (1,2, 3). These were all experiments in the area of desktop applications, but recently there has been some interesting motions in related areas, namely Docker.

Docker is a server application/container deployment system, which nicely sidesteps a lot of the complexity with desktop apps (not having to integrate deeply with the desktop) which makes it a lot easier to deploy. Additionally, docker is more than a deployment system, it also has some interesting ideas about how to create and distribute applications.

Every docker container is a copy-on-write clone of a specific parent Image, which means instantiation of docker containers is very fast and cheap. It also gives some very interesting properties because you can track the changes in a container (compared to t its parent image) and “commit” this to a new image. This creates a git-like hierarchy of images where every commit is a filesystem layer that applies to a previous layer, up to some base image. The git-like workflow is really nice to work with when creating images, and the final result is very easy to share and deploy, and at the same time automatically shares as much as possible with common base images.

Unfortunately Docker relies on AUFS, a union filesystem that is not in the upstream kernel, nor is it likely to ever be there. Also, while AUFS is in the current Ubuntu kernel it is deprecated there and will eventually be removed. This means Docker doesn’t run on Fedora which has a primarily-upstream approach to packaging.

So, the last month or so I’ve been working on making Docker work in Fedora (and thus eventually in RHEL, which is the nr 1 requested Docker feature). Of course, this work will benefit other distributions that don’t have AUFS too.

I started looking at possible replacements for the copy-on-write support, and there are a few possibilities availible:

  • overlayfs
  • btrfs
  • lvm snapshots
  • lvm thin provisioning

Overlayfs is a different union filesystem implementation than AUFS, and the one that seems most likely to land upstream. But that is happening slowly, if at all. Long-term I think this is the best option, but right now it is out of the question.

Btrfs has copy-on-write both using filesystem snapshots and one a file basis using reflink. However, btrfs is not currently used much in production as its not considered stable enough. It would also be a very heavy dependency for Docker, as may users would have to reformat their disks to use it.

Lvm snapshots are useful for doing e.g. backup of a snapshot, but regress badly in performance when you start having many snapshots of the same device.

This leaves us with only lvm thin provisioning. This is a fairly recent, but relatively stable technology that allows you to create copy-on-write block devices that are “thinly” provisioned, meaning they don’t use real space until the device is in use. This is not ideal for Docker as it really wants copy-on-write at the file level, but with some work it is possible to work around this.

Rather than interacting with lvm which is a very generic volume manager I chose to use the lower level device-mapper kernel APIs directly (via libdevmapper). This allows us greater ease of access to the devices programmatically, as well as avoiding confusion with possible system use of LVM. Also it avoids some LVM performance issues with very many devices.

So, we set up a single large block device on which we create a device-mapper “thinp” pool. On this we then creates a single “base” block device formated with ext4. Every image and container are then created as snapshot (in multiple steps) from this base device. So, say you’re starting a container based on an image “apache” which itself is  based on a “fedora” image, we would:

  1. Create a snapshot of the base device.
  2. Mount it and apply the changes in the fedora image.
  3. Create a snapshot based on the fedora device.
  4. Mount it and apply the changes in the apache image.
  5. Create a snapshot based on the apache device.
  6. Mount it and use as the root in the new container.

And of course, these devices will be reused (with corresponding steps skipped) as needed by other images/containers.

The devicemapper pool need to be set up on a large block device that fits all the images and containers that you will be used which would be painful for most people. Docker handles this by automatically creating the a large sparse file, using it as a loopback device for the devicemapper work. Additionally we ensure that DISCARD support is enabled in the filesystem so that any files removed in the conttainer filters down to the loopback file making it sparse again.

This means that there is no need for setup, and space for images and containers will only be used as needed. Of course, there are still issues, like the max size of the loopback mount (100G by default, but this should be easy to grow) and the max size of the base extt4 image (10G by default, resizing is harder after initial construction, but should be possible).

We’re currently in the process of landing this in Docker, and hope to have a 0.7 release out based on my device-mapper work pretty soon. Then I will continue working on making docker a first-class citizen on Fedora.

Glick2 code availible

Wednesday, October 12th, 2011

I spent some time this week cleaning up and polishing the glick2 codebase. It is now available on the net here.

I also changed the checksumming scheme a bit from my last post, now each file in a bundle is indexed by sha1 checksum, so we can share any files that are identical between two bundles in use.

There is no release yet, and very little documentation on how to use it, but interested parties can play with the code.

Rethinking the Linux distibution

Friday, September 30th, 2011

Recently I’ve been thinking about how Linux desktop distributions work, and how applications are deployed. I have some ideas for how this could work in a completely different way.

I want to start with a small screencast showing how bundles work for an end user before getting into the technical details:

YouTube Preview Image

Note how easy it is to download and install apps? Thats just one of the benefits of bundles. But before we start with bundles I want to take a step back and look at what the problem is with the current Linux distribution models.

Desktop distributions like Fedora or Ubuntu work remarkably well, and have a lot of applications packaged. However, they are not as reliable as you would like. Most Linux users have experienced some package update that broke their system, or made their app stop working. Typically this happens at the worst times. Linux users quickly learn to disable upgrades before leaving for some important presentation or meeting.

Its easy to blame this on lack of testing and too many updates, but I think there are some deeper issues here that affect testability in general:

  • Every package installs into a single large “system” where everything interacts in unpredictable ways. For example, upgrading a library to fix one app might affect other applications.
  • Everyone is running a different set of bits:
    • The package set for each user is different, and per the above all packages interact which can cause problems
    • Package installation modify the system at runtime, including running scripts on the users machine. This can give different results due to different package set, install order, hardware, etc.

Also, while it is very easy to install the latest packaged version of an application, other things are not so easy:

  • Installing applications not packaged for your distribution
  • Installing a newer version of an application that requires newer dependencies than what is in your current repositories
  • Keeping multiple versions of the same app installed
  • Keeping older versions of applications running as you update your overall system

So, how can we make this better? First we make everyone run the same bits. (Note: From here we start to get pretty technical)

I imagine a system where the OS is a well defined set of non-optional core libraries, services and apps. The OS is shipped as a read-only image that gets loopback mounted at / during early boot. So, not only does everyone have the same files, they are using (and testing) *exactly* the same bits. We can do semi-regular updates by replacing the image (keeping the old one for easy rollback), and we can do security hot-fixes by bind-mounting over individual files.

The core OS is separated into two distinct parts. Lets call it the platform and the desktop. The platform is a small set of highly ABI stable and reliable core packages. It would have things like libc, coreutils, libz, libX11, libGL, dbus, libpng, Gtk+, Qt, and bash. Enough unix to run typical scripts and some core libraries that are supportable and that lots of apps need.

The desktop part is a runtime that lets you work with the computer. It has the services needed to be able to start and log into a desktop UI, including things like login manager, window manager, desktop shell, and the core desktop utilities. By necessity there will some libraries needed in the desktop that are not in the platform, these are considered internal details and we don’t ship with header files for them or support third party binaries using them.

Secondly, we untangle the application interactions.

All applications are shipped as bundles, single files that contain everything (libraries, files, tools, etc) the application depends on. Except they can (optionally) depend on things from the OS platform. Bundles are self-contained, so they don’t interact with other bundles that are installed. This means that if a bundle works once it will always keep working, as long as the platform is ABI stable as guaranteed. Running new apps is as easy as downloading and clicking a file. Installing them is as easy as dropping them in a known directory.

I’ve started writing a new bundle system, called Glick 2, replacing an old system I did called Glick. Here is how the core works:

When a bundle is started, it creates a new mount namespace, a kernel feature that lets different processes see different sets of mounts. Then the bundle file itself is mounted as a fuse filesystem in a well known prefix, say /opt/bundle. This mount is only visible to the bundle process and its children. Then an executable from the bundle is started, which is compiled to read all its data and libraries from /opt/bundle. Another kernel feature called shared subtrees is used to make the new mount namespace share all non-bundle mounts in the system, so that if a USB stick is inserted after the bundle is started it will still be visible in the bundle.

There are some problematic aspects of bundles:

  • Its a lot of work to create a bundle, as you have to build all the dependencies of your app yourself
  • Shared libraries used by several apps are not shared, leading to higher memory use and more disk i/o
  • Its hard for bundles to interact with the system, for instance to expose icons and desktop files to the desktop, or add a new mimetype

In Glick 2, all bundles are composed of a set of slices. When the bundle is mounted we see the union of all the slices as the file tree, but in the file itself they are distinct bits of data. When creating a bundle you build just your application, and then pick existing library bundles for the dependencies and combine them into an final application  bundle that the user sees.

With this approach one can easily imagine a whole echo-system of library bundles for free software, maintained similarly to distro repositories (ideally maintained by upstream). This way it becomes pretty easy to package applications in bundles.

Additionally, with a set of shared slices like this used by applications it becomes increasingly likely that an up-to-date set of apps will be using the same build of some of its dependencies. Glick 2 takes advantage of this by using a checksum of each slice, and keeping track of all the slices in use globally on the desktop. If any two bundles use the same slice, only one copy of the slice on disk will be used, and the files in the two bundle mount mounts will use the same inode. This means we read the data from disk only once, and that we share the memory for the library in the page cache. In other words, they work like traditional shared libraries.

Interaction with the system is handled by allowing bundle installation. This really just means dropping the bundle file in a known directory, like ~/Apps or some system directory. The session then tracks files being added to this directory, and whenever a bundle is added we look at it for slices marked as exported. All the exported slices of all the installed bundles are then made visible in a desktop-wide instance of /opt/bundle (and to process-private instances).

This means that bundles can mark things like desktop files, icons, dbus service files, mimetypes, etc as exported and have them globally visible (so that other apps and the desktop can see them). Additionally we expose symlinks to the intstalled bundles themselves in a well known location like /opt/bundle/.bundles/<bundle-id> so that e.g. the desktop file can reference the application binary in an absolute fashion.

There is nothing that prohibits bundles from running in regular distributions too, as long as the base set of platform dependencies are installed, via for instance distro metapackages. So, bundles can also be used as a way to create binaries for cross-distro deployment.

The current codebase is of prototype quality. It works, but requires some handholding, and lacks some features I want. I hope to clean it up and publish it in the near future.

Glick 0.2 released

Thursday, August 23rd, 2007

There was two really embarrasing bugs in the 0.1 release. First of all the argument order was switched around in the description of how to create glicks in the README. Secondly, a bug in the fuse filesystem implementation made it hang after 10 files had been opened.

So, a quick release is in order. Get your fresh new code at the glick home page.

Thanks to  Stefan Westerfeld for finding these issues.

Glick 0.1 released

Tuesday, August 21st, 2007

I’m back from my vacation now, and instead of spending days reading through the backlog of emails I decided to polish up glick and make a release so that people can play with it.

One problem that the initial glick version had was that all file lookups were done via the /tmp/glick_root symlink. This symlink being in /tmp and possibly being owned by someone else is a security issue. So, in the new release we instead use “/proc/self/fd/1023″ as the absolute prefix for the glick mount. While this looks a bit strange it is much more secure. However, it does make it a bit more complicated to create and test glick bundles, so glick now ships with the “glick-shell” tool that lets you point to a working directory and make /proc/self/fd/1023 point to that.

The new release also contains an easy to use script “glick-mkext2″  that creates minimally sized ext2 images from a directory, in addition to the mkglick script that creates the actual glick.

I’ve also added –icon and –desktop-file switches to mkglick that lets you embed 48×48 png icons and desktop files into the ELF file. These are stored in the “.xdg.icon.48″ and “.xdg.desktop” sections and can be easily extracted (using e.g. objdump, or some simple ELF header parsing code). In fact, glick now also ships with a tool “glick-extract” that lets you extract the filesystem, icon and desktop file parts easily.

I’ve talked to some people about the GPL licensing issue discussed in my previous entry, and I’ve come to believe that its ok to distribute glick bundles that contain non-GPL programs even though the glick code is GPL. Distributing such a bundle is really no different from distributing an iso file with both GPL and non-GPL software, which is explicitly ok due to the aggregation section in the GPL:

In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.

Of course, I am not a lawyer, so seek your own legal advice if you are unsure.

Anyway, this version of glick should be secure, easy to use and useful, please play with it and do amazing things. The source for the release and Fedora 7 rpms are availible from the glick webpage.