Rethinking the Linux distribution

Recently I’ve been thinking about how Linux desktop distributions work, and how applications are deployed. I have some ideas for how this could work in a completely different way.

I want to start with a small screencast showing how bundles work for an end user before getting into the technical details:

http://www.youtube.com/watch?v=qpRjSAD_3wU

Notice how easy it is to download and install apps? That’s just one of the benefits of bundles. But before we get to bundles I want to take a step back and look at what the problem is with the current Linux distribution models.

Desktop distributions like Fedora or Ubuntu work remarkably well, and have a lot of applications packaged. However, they are not as reliable as you would like. Most Linux users have experienced some package update that broke their system, or made their app stop working. Typically this happens at the worst times. Linux users quickly learn to disable upgrades before leaving for some important presentation or meeting.

It’s easy to blame this on a lack of testing and too many updates, but I think there are some deeper issues here that affect testability in general:

  • Every package installs into a single large “system” where everything interacts in unpredictable ways. For example, upgrading a library to fix one app might affect other applications.
  • Everyone is running a different set of bits:
    • The package set for each user is different, and per the above, all packages interact, which can cause problems
    • Package installation modifies the system at runtime, including running scripts on the user’s machine. This can give different results due to different package sets, install order, hardware, etc.

Also, while it is very easy to install the latest packaged version of an application, other things are not so easy:

  • Installing applications not packaged for your distribution
  • Installing a newer version of an application that requires newer dependencies than what is in your current repositories
  • Keeping multiple versions of the same app installed
  • Keeping older versions of applications running as you update your overall system

So, how can we make this better? First, we make everyone run the same bits. (Note: from here on things get pretty technical.)

I imagine a system where the OS is a well defined set of non-optional core libraries, services and apps. The OS is shipped as a read-only image that gets loopback mounted at / during early boot. So, not only does everyone have the same files, they are using (and testing) *exactly* the same bits. We can do semi-regular updates by replacing the image (keeping the old one for easy rollback), and we can do security hot-fixes by bind-mounting over individual files.
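
As a rough illustration of the hot-fix mechanism, here is a minimal sketch (not Glick 2 code) of bind-mounting a patched copy of a single library over the corresponding file in the read-only image; the paths are made up for the example:

```c
/* Sketch: hot-fix one file in the read-only OS image by bind-mounting a
 * patched copy over it.  Paths are illustrative; must be run as root. */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    const char *fixed  = "/var/hotfixes/libfoo.so.1";  /* patched copy       */
    const char *target = "/usr/lib/libfoo.so.1";       /* file in the image  */

    if (mount(fixed, target, NULL, MS_BIND, NULL) != 0) {
        perror("bind mount");
        return 1;
    }
    return 0;
}
```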

The core OS is separated into two distinct parts; let’s call them the platform and the desktop. The platform is a small set of highly ABI-stable and reliable core packages. It would have things like libc, coreutils, libz, libX11, libGL, dbus, libpng, Gtk+, Qt, and bash. Enough Unix to run typical scripts, and some core libraries that are supportable and that lots of apps need.

The desktop part is a runtime that lets you work with the computer. It has the services needed to start and log into a desktop UI, including things like the login manager, window manager, desktop shell, and the core desktop utilities. By necessity there will be some libraries needed in the desktop that are not in the platform; these are considered internal details, and we don’t ship header files for them or support third-party binaries using them.

Secondly, we untangle the application interactions.

All applications are shipped as bundles: single files that contain everything (libraries, files, tools, etc.) the application depends on, except that they can (optionally) depend on things from the OS platform. Bundles are self-contained, so they don’t interact with other bundles that are installed. This means that if a bundle works once it will always keep working, as long as the platform stays ABI stable as guaranteed. Running a new app is as easy as downloading and clicking a file. Installing it is as easy as dropping it in a known directory.

I’ve started writing a new bundle system, called Glick 2, replacing an old system I did called Glick. Here is how the core works:

When a bundle is started, it creates a new mount namespace, a kernel feature that lets different processes see different sets of mounts. Then the bundle file itself is mounted as a fuse filesystem in a well known prefix, say /opt/bundle. This mount is only visible to the bundle process and its children. Then an executable from the bundle is started, which is compiled to read all its data and libraries from /opt/bundle. Another kernel feature called shared subtrees is used to make the new mount namespace share all non-bundle mounts in the system, so that if a USB stick is inserted after the bundle is started it will still be visible in the bundle.
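
A hedged sketch of that launch sequence is below. It assumes the system root is set up as a shared subtree (as modern init systems do), and the actual FUSE filesystem is stood in for by a hypothetical bundle-fuse helper; this is an illustration of the mechanism, not the real Glick 2 launcher:

```c
/* Sketch: launch a bundle in its own mount namespace.  Creating the
 * namespace requires the appropriate privileges. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s <bundle-file> <exe> [args...]\n", argv[0]);
        return 1;
    }

    /* New mount namespace: mounts made below are invisible to other
     * processes. */
    if (unshare(CLONE_NEWNS) != 0) { perror("unshare"); return 1; }

    /* Mark all existing mounts as slaves: mounts that appear later in the
     * parent namespace (a USB stick, say) still propagate in here, while
     * the bundle mount does not leak back out. */
    if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) != 0) {
        perror("mount MS_SLAVE"); return 1;
    }

    /* Mount the bundle file at the well-known prefix; "bundle-fuse" is a
     * hypothetical helper standing in for the real FUSE filesystem. */
    char cmd[1024];
    snprintf(cmd, sizeof cmd, "bundle-fuse '%s' /opt/bundle", argv[1]);
    if (system(cmd) != 0) { fprintf(stderr, "fuse mount failed\n"); return 1; }

    /* Run the application, which was built to find everything under
     * /opt/bundle. */
    execv(argv[2], &argv[2]);
    perror("execv");
    return 1;
}
```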

There are some problematic aspects of bundles:

  • It’s a lot of work to create a bundle, as you have to build all the dependencies of your app yourself
  • Shared libraries used by several apps are not shared, leading to higher memory use and more disk I/O
  • It’s hard for bundles to interact with the system, for instance to expose icons and desktop files to the desktop, or to add a new mimetype

In Glick 2, all bundles are composed of a set of slices. When the bundle is mounted we see the union of all the slices as the file tree, but in the file itself they are distinct bits of data. When creating a bundle you build just your application, then pick existing library bundles for the dependencies and combine them into the final application bundle that the user sees.

With this approach one can easily imagine a whole ecosystem of library bundles for free software, maintained similarly to distro repositories (ideally maintained by upstream). This way it becomes pretty easy to package applications as bundles.

Additionally, with a set of shared slices like this used by applications, it becomes increasingly likely that an up-to-date set of apps will be using the same build of some of its dependencies. Glick 2 takes advantage of this by using a checksum of each slice, and keeping track of all the slices in use globally on the desktop. If any two bundles use the same slice, only one copy of the slice on disk will be used, and the files in the two bundle mounts will use the same inode. This means we read the data from disk only once, and that we share the memory for the library in the page cache. In other words, they work like traditional shared libraries.
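
The on-disk side of that sharing could look roughly like the content-addressed sketch below. The store directory and the hard-link-by-checksum scheme are my illustration of the principle, not necessarily Glick 2’s actual layout:

```c
/* Sketch: intern a slice into a content-addressed store keyed by its
 * checksum.  If an identical slice was already interned by another
 * bundle, both end up pointing at the same inode, so the data is read
 * and cached only once.  The store path is illustrative. */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

static int intern_slice(const char *slice_file, const char *checksum,
                        char *store_path, size_t len)
{
    snprintf(store_path, len, "/var/lib/bundle-slices/%s", checksum);

    if (link(slice_file, store_path) == 0)
        return 0;                 /* first time we see this content       */
    if (errno == EEXIST)
        return 0;                 /* already interned: share the inode    */
    perror("link");
    return -1;
}

int main(int argc, char *argv[])
{
    char path[4096];
    if (argc != 3 || intern_slice(argv[1], argv[2], path, sizeof path) != 0)
        return 1;
    printf("slice available at %s\n", path);
    return 0;
}
```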

Interaction with the system is handled by allowing bundle installation. This really just means dropping the bundle file in a known directory, like ~/Apps or some system directory. The session then tracks files being added to this directory, and whenever a bundle is added we look at it for slices marked as exported. All the exported slices of all the installed bundles are then made visible in a desktop-wide instance of /opt/bundle (and to process-private instances).
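
Tracking the install directory could be as simple as an inotify watch. The following is a hedged sketch assuming ~/Apps as the location, with the actual “scan for exported slices” step left as a stub:

```c
/* Sketch: watch ~/Apps for new bundle files with inotify.  What to do
 * with a new bundle (scanning its exported slices) is stubbed out. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void)
{
    char dir[4096];
    snprintf(dir, sizeof dir, "%s/Apps", getenv("HOME"));

    int fd = inotify_init();
    if (fd < 0 || inotify_add_watch(fd, dir, IN_CREATE | IN_MOVED_TO) < 0) {
        perror("inotify");
        return 1;
    }

    char buf[4096];
    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf);
        if (len <= 0)
            break;
        for (char *p = buf; p < buf + len;
             p += sizeof(struct inotify_event) + ((struct inotify_event *) p)->len) {
            struct inotify_event *ev = (struct inotify_event *) p;
            if (ev->len > 0)
                printf("new bundle %s/%s: scan its exported slices\n",
                       dir, ev->name);
        }
    }
    return 0;
}
```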

This means that bundles can mark things like desktop files, icons, dbus service files, mimetypes, etc. as exported and have them globally visible (so that other apps and the desktop can see them). Additionally we expose symlinks to the installed bundles themselves in a well known location like /opt/bundle/.bundles/<bundle-id> so that e.g. the desktop file can reference the application binary in an absolute fashion.
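
For instance, an exported desktop file could reference the application binary through that stable symlink. The bundle id and file names below are hypothetical:

```ini
[Desktop Entry]
Type=Application
Name=My App
Exec=/opt/bundle/.bundles/org.example.MyApp/bin/myapp %f
Icon=/opt/bundle/.bundles/org.example.MyApp/share/icons/myapp.png
MimeType=image/png;
```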

There is nothing that prohibits bundles from running on regular distributions too, as long as the base set of platform dependencies is installed, for instance via distro metapackages. So bundles can also be used as a way to create binaries for cross-distro deployment.

The current codebase is of prototype quality. It works, but requires some handholding, and lacks some features I want. I hope to clean it up and publish it in the near future.

85 thoughts on “Rethinking the Linux distribution”

  1. I was around Debian when a security bug in libz was found, and I was helping to search for bundled copies of libz in all 15000+ packages. (Replace libz with any non-platform library.)

    Since then I really like non-bundling anal-retentiveness and strict packaging practices.

  2. Bundles sound like an interesting solution…to a problem already solved very well on FOSS desktops with package management. I’d prefer to see the effort put into improving those systems.

  3. Just track which libz is installed in which bundles. The admin or the system can then choose to force upgrades or possibly override the bundled libz.

    The other way around forces the status quo on everyone, which just isn’t good enough. Using software is a pain on Linux. And you have to make hard choices like ‘hmm, exam time, I should not upgrade, but that new feature could really help my productivity.’

    Package systems are awesome at managing systems, but not so much at upgrading to the newest app X without fear or a forced distro upgrade.

  4. “Every package installs into a single large “system” where everything interacts in unpredictable ways. For example, upgrading a library to fix one app might affect other applications.”

    Well, that’s exactly what we want. That’s the point of a library: it’s shared code. So we only have to update it once to fix all the applications that use it.

    If we have to update 30 different bundles when we fix a library bug, what’s the point of having libraries at all?

    1. Adam: The main point of libraries still remains: when writing the application, the library code didn’t have to be rewritten. Easier packaging is only a secondary effect of libraries.

      Sure, updating libraries in apps is more work in a bundle world, but to a large degree it is solvable by better tooling. Also, IMHO it *should* be harder to upgrade a library that an app uses. Right now we keep bumping libraries in all apps just because there is something newer out, not because said new version actually makes that particular app *better*. In fact, in many cases it introduces problems in some apps, or just doesn’t matter to a lot of apps.

      In my opinion the “large system where everything interacts” is optimized for the distro packager, not for actual users of applications. It’s somehow more important that we have zero wasted bytes or wasted effort during packaging than that the actual user gets an application he can use.

  5. Many times I have come across the limitations of a package manager, especially on systems where I was not the admin and the only way to install new software was to compile it from source. However, it seems to me that the solution proposed by the 0install guys (http://0install.net/) offers more features, especially in that it allows managing updates centrally, even though the apps come from a variety of sources.

    PS: This is not a plug, and I am not affiliated with the 0install project in any way. I tried to use it at one point, but the lack of apps was an unsurmountable obstacle to its adoption.

  6. The trouble with bundling libraries with every app is lack of efficiency; I think it’s impractical until we can ensure that for multiple apps, the libs they need are loaded only once into memory. (I thought that was already possible, the kernel being smart about such things?) Then there is still the unfortunate duplication on disk, but that could maybe be solved by a de-duping filesystem eventually. So I guess that’s why you put things like Qt and GTK in the “platform” layer at least, so that they are not duplicated. It’s a compromise, and there can still be different apps which depend on different versions of those too, in theory.

    Another point is that on a conventional Linux system everyone can be a developer, and that’s an important use case, not to be neglected just for the sake of making app installation easier. Not that there’s anything wrong with omitting headers by default, but it should at least be quick and easy to install them, and being a developer should fit harmoniously with how the rest of the filesystem is laid out.

    Maybe you got some inspiration from MacOS; but there are pros and cons to do it the OSX way or the pre-X way. In either of those, it doesn’t matter where you install your apps, and that is a nice feature to have. So depending on putting apps in a known location and then depending on fuse hacks to make them run seems impractical too, and I don’t really see the point of it. We might at least start from the known-good ideas from MacOS and build or improve on them rather than letting it be more brittle than that. So I think a basic requirement is that if an app (a bundle or not) exists on any attached disk, the file manager/finder/desktop/launcher/whatever should somehow know that it’s available for the “open” or “open with…” scenario when you have found a compatible document that you’d like to open. That should not depend on any “installation” step, it should “just work”, even if the app itself is ephemeral (you plugged in a USB stick which has the app, and you are just going to use it right now, not “install” it). So that implies there needs to be a list of apps somewhere. The filesystem needs to have metadata about the executables residing on it, and that metadata needs to be kept up-to-date at all times (ideally not just on your pet distro, but unconditionally). When the FS is mounted, the list of apps on that FS is then merged into the master list (or else, when the desktop system wants to search for an app, it must search on all mounted FS’s). When the FS is unmounted, the app is no longer available to launch new instances.

    Coincidentally I was thinking last night that runtime linking could stand to be a little smarter, to tolerate mismatching versions whenever the functions the app needs from the lib are present and have the same arguments. E.g. the recent case I’ve seen, where libpng was upgraded and stuff refused to run just because it was a different version, should not happen. But runtime linking is still a fuzzy area for me; I don’t understand why that breakage happens and why some other libs do a better job.

    So in general I think I’d avoid breaking things that aren’t broken, and for what is broken, fix it at the lowest possible level, rather than just putting more layers on top of what is there. (Unfortunately in some of those areas there are few experts capable of fixing things, though.) Filesystems should be smarter, the use of metadata in them should be more widespread, the tools used to transfer files between filesystems and across the network should transfer the metadata at the same time, the linker should be smarter, and the shell should work the same way the graphical desktop does (that is something at which MacOS does not yet excel). File management and package management should be merged to become the same thing: if you install an app and it has dependencies you should get the dependencies at the same time, but if you already have them, then there is no need to bundle them and waste extra bandwidth and disk space re-getting the same libs again.

  7. I fully agree with you that the current distributions are optimized for the packagers and not for the users, although I’d rather word it like: they are designed more around technical necessities than around ease of use.

    Still any approach ignoring those necessities will likely fail. While starting with the user’s view may lead to a better result the whole thing is worthless unless you are also able to describe how the software should get packaged and how your idea is not creating (significantly) more work for the packagers. The key here is IMHO how updates (are supposed to) work. You’ll need to consider at least the most common use cases and compare their costs on both the user’s and the packager’s side to the current “packages and repos” solution. Most obvious use cases include:

    * New version of application is available
    * Important fix for a library is available
    * Exploit for a library or application is out in the wild
    * Test newly created combination of application and library versions

    The other important and yet unanswered question is how the user can trust the software installed. This question is also linked to updates, as the user needs some trust in the ability and willingness to provide updates in the future. But it also contains questions as simple as how to make sure that the software does not contain malware.

    I hope these questions help you iron out some of the flaws that your design might still have.

    Florian

  8. Why reinvent the wheel? Use Arch Linux.
    It has a core system, a repository for other packages, and an AUR area where everyone can publish ‘recipes’ for building packages.
    The packages are bleeding edge, so there is often no need for using git versions in the AUR or creating your own ‘recipes’.
    Compatibility issues are easy to solve with a partial downgrade until a fix is introduced.
    And multiple versions of programs can be installed if you use another root.

  9. I must say that I don’t find the idea of bundles really appealing.
    And I don’t agree either with the idea that distros are here to ease the packagers’ work at the expense of the end user.

    Coherent distributions are totally suited for OpenSource stuff.
    Library versioning (done by the upstream developers) allows for distributions to keep different versions of the same libraries when different dependencies require different versions (on Gentoo they call it slots).
    Nevertheless, some library upgrades will cause dependency breakage; this is normal, yet mostly exceptional.
    Distributions handle the worst breakages by:
    – using a “dist-upgrade” system, for binary distros : it fits well with release cycles
    – “revdep-rebuild” for source distros
    Source-based distributions are at an advantage because they can dare to do stuff like installing python/perl/… extensions for multiple interpreter versions at once.
    AFAICT the technical problems all have solutions, it must be that they’re not implemented to preserve existing infrastructures (distro limitations).

    While disk space is cheap, I don’t like the idea of having a number of copies of shared libraries proportional to the number of end packages.
    Usually, upstream focuses on 1 or two library versions, others are unmaintained.
    So you’d end up having the same libs anyway, because you probably don’t want to use unmaintained software or to maintain it yourself (-> binary updates).

    IMHO bundles should remain exceptional, for the typical proprietary software (acroread, matlab, …) with a lot of deps and which can’t afford to support every setup out there.
    But for Free software, it does not help anybody.

    I don’t remember which one, but I heard of an experimental distro which is bundle-based.
    Maybe you should try it and see what you think of it?


  10. Very interesting post. The idea of app bundles on Linux has been tried many times, but so far no concept has really taken off (do Ubuntu or Fedora or Suse explicitly support any of the existing bundling systems?). Well, I hope one day an approach will be made which does take off, because there are some use cases where package managers don’t offer a solution yet, and where bundles might be much better suited.

    For that reason, is there any complete and honest comparison between pros and cons of package managers and bundles? I would imagine that future distributions might use a combination of both systems, but to find the right balance it would be important to identify the cases where bundles shine and package managers suck, and vice versa.

    Btw. it would be amazing if RHEL5 had some bundle system… It’s a real PITA having to compile all new software from source, including all required libraries… Funny thing is, even the existing bundling systems don’t quite work because the bundles usually require new freetype, new fontconfig, new libpng… rather than shipping these libs inside the bundles. Thinking of it, maybe RHEL5 is an interesting hard test case for the real-life viability of bundling systems 🙂

  11. So, instead of focusing on the proposed solution, this is how I actually see the problem. Libraries/files being replaced “underneath” running applications is the main issue. A secondary issue is the way third-party apps can be installed.

    For the first issue, what we may want is for distributions to include something like filesystem snapshotting before any software upgrade (in default installations), and continuing to use that snapshot while upgrades are in progress and apps are not restarted. This approach would not harm security of the system as long as you restart affected applications (but that’s not forced today either).

    Static compiling has been very popular with application vendors for quite some time, but not so much in the recent past because it seems they have gotten better at packaging, or packaging has gotten easier. Bundles are basically the same thing, so I don’t see a win there. Improving packaging tools to enable easier parallel installs would also help here for when some application needs a more recent version of a library.

    Bundles are tried over and over again, and the benefits never seem to outweigh the drawbacks. Also, there is the ROX Desktop “Zero Install” approach as well: did you have a chance to look at that?

    1. Danilo: Files being replaced underneath running apps is not the only problem. I list several others.

      Also, snapshotting like that is problematic. It can ensure that nothing sees the update until it’s fully baked, yes, but that only delays the problem. At some point you replace files under running applications, or you have to restart all applications.

      Static linking has many problems that dynamic linking solves, so, even if you’re bundling you do want to use dynamic linking.

  12. But just imagine someone packaging a picture viewer and also including a library for parsing image files. A user installs this. Then an exploit comes out for the library. The library is fixed – but the picture viewer is still using the old version if it is not repackaged!
    So I just have to say that I don’t like the approach.

  13. This is really great, but what happened to the idea of integrating policykit in gvfs? If you want to install a program on the whole system this way you’ll have to get root privileges, which is best implemented with policykit. Is the GNOME-project afraid that too many users will screw up their system?

  14. This is a very good idea!
    Very good for making Linux more appealing to novice or less savvy users.

    Like many have already mentioned, dependency management etc. are areas of concern. I think it should really plug back into the distribution’s package management. The bundle really should only be like a portable version of the application. Run it, have a go; if you like it, copy the file into the Apps folder or right-click and select install, which will hint the package manager to install the bundle from the repo (if available) or take care of dependencies, as much as possible.

    My real concern: those who would get attracted by such simplicity might not be savvy enough to understand security issues, dependencies, etc. That is one of the reasons why I would prefer it tying into the package management.

    The key is the amount of meta information that can be packed into the bundle to make it work nicely with the package manager.

  15. Things usually only break badly when the breakage is in the “platform” part, which is when your approach doesn’t help at all.

    Plus, these kind of problems tend to only surface over time and scale. With a dozen packages or bundles, everything will be fine. Once you have some 10000 packages, there will always be something broken there; not necessarily because the underlying system is bad, but just because users and developers make errors.

    There are reasons why autopackage, for example, failed. AFAIK it seems pretty similar to your Glick.

    Let me give you another side of the story. Think about security. A library, say libpng, libz (because they are historically good examples) or Lua (because it is often embedded, even as source code), has a security bug. With a traditional Linux distribution, such a thing is easy to fix. With Glick, I need to check all my bundles and fix all of them. Whoa. Maintenance hell.


  16. Not sure about the scalability of the concept, and some of the core concepts are bound to draw doubtful faces. Still, I’d love to check it out when it’s ready.


  17. What you’re thinking of makes sense; however, bundle packages have their own drawbacks. I suggest you read a bit about how NixOS [1] and especially its package manager [2] work.

    [1] http://nixos.org/
    [2] http://nixos.org/nix/

    Currently there are not many packages but you can do all you wish with it.
    – downgrades are easily done
    – packages can be installed by normal users
    – you can have multiple versions of the same library or the same program without using static linking.
    – and much more…

  18. NixOS is a really great idea! However, this approach (and its support in GTK/GLib) may be useful for simplifying the life of maintainers on the Win/Mac platforms.
