First fully sandboxed Linux desktop app

Its not a secret that I’ve been working on sandboxed desktop applications recently. In fact, I recently gave a talk at devconf.cz about it. However, up until now I’ve mainly been focusing on the bundling and deployment aspects of the problem. I’ve been running applications in their own environment, but having pretty open access to the system.

Now that the basics are working it’s time to start looking at how to create a real sandbox. This is going to require a lot of changes to the Linux stack. For instance, we have to use Wayland instead of X11, because X11 is impossible to secure. We also need to use kdbus to allow desktop integration that is properly filtered at the kernel level.

Recently Wayland has made some pretty big strides though, and we now have working Wayland sessions in Fedora 21. This means we can start testing real sandboxing for simple applications. To get something running I chose to focus on a game, because they require very little interaction with the system. Here is a video I made of Neverball, running in a minimal sandbox:

In this example we’re running a regular build of neverball in an environment which:

  • Is independent of the host distribution
  • Has no access to any system or user files other than the ones from the runtime and application itself
  • Has no access to any hardware devices, other than DRI (for GL rendering)
  • Has no network access
  • Can’t see any other processes in the system
  • Can only get input via Wayland
  • Can only show graphics via Wayland
  • Can only output audio via PulseAudio
  • … plus more sandboxing details

Yet the application is still simple to install and integrates nicely with the desktop. If you want to test it yourself, just follow the instructions on the project page and install org.neverball.Neverball.

Of course, there are still a lot to do here. For instance, PulseAudio doesn’t protect clients from each other, and for more complex applications we need to add new APIs to safely grant access to things like user files and devices. The sandbox details page has a more detailed list of what has to be done.

The road is long, but at least we have now started our journey!

Back from DevX hackfest

I’m now back from a week in Cambridge at the developer experience hackfest. This was a great event, it was a lot of fun to meet people again, and we got a lot of things done. I spent a lot of time talking to people about things related to xdg-app and sandboxed applications, both spreading information and actually implementing features.

I spent some time with Emmanuele, Ryan and Lars working on glib stuff, which resulted in the G_DECLARE_*_TYPE macros finally being merged. Additionally I reviewed the new list model abstraction which I hope we can land soon, and Ryan and I worked out a new fancy __attribute__(cleanup) approach that we hope to merge into glib soon.

We also worked a bit on Gtk+ OpenGL support. Based on feedback from early users we’re doing some changes in how GL contexts are created to allow you to configure them in more detail. We also decided that we want to completely drop support for legacy OpenGL contexts, as these had issues cooperating with Core 3.2 contexts, and because we don’t live in the 90s anymore. Carlos was working on converting GtkPopover to use (override redirect) toplevels on X11, and I gave him moral support and generally hated on ancient crappy X11 behaviour.

Props to Collabora and Philip for arranging a great event!

Introducing Gthree

I’ve recently been working on OpenGL support in Gtk+, and last week it landed in master. However, the demos we have are pretty lame and are not very good to show off or even test the OpenGL support. I’ve looked around for some open source demos that used modern GL that we could use, but I didn’t find anything that we could easily use.

What I did find though, was a lot of WebGL demos that used three.js. This looked like a very nice open source library for highlevel 3d rendering. At first I had some plans to bind OpenGL to gjs so that we could run three.js, but this turned out to be a hard.

Instead I started converting three.js into C + GObject, using the Gtk+ OpenGL support and the vector/matrix library graphene that Emmanuele has been working on recently.

After about a week of frantic hacking it is now at a stage where it may be interesting for others. So, without further ado I introduce:

https://github.com/alexlarsson/gthree

It does not yet support everything that three.js can do, but it does support a meshes with most mesh matrial types and lighting, including a loader for the json model format of thee.js, which means that it is minimally useful.

Here are some screenshots of the examples that ships with the code:

Screenshot from 2014-10-24 15:04:47
Various types of materials
Screenshot from 2014-10-24 15:10:00
Some sample models from three.js examples
Screenshot from 2014-10-24 15:31:40
Some random cubes

This has been a lot of fun to work on as I’ve seen a lot of progress very fast. Mad props to mrdoob and the other three.js developers for creating three.js and making it free software. Gthree is a huge rip-off of their work and would never be possible without it. Thanks also to Emmanuele for his graphene library.

What are you sitting here for, go ahead and play with it! Make some demos, port some more three.js features, marvel at the fancy graphics!

Building clean docker images

I’ve been doing some experiments recently with ways to build docker images. As a test bed for this I’m building an image with the Gtk+ Broadway backend, and some apps. This will both be useful (have a simple way to show off Broadway) as well as being a complex real-world use case.

I have two goals for the image. First of all, building of custom code has to be repeatable and controlled, Secondly, the produced image should be clean and have a minimal size. In particular, we don’t want any of the build dependencies or intermediate results in any layer in the final docker image.

The approach I’m using involves using a build Dockerfile, which installs all the build tools and the development dependencies. Inside this container I build a set of rpms. Then another Dockerfile creates a runtime image which just installs the necessary runtime rpms from the rpms built in the first container..

However, there is a complication to this. The runtime image is created using a Dockerfile, but there is no way for a Dockerfile to access the files produced from the build container, as volumes don’t work during docker build. We could extract the rpms from the container and use ADD in the dockerfile to insert the files into the container before installing them, but this would create an extra layer in the runtime image that contained the rpms, making it unnecessarily large.

To solve this I made the build container construct a yum repository of all the built rpms, and then when starts lighttp, exporting it. So, by doing:

docker build -t broadway-repo
docker run -d -p 9999:80 broadway-repo

I get a yum repository server at port 9999 that can be used by the
second container. Ideally we would want to use
docker links to get access to the build container, but unfortunately links don’t work with docker build. Instead we use a hack in the Makefile to
generate a yum repo file with the host ip address and add that to the
container.

Here is the Dockerfile used for the build container. There are are some interesting things to notice in it:

  • Instead of starting by adding all the source files we add them one by one as needed. This allows us to use the caching that docker build does, so that if you change something at the end of the Dockerfile docker will reuse the previously built images all the way up to the first line that has changed.
  • We try do do multiple commands in each RUN invocation, chaining them together with &&. This way we can avoid the current limit of 127 layers in a docker image.
  • At the end we use createrepo to create the repository and set the default command to run lighttp on the repository at port 80.
  • We rebuild all packages in the Gtk+ stack, so there are no X11 dependencies in the final image. Additionally, we strip a bunch of dependencies and use some other tricks to make the rpms themselves a bit smaller.

Here is the Dockerfile for the runtime container. It installs the required rpms, creates a user and sets up an init script and a super-simple panel/launcher app written in javascript. We also use a simple init as pid 1 in the container so that process that die are reaped.

Here is a screenshot of the final results:

Broadway

You can try it yourself by running “docker run -d -p 8080:8080 alexl/broadway” and loading http://127.0.0.1:8080 in a browser.

The modern Gtk drawing model

Over a few releases now, and culminating in Gtk+ 3.10 we have restructured the way drawing works internally. This hasn’t really been written about a lot, and I still see a lot of people unaware of these changes, so I thought I’d do a writeup of the brave new world.

All Gtk+ applications are mainloop driven, which means that most of the time the app is idle inside a loop that just waits for something to happen and then calls out to the right place when it does. On top of this Gtk+ has a frame clock that gives a “pulse” to the application. This clock beats at a steady rate, which is tied to the framerate of the output (this is synced to the monitor via the window manager/compositor). The clock has several phases:

  • Events
  • Update
  • Layout
  • Paint

The phases happens in this order and we will always run each phase through before going back to the start. The Input events phase is a long stretch of time between each redraw where we get input events from the user and other events (like e.g. network i/o). Some events, like mouse motion are compressed so that we only get a single mouse motion event per clock cycle[1].

Once the event phase is over we pause all external events and run the redraw loop. First is the update phase, where all animations are run to calculate the new state based on the estimated time the next frame will be visible (available via the frame clock). This often involves geometry changes which drives the next phase, Layout. If there are any changes in widget size requirements we calculate a new layout for the widget hierarchy (i.e. we assign sizes and positions). Then we go to the Paint phase where we redraw the regions of the window that needs redrawing.

If nothing requires the update/layout/paint phases we will stay in the event phase forever, as we don’t want to redraw if nothing changes. Each phase can request further processing in the following phases (e.g. the update phase will cause there to be layout work, and layout changes cause repaints).

There are multiple ways to drive the clock, at the lowest level you can request a particular phase by gdk_frame_clock_request_phase() which will schedule a clock beat as needed so that it eventually reaches the requested phase. However in practice most things happen at higher levels:

  • If you are doing an animation, you can use gtk_widget_add_tick_callback() which will cause a regular beating of the clock with a callback in the update phase until you stop the tick.
  • If some state changes that causes the size of your widget to change you call gtk_widget_queue_resize() which will request a layout phase and mark your widget as needing relayout.
  • If some state changes so you need to redraw some area of your widget you use the normal gtk_widget_queue_draw() set of functions. These will request a paint phase and mark the region as needing redraw.

There are also a lot of implicit triggers of these from the CSS layer (which does animations, resizes and repaints as needed).

During the Paint phase we will send a single expose event to the toplevel window[2]. The event handler will create a cairo context for the window and emit a GtkWidget::draw() signal on it, which will propagate down the entire widget hierarchy in back-to-front order, using the clipping and transform of the cairo context. This lets each widget draw its content at the right place and time, correctly handling things like partial transparencies and overlapping widgets.

There are some notable differences here compared to how it used to work:

  • Normally there is only a single cairo context which is used in the entire repaint, rather than one per GdkWindow. This means you have to respect (and not reset) existing clip and transformations set on it.
  • Most widgets, including those that create their own GdkWindows have a transparent background, so they draw on top of whatever widgets are below them. This was not the case before where the theme set the background of most widgets to the default background color. (In fact, transparent GdkWindows used to be impossible.)
  • The whole rendering hierarchy is captured in the call stack, rather than having multiple separate draw emissions as before, so you can use effects like e.g.  cairo_push/pop_group() which will affect all the widgets below you in the hierarchy. This lets you have e.g. partially transparent containers.
  • Due to backgrounds being transparent we no longer use a self-copy operation when GdkWindows move or scroll. We just mark the entire affected area for repainting when these operations are used. This allows (partially) transparent backgrounds, and it also more closely models modern hardware where self-copy operations are problematic (they break the rendering pipeline).
  • Due to the above, old scrolling code is slower, so we do scrolling differently. Containers that scroll a lot (GtkViewport, GtkTextView, GtkTreeView, etc) allocate an offscreen image during scrolling and render their children to it (which is now possible since draw is fully hierarchical). The offscreen image is a bit larger than the visible area, so most of the time when scrolling it just needs to draw the offscreen in a different position.  This matches contemporary graphics hardware much better, as well as allowing efficient transparent backgrounds.
    In order for this to work such containers need to detect when child widgets are redrawn so that it can update the offscreen. This can be done with the new gdk_window_set_invalidate_handler() function.

This skips some of the details, but with this overview you should have a better idea that happens when your code is called, and be in a better position to do further research if necessary.

[1] If you need more mouse event precision you need to look at the mouse device history.

[2] Actually each native subwindow will get one too, but that is not very common these days.

Adventures in Docker land

Connoisseurs of this blog know that I have an interest in application deployment systems, having created three different application bunding system (1,2, 3). These were all experiments in the area of desktop applications, but recently there has been some interesting motions in related areas, namely Docker.

Docker is a server application/container deployment system, which nicely sidesteps a lot of the complexity with desktop apps (not having to integrate deeply with the desktop) which makes it a lot easier to deploy. Additionally, docker is more than a deployment system, it also has some interesting ideas about how to create and distribute applications.

Every docker container is a copy-on-write clone of a specific parent Image, which means instantiation of docker containers is very fast and cheap. It also gives some very interesting properties because you can track the changes in a container (compared to t its parent image) and “commit” this to a new image. This creates a git-like hierarchy of images where every commit is a filesystem layer that applies to a previous layer, up to some base image. The git-like workflow is really nice to work with when creating images, and the final result is very easy to share and deploy, and at the same time automatically shares as much as possible with common base images.

Unfortunately Docker relies on AUFS, a union filesystem that is not in the upstream kernel, nor is it likely to ever be there. Also, while AUFS is in the current Ubuntu kernel it is deprecated there and will eventually be removed. This means Docker doesn’t run on Fedora which has a primarily-upstream approach to packaging.

So, the last month or so I’ve been working on making Docker work in Fedora (and thus eventually in RHEL, which is the nr 1 requested Docker feature). Of course, this work will benefit other distributions that don’t have AUFS too.

I started looking at possible replacements for the copy-on-write support, and there are a few possibilities availible:

  • overlayfs
  • btrfs
  • lvm snapshots
  • lvm thin provisioning

Overlayfs is a different union filesystem implementation than AUFS, and the one that seems most likely to land upstream. But that is happening slowly, if at all. Long-term I think this is the best option, but right now it is out of the question.

Btrfs has copy-on-write both using filesystem snapshots and one a file basis using reflink. However, btrfs is not currently used much in production as its not considered stable enough. It would also be a very heavy dependency for Docker, as may users would have to reformat their disks to use it.

Lvm snapshots are useful for doing e.g. backup of a snapshot, but regress badly in performance when you start having many snapshots of the same device.

This leaves us with only lvm thin provisioning. This is a fairly recent, but relatively stable technology that allows you to create copy-on-write block devices that are “thinly” provisioned, meaning they don’t use real space until the device is in use. This is not ideal for Docker as it really wants copy-on-write at the file level, but with some work it is possible to work around this.

Rather than interacting with lvm which is a very generic volume manager I chose to use the lower level device-mapper kernel APIs directly (via libdevmapper). This allows us greater ease of access to the devices programmatically, as well as avoiding confusion with possible system use of LVM. Also it avoids some LVM performance issues with very many devices.

So, we set up a single large block device on which we create a device-mapper “thinp” pool. On this we then creates a single “base” block device formated with ext4. Every image and container are then created as snapshot (in multiple steps) from this base device. So, say you’re starting a container based on an image “apache” which itself is  based on a “fedora” image, we would:

  1. Create a snapshot of the base device.
  2. Mount it and apply the changes in the fedora image.
  3. Create a snapshot based on the fedora device.
  4. Mount it and apply the changes in the apache image.
  5. Create a snapshot based on the apache device.
  6. Mount it and use as the root in the new container.

And of course, these devices will be reused (with corresponding steps skipped) as needed by other images/containers.

The devicemapper pool need to be set up on a large block device that fits all the images and containers that you will be used which would be painful for most people. Docker handles this by automatically creating the a large sparse file, using it as a loopback device for the devicemapper work. Additionally we ensure that DISCARD support is enabled in the filesystem so that any files removed in the conttainer filters down to the loopback file making it sparse again.

This means that there is no need for setup, and space for images and containers will only be used as needed. Of course, there are still issues, like the max size of the loopback mount (100G by default, but this should be easy to grow) and the max size of the base extt4 image (10G by default, resizing is harder after initial construction, but should be possible).

We’re currently in the process of landing this in Docker, and hope to have a 0.7 release out based on my device-mapper work pretty soon. Then I will continue working on making docker a first-class citizen on Fedora.

HiDPI support in Gnome

For some time I’ve been working on HiDPI support for Gnome, in order to support all the new laptops with very high resolution displays. This work is now at a stage where I can start showing it off and adventurous users might even want to test it out.

The support happens on many layers, like:

  • Wayland: I’ve added support in the protocol for scaled windows and outputs, and implemented this in Weston (the Wayland compositor reference implementation) and the wayland sample clients. This is currently in the unstable branches and will be in Wayland 1.2.
  • Cairo: In order to seamlessly support hidpi rendering I’ve added support for setting device-scale on cairo surfaces. This code is a prerequisite for the Gtk+ support, and is currently in a cairo branch, but will land in Cairo 1.13 eventually.
  • Gnome-icon-theme: Added high resolution version of most images used in the theme.

But most of the work has been in Gtk+. There is a branch called wip/window-scales that contains this code. It has:

  • Support for scaled windows in the wayland backend, including support for different scale factors on different outputs.
  • Support for scaled windows on X, limited to one scaling factor for all monitors.
  • Support for retina displays on OSX
  • Support for alternative CSS assets for scaled windows, so that you can specify higher resolution background and border images in your themes that will be used on hidpi screens.
  • Support for alternative high-resolution icons for scaled windows, including additions to the icon theme spec so that themes can specify higher resolution images with less details (i.e. 24×24@2x is the same size as 48×48@1x, but has less detail).

Here is how it looks right now: (click for full version)

This is work in progress, but if you want to try it out, here is the code you need to use:

You can then enable the scaling either by setting GDK_SCALE=2 in the environment, or by using gsettings (only works under gnome):

gsettings set org.gnome.settings-daemon.plugins.xsettings overrides 
   "{ 'Gdk/WindowScalingFactor':<2>, 'Gdk/UnscaledDPI':<92160> }"

There is still a lot of work to do, but I hope to have this mostly done for my talk at Guadec. Hopefully I’ll see you there!

And last, but not least, many thanks to Brion Vibber who generously donated a Chromebook Pixel to me so that I could work on this stuff.

More Gtk+ in the cloud

Since my last blog entry I’ve worked a bit more on broadway, and the openshift-broadway cartridge for openshift.

New in this version:

  • Bumped all modules to the final gnome 3.8 releases
  • Chrome support (this used to work, but had regressed)
  • Password protection for the broadway session (optional)
  • Added a simple app launcher to the default session
  • Added gedit, gnome-terminal and glade to build
  • Made it work with https (using a terminating ssl proxy in openshift)
  • General bugfixes and robustness

Here is a video of this:

[youtube width=”512″ height=”384″]http://www.youtube.com/watch?v=a8__4mF4-8g[/youtube]

Broadway on openshift

I spent some time this week making a clean build of Gtk 3 with the broadway backend for deploying on openshift. It works suprisingly well, given that I’m in sweden and the openshift servers are somewhere in the US:

[youtube]https://www.youtube.com/watch?v=cyHTAGoEuL8[/youtube]

I made the video instead of linking to the app itself because only one person at a time can use the actual app. However, you can easily create your own openshift module based on this as I made the openshift code availible at gitthub. Just clone that and push to your diy based openshift app. Play with it! Build your favourite gtk3 app! Share it!

It lacks authentication/encryption and there are some minor issues, but it proves that broadway works reasonably well even over the internet.

Developer Hackfest status

Today is the third day of the Gnome Developer Experience hackfest. The first day I was in the platform group where we looked at the core gtk+ platform and whats missing from it. We ended up with a pretty large list of items, but we picked a few of them and started on a few of them. More details in cosimocs blog post.

Yesterday we finally had all the people interested in the application deployment and sandboxing story here, so we started looking at that. I’ve historically had a lot of interest in this area with previous experiments like glick, glick2, and bundler, so this is my main interest this hackfest. We have some initial plans for how to approach this.

There are two fundamental, but interrelated problems in this area. One is the deployment of the application (how to create an application, bundle it, “install” it, set up its execution environment and start it) and the second is sandboxing (protecting the user session against the app).

Deployment

For deplyoment we’re considering a bundling model where the app ships the binary and some subset of the libraries it needs, plus a manifest that describes what the kind of system ABIs that the app requires. These are very course grained, unlike the traditional per-library dependencies, and would be on the level of i.e.

  • bare: Just the kernel ABI
  • system: libc, libm, and a few core libs
  • gnome-platform-1.0: The full gnome upstream defined set of “stable supported ABIs”

There is a scale here where apps (like games) can chose to bundle more dependencies, and expect more stability over time, but it will not be very integrated, while on the other hand more integrated desktop apps which will be tied to specific versions of some desktop platform.

What is important here is that once an app specifies a particular profile we guarantee that this is all that the app ever sees, i.e. we support isolation of the app runtime environment vs the distribution environment that the user is running. This is done by containers and namespaces where we mount only the system files and application files into the app namespace.

With this kind of isolation we guarantee two things, first of all there will never be any accidental leaks of dependencies. If your app depends on anything not in the supported platform it will just not run and the developer will immediately notice. Secondly, any incompatible changes in system libraries are handled and the app will still get a copy of the older versions when it runs.

One nice thing about this setup is that the application runtime environment will look like a minimal but standard linux install with the app and its dependencies in the standard prefixes like /usr/lib, etc. This means that when creating an application one can very easily reuse existing .deb or .rpm files by just extracting these and putting in the application bundle. Of course, it also will require a higher level of binary compatibility guarantees than what we have previously handled for modules the platform profiles provide. For instance, internal IPC protocols the platform libraries use absolutely have to be backwards compatible.

Sandboxing

The sandbox model goes hand-in-hand with the isolation model of app deployment, in the sense that whatever the app should not be able to do it will not even see. So, for a fully sandboxed app we will not even mount in the users home directory in the app namespace, rather than not having access rights to it. (We will of course also offer applications with unlimited sandboxes so that existing apps can run.)

In order to talk with the sandboxes app we need a IPC model that handles the domain transition between the namespaces. This implies the kernel being involved, so we have been looking (again) at getting some form of dbus routing support into the kernel. Hopefully this will work out this time.

Unfortunately we also need IPC for the Xserver, which is very hard to secure. We’ve decided to just just ignore this for now however, as it turns out Wayland is a very good fit for this, since it naturally isolates the clients.

We also talked about implementing something similar to the Intents system in android as a way to allow sandboxed applications to communicate without necessarily knowing about each other. This essentially becomes a DBus service which keeps a registry of which apps implements the various interfaces we want to support (e.g. file picking, get-a-photo, share photo) and actually proxies the messages for these to the right destination. We had a long discussion about the name for these and came up with the name “Portals”, reflecting the domain-transition that these calls represent.

We’re continuing to discuss the details of the different parts and hopefully we can start implement parts of this soon. After the hackfest we will continue discussions on the gnome-os mailing list.