Farewell Collabora

Today has been my last day at Collabora, where I’ve spent the last 2 years having a great time and working on some very cool and some not-so-cool projects, but always with great people as teammates. On the cool projects front, AF_BUS wins. Even though it didn’t gain upstream acceptance, at least it opened the race to bring D-Bus into the kernel, which I think is a great thing (based on real numbers 😀).

A big hug to all collaborans I’ve met in these 2 years!

Not sure yet what’s coming up next, but the first thing I will do, for the first time in aeons, is to take a few weeks of well-deserved rest, work on some personal stuff I haven’t had much time for lately, and have fun!

desktop-webapp-browser-extension

A few months ago I started work on a Google Chrome/ium extension for integrating Chrome apps into GNOME Shell. The idea is that, whenever you install a Chrome app, a .desktop file is created in ~/.local/share/applications, so that the app shows up as a normal application.

Also, for desktop shortcuts to “normal” web pages, Chrome/ium uses the page’s favicon, so the icons look really ugly in GNOME Shell’s overview. This extension therefore also tries to retrieve a higher-resolution icon and uses it if found; if not, it takes a snapshot of the page and uses that as the icon, which looks much nicer in the overview.

So, nothing really magic, but after discussing it with some teammates, I thought it could be helpful for other people, hence this public announcement 🙂

The code can be found here.

The next step, when I have time and figure out how, is to submit this to the Chrome Web Store, but for now, you can just build it and install the .crx file into your Chrome/ium.

Netlink-based D-Bus

As stated in my last blog post, we have been looking at different options for optimizing D-Bus. After some internal discussion and a review of the feedback we got, we think the best solution is to take the best ideas from AF_DBUS, but without creating a new socket family, which wasn’t well received by the Linux kernel developers. This meant choosing a transport that allowed us to do that, so we decided on Netlink (an IPC mechanism for kernel-to-user-space communication).

Below is a detailed description of the architecture we are planning.

Netlink sockets
The Netlink protocol is a family of socket-based IPC mechanisms that can be used to communicate between the kernel and user-space processes, and between user-space processes themselves. It was created as a replacement for ioctl and as a way to receive messages sent by the kernel. It is a datagram-oriented service with both SOCK_RAW and SOCK_DGRAM as valid socket types. It is based on the Berkeley sockets API and uses the AF_NETLINK address family. Netlink supports different Netlink families, such as NETLINK_ROUTE, NETLINK_FIREWALL and NETLINK_SELINUX, each of which is used to communicate with a specific kernel service.

Since Netlink is used as an IPC mechanism for processes (and the kernel) on the same machine, its address only has a port number that identifies each peer (nl_pid). Since Netlink supports both unicast and multicast communication, a message can also be sent to a group (nl_groups), but only processes with uid 0 are allowed to send multicast messages from user space. A Netlink address is represented using the sockaddr_nl data structure:

struct sockaddr_nl {
        __kernel_sa_family_t    nl_family;      /* AF_NETLINK   */
        unsigned short  nl_pad;         /* zero         */
        __u32           nl_pid;         /* port ID      */
        __u32           nl_groups;      /* multicast groups mask */
};

A Netlink message header consists of the following fields:

struct nlmsghdr {
        __u32           nlmsg_len;      /* Length of message including header */
        __u16           nlmsg_type;     /* Message content */
        __u16           nlmsg_flags;    /* Additional flags */
        __u32           nlmsg_seq;      /* Sequence number */
        __u32           nlmsg_pid;      /* Sending process port ID */
};

The Netlink protocol is explained in detail here.

Generic Netlink subsystem
Every Netlink family is identified by an integer that selects a particular Netlink service. Currently there are 21 assigned Netlink families out of a maximum of 32. To avoid a shortage of Netlink families, the Generic Netlink subsystem was created.

The Generic Netlink subsystem multiplexes different communication channels over a single Netlink family, NETLINK_GENERIC. It not only simplifies Netlink usage, but also allows communication channels to be registered at run time without modifying core kernel code or headers.

The Generic Netlink subsystem is implemented as a service bus inside the kernel, over which users communicate with each other. The users can reside either in user space or inside the kernel. The bus supports a number of Generic Netlink communication channels that are dynamically allocated by a Generic Netlink controller. This controller is a kernel Generic Netlink user itself that listens on a special pre-allocated Generic Netlink channel, “nlctrl” (GENL_ID_CTRL), where users send requests to create, remove and learn about available channels.

Communication channels are uniquely identified by a channel number that is allocated by the Generic Netlink controller. Users that want to provide services over the Generic Netlink bus have to communicate with the Generic Netlink controller and ask it to create a new communication channel. Likewise, users that want to access those services have to query the Generic Netlink controller to find out whether these services are available and which channel number they are currently using.
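
To make that concrete, this is roughly how a user-space program could ask the controller for the channel number of a given family. It is only a sketch, using libnl (mentioned further below) purely to keep it short, and the “dbus” family it looks up is of course hypothetical at this point:

#include <netlink/netlink.h>
#include <netlink/genl/genl.h>
#include <netlink/genl/ctrl.h>

/* Ask the "nlctrl" controller which channel (family id) the "dbus"
 * service is using; returns a negative value if it is not registered. */
int resolve_dbus_family(void)
{
	struct nl_sock *sock = nl_socket_alloc();
	int family;

	genl_connect(sock);                       /* connect to NETLINK_GENERIC */
	family = genl_ctrl_resolve(sock, "dbus"); /* CTRL_CMD_GETFAMILY under the hood */

	nl_socket_free(sock);
	return family;
}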

Every channel is identified by a Generic Netlink family and defines a set of commands that users can trigger. Each command is associated with a function handler that gets executed when a user sends a message specifying this command.

A Generic Netlink message header consists of the following fields:

struct genlmsghdr {
        __u8    cmd;
        __u8    version;
        __u16   reserved;
};

Generic Netlink uses the standard Netlink system as a transport, so its message format is defined as follows:

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                Netlink message header (nlmsghdr)              |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |           Generic Netlink message header (genlmsghdr)         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |             Optional user specific message header             |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |           Optional Generic Netlink message payload            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The family (communication channel) used is specified using the Netlink message header (nlmsghdr) type field (nlmsg_type).

Each Generic Netlink family can define a family-specific header to be used by the service provided on that channel.

D-Bus as a Generic Netlink service
D-Bus can be implemented as a Generic Netlink service by creating a new Generic Netlink family (communication channel) “dbus”. Applications will use this communication channel to send and receive D-Bus messages.

In this scenario, most of the work that is currently done by the dbus-daemon will take place in the D-Bus Netlink service, such as adding applications to the bus when they gain ownership of a name (NameAcquired signal), routing messages to their destination based on the application’s unique name, and maintaining match rules (AddMatch method).

The D-Bus daemon will only be a special user of the Generic Netlink D-Bus service, although it will still have some responsibilities, such as authentication and, of course, implementing the org.freedesktop.DBus service.

The other D-Bus users (apart from the dbus-daemon itself) will just work as they do now, using the D-Bus wire protocol on top of the Netlink transport, although they will have to perform some extra steps, as explained below.

The Genetlink D-Bus service will provide the following services to applications:

Mechanism to create and delete D-Bus buses: Since we need to separate the traffic of the different buses (system, user, etc.) in the kernel module, we need a way for dbus-daemon instances to register buses in the kernel module. To do so, we can specify D-Bus family commands DBUS_CMD_NEWBUS and DBUS_CMD_DELBUS. The process that creates a bus will be the D-Bus daemon implementing that bus, and all messages that have org.freedesktop.DBus as their destination will be routed to it.

Besides the commands, we have to define a way to specify the name and type of the bus to be added. We can either store that information in a user-defined header or define Generic Netlink family attributes to pass it to the D-Bus Generic Netlink service. In any case, the dbus-daemon will be responsible for choosing a unique name (in the form netlink:name=unique_name), so that the kernel doesn’t have to read any configuration at all and just has to associate the unique addresses with each bus.

Another option would be to map a D-Bus bus to a multicast group and use the Generic Netlink controller commands CTRL_CMD_NEWMCAST_GRP and CTRL_CMD_DELMCAST_GRP. But we need more fine-grained control over the routing of the messages: we can’t just use genlmsg_multicast() and send the message to every application on the bus, since a signal message sent to a bus is not received by all applications (AddMatch rules can prevent some applications from receiving it). So, we have to maintain our own multicast group based on match rules.

Connect and disconnect from buses: To allow applications to connect to a bus, we can define another set of D-Bus family commands, DBUS_CMD_CONN_BUS and DBUS_CMD_DISC_BUS. When an application wants to connect to a bus, the bus type is checked first: if it is a session bus, only processes running with the same uid as the D-Bus daemon are allowed; this restriction does not apply to the system bus, which allows connections from processes running as any user. Connection requests are routed to the D-Bus daemon, which does the authentication.

As in the create/delete bus case, we need to specify which bus we are trying to connect to. We can again store that information in the user-defined header or define a set of Generic Netlink family attributes.
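
Putting the pieces from the last few paragraphs together, the user-space-visible interface of the family could look roughly like the sketch below. Everything here is just an illustration of the proposal; in particular, DBUS_ATTR_BUS_NAME and DBUS_ATTR_BUS_TYPE are made-up names, and none of the numbering is set in stone:

/* Hypothetical commands of the "dbus" Generic Netlink family. */
enum {
	DBUS_CMD_UNSPEC,
	DBUS_CMD_NEWBUS,	/* dbus-daemon registers a new bus        */
	DBUS_CMD_DELBUS,	/* dbus-daemon removes a bus              */
	DBUS_CMD_CONN_BUS,	/* an application connects to a bus       */
	DBUS_CMD_DISC_BUS,	/* an application disconnects from a bus  */
	DBUS_SEND_MSG,		/* send a D-Bus message over the bus      */
	__DBUS_CMD_MAX,
};

/* Hypothetical attributes carried by those commands. */
enum {
	DBUS_ATTR_UNSPEC,
	DBUS_ATTR_BUS_NAME,	/* e.g. netlink:name=unique_name          */
	DBUS_ATTR_BUS_TYPE,	/* system, session, ...                   */
	DBUS_ATTR_PAYLOAD,	/* D-Bus message in wire format           */
	__DBUS_ATTR_MAX,
};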

Transport to send and receive messages: Receiving messages is straightforward. You only have to create a socket with:

int sd = create_nl_socket(NETLINK_GENERIC, 0);

and use the standard BSD socket API, such as recv().
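
create_nl_socket() is not a standard call; it stands for a small helper that could look roughly like this minimal sketch (most error handling omitted):

#include <sys/socket.h>
#include <linux/netlink.h>
#include <string.h>
#include <unistd.h>

/* Open a Netlink socket for the given protocol (e.g. NETLINK_GENERIC) and
 * bind it; with nl_pid set to 0 the kernel assigns a unique port ID. */
static int create_nl_socket(int protocol, unsigned int groups)
{
	struct sockaddr_nl local;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, protocol);
	if (fd < 0)
		return -1;

	memset(&local, 0, sizeof(local));
	local.nl_family = AF_NETLINK;
	local.nl_pid = 0;		/* let the kernel pick the port ID */
	local.nl_groups = groups;	/* multicast groups to join, if any */

	if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}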

To send messages to the bus, we have to define a Generic Netlink D-Bus family command, DBUS_SEND_MSG, and fill in the Netlink header, the Generic Netlink header and, if applicable, a D-Bus service-specific header:

struct sockaddr_nl nladdr;
struct {
	struct nlmsghdr n;
	struct genlmsghdr g;
	char buf[256];
} req;
struct nlattr *na;
char *dbus_message;	/* serialized D-Bus message to send */
int ret;

memset(&req, 0, sizeof(req));
memset(&nladdr, 0, sizeof(nladdr));
nladdr.nl_family = AF_NETLINK;

/* sd is the Generic Netlink socket created above; dbus_family_id is the
 * family id obtained from the Generic Netlink controller. */
req.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
req.n.nlmsg_type = dbus_family_id;
req.n.nlmsg_flags = NLM_F_REQUEST;
req.g.cmd = DBUS_SEND_MSG;

/* GENLMSG_DATA and NLA_DATA are the usual example helper macros that skip
 * the Generic Netlink and attribute headers, respectively. */
na = (struct nlattr *) GENLMSG_DATA(&req);
na->nla_type = DBUS_ATTR_PAYLOAD;
na->nla_len = NLA_HDRLEN + strlen(dbus_message);
memcpy(NLA_DATA(na), dbus_message, strlen(dbus_message));
req.n.nlmsg_len += NLMSG_ALIGN(na->nla_len);

ret = sendto(sd, (char *)&req, req.n.nlmsg_len, 0,
	     (struct sockaddr *) &nladdr, sizeof(nladdr));

All this is even easier when using libnl, a library that greatly simplifies the use of Netlink in user-space applications. This library is used by other system services, like NetworkManager, so adding a dependency on it to D-Bus shouldn’t be a problem.
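
As a rough sketch of what the previous example could look like on top of libnl (assuming the libnl 3.x API; DBUS_SEND_MSG and DBUS_ATTR_PAYLOAD are, again, the hypothetical command and attribute of the proposed family):

#include <netlink/netlink.h>
#include <netlink/msg.h>
#include <netlink/attr.h>
#include <netlink/genl/genl.h>

/* Send a serialized D-Bus message over the hypothetical "dbus" Generic
 * Netlink family using libnl; error handling kept minimal for brevity. */
int dbus_netlink_send(struct nl_sock *sock, int family,
		      const void *dbus_message, int len)
{
	struct nl_msg *msg = nlmsg_alloc();
	int err;

	/* Fill in the Netlink and Generic Netlink headers. */
	genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family, 0,
		    NLM_F_REQUEST, DBUS_SEND_MSG, 1);

	/* The D-Bus wire-format message goes in the payload attribute. */
	nla_put(msg, DBUS_ATTR_PAYLOAD, len, dbus_message);

	err = nl_send_auto(sock, msg);
	nlmsg_free(msg);

	return err;
}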

The Genetlink D-Bus service will parse the D-Bus message, add the sender field and, in the case of a unicast message, route it to the correct destination. If the message is a signal, the service will compute the list of recipients according to the match rules.

Also, it will process the NameAcquired and NameLost signals as well as the AddMatch method calls, so that it can keep track of where messages need to go.

Security framework: In the previous sections, authentication was mentioned as one of the responsibilities of the dbus-daemon itself. This is indeed what it does right now, but with the kernel Netlink module doing the routing based on user id, as explained above, maybe no authentication is needed on the dbus-daemon side. The question is whether the dbus-daemon should trust everything that comes from the kernel or do an extra check.

For some more fine-grained security, D-Bus services can use PolicyKit to prompt the user requesting the operation for extra authentication.

Support sending large messages: Some D-Bus users complain about poor performance when sending large chunks of data over the bus, which is the reason file-descriptor passing was added to D-Bus. One can argue that those applications shouldn’t be sending that much data over the bus and that it is the application’s responsibility, but the truth is that the problem exists.

Netlink provides the ability to send large messages by using multipart messages, so that the data to be sent can be sliced into chunks (no bigger than the kernel socket buffers’ size), resulting in better performance.
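
For illustration, a receiver dealing with multipart messages could look roughly like the sketch below; process_part() is just a placeholder for whatever reassembles the D-Bus payload from each chunk:

#include <sys/socket.h>
#include <linux/netlink.h>

void process_part(void *data, int len);	/* placeholder for payload reassembly */

/* Read one (possibly multipart) message from the Generic Netlink socket.
 * Note that each part still carries the genlmsghdr and attributes. */
static void receive_dbus_message(int sd)
{
	char buf[8192];
	struct nlmsghdr *nlh;
	int len, done = 0;

	while (!done) {
		len = recv(sd, buf, sizeof(buf), 0);
		if (len <= 0)
			break;

		for (nlh = (struct nlmsghdr *) buf; NLMSG_OK(nlh, len);
		     nlh = NLMSG_NEXT(nlh, len)) {
			if (nlh->nlmsg_type == NLMSG_DONE) {
				done = 1;	/* last part of a multipart message */
				break;
			}

			process_part(NLMSG_DATA(nlh),
				     nlh->nlmsg_len - NLMSG_HDRLEN);

			if (!(nlh->nlmsg_flags & NLM_F_MULTI))
				done = 1;	/* single-part message */
		}
	}
}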

Implementation details
All this needs a bigger change to libdbus/bindings than in our initial plan, since the Netlink messages, as explained before, carry an extra header that needs to be parsed before the real D-Bus message is processed.

So, for libdbus, we will implement a DBusServerNetlink object for the implementation of a Netlink-based D-Bus server, and a DBusTransportNetlink object for the actual implementation of the wire protocol when Netlink is used as a transport. DBusTransportNetlink will be responsible for getting and parsing messages from the Netlink D-Bus kernel service.

For the bindings, similar work will be needed to add support for reading and writing Netlink messages, but using the libnl library should make this easier, and in any case, it is part of our plan to add whatever code is needed to the most popular bindings.

And that’s all for now; any comments/feedback are appreciated.

D-Bus optimizations II

As explained in my previous post, we are working on optimizing D-Bus for its usage on embedded systems (more precisely on GENIVI).

We started the discussion on the linux-netdev mailing list about getting our patch to add multicast on UNIX sockets accepted, but, unfortunately, the feedback hasn’t been very good. Since one of the premises from GENIVI is to get all the work we are doing accepted upstream, we have spent the last few days thinking about what else we could use to fix the main issue we have found in D-Bus performance (as stated in my previous post): the number of context switches needed when all traffic on the bus goes through the dbus-daemon. So, here’s a summary of the options we have been looking at:

  • Use TIPC, a socket family already available in modern kernel versions, targeted at clustering environments, although it can also be used for communication inside a single node.
  • Use ZeroMQ, a library that, at first look, provides what we need for D-Bus, that is, multicast on local sockets.
  • Provide multicast on UNIX sockets as a new socket family (AF_MCAST), although this wasn’t well received in the linux-netdev discussion either. It would contain a trimmed-down version of AF_UNIX with only the parts needed for multicast.
  • Extend the AF_DBUS work from Alban to include what we have learnt from the multicast AF_UNIX architecture. This would mean having a patch to the kernel that, as with the AF_MCAST solution, would have to be maintained by distributors, since the linux-netdev people didn’t like this solution either.
  • Use Netlink, which has all that we need for D-Bus, that is, multicast and unicast, plus it is an established IPC mechanism in the kernel (from kernel space to user space) and is even used for other services similar to D-Bus. We would create a new Netlink subfamily for D-Bus that would contain the code to do the routing, since Netlink, for security reasons, does not allow direct connections between user-space apps.
  • Use KBUS, which is a lightweight messaging system, provided as a kernel module.

Right now, we have working code for AF_MCAST and are looking at Netlink, TIPC and KBUS, so we will be blogging more details on what we find out in our experiments. But any feedback would be appreciated since, as stated before, we want all of this work to be accepted upstream. So, comments, suggestions?

D-Bus optimizations

In the last month and a half, I have been working, as part of my work at Collabora, on optimizing D-Bus, which, even though it is a great piece of software, has some performance problems that affect its further adoption (especially on embedded devices).

Fortunately, we didn’t have to start from scratch, since this has been an ongoing project at Collabora, where previous research and upstream discussions had been taking place.

Based on this great work (by Alban Créquy and Ian Molton, BTW), we started our own, looking first at possible solutions for the biggest problems, which are the context switches, since all traffic on the bus goes through the D-Bus daemon, and the multiple copies of each message on its trip from one peer, via the kernel, to the daemon, and finally to the peer it is targeted at. The options were:

  • AF_DBUS work from Alban/Ian: while it improved the performance of the bus by a big margin, the solution wasn’t very well accepted on the upstream kernel mailing list, as it involved having lots of D-Bus-specific code in the kernel (all the routing).
  • Shared memory: there is no proof-of-concept code to look at, but it was a (maybe) good idea, as it would mean peers on the bus using shared memory segments to send messages to each other. But it would also mean rewriting most of the current D-Bus code, so maybe an option for the future, but not for the short term.
  • Using some sort of multicast IPC that would allow peers on the bus to send messages to each other without having all messages go through the daemon, which, as shown by several performance tests, is the biggest bottleneck in current D-Bus performance. We had a look at different options, one of them being AF_NETLINK, which provides most of what is needed, although it has some limitations, the biggest one being that it drops packets when the receiver queue is full, which is not an option for the D-Bus case.
    UDP/IP multicast has also been mentioned in some of the discussions, but it seems to add too much overhead for the D-Bus use case, as we would have to use eth0 or similar, since multicast on the loopback device doesn’t exist (hence no D-Bus on computers without a network card). Losing packets is another caveat of this solution, as is the lack of a message-order guarantee.

So, the solution we have come up with is to implement multicast on UNIX sockets, make it support what we need for D-Bus and, of course, make use of it in the D-Bus implementation itself. So, here’s what we have right now (please note that this is still a work in progress):

The way this works is better seen on a diagram, so here it is. First, how the current D-Bus architecture works:

and how this would be changed:

That is, when a peer wants to join a bus, it would connect to the daemon (exactly as it does today) and authenticate, and, once the daemon knows the peer is authenticated, the daemon would join the accept‘ed socket to the multicast group (this is important, as we don’t want peers to join the multicast group by themselves, so it’s the daemon’s job to do that). Once the peer has joined the multicast group, it would use socket filters to determine what traffic it wants to receive, so that it only gets, from the kernel, the messages it is really interested in. The daemon would do the same, just setting its filters so that it only gets traffic addressed to the bus itself (the org.freedesktop.DBus well-known name).
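
For reference, attaching such a filter uses the classic BPF socket-filter mechanism (SO_ATTACH_FILTER). The sketch below just accepts every packet, since the actual D-Bus matching logic is exactly the part that still needs work; it only shows the mechanism:

#include <linux/filter.h>
#include <sys/socket.h>

/* Attach a trivial accept-everything classic BPF program to a socket.
 * A real D-Bus filter would inspect the message to match destinations. */
static int attach_accept_all_filter(int fd)
{
	struct sock_filter code[] = {
		/* return 0xffff: accept (up to 64 KiB of) every packet */
		BPF_STMT(BPF_RET | BPF_K, 0xffff),
	};
	struct sock_fprog prog = {
		.len = sizeof(code) / sizeof(code[0]),
		.filter = code,
	};

	return setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
			  &prog, sizeof(prog));
}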

In this multicast solution, we might have to prevent unauthorized eavesdropping, even though peers need to authenticate through the daemon to join the multicast group. For this, we have been thinking about using Linux Security Modules. It is still not 100% clear how this would be done, so more information on this soon.

The above-mentioned branches work right now, but as I said before, they are still a work in progress, so they need several things before we can call this work finished. For now, we have succeeded in making the daemon not get any traffic at all apart from what it really needs, so that’s already a big win, as we are avoiding the expensive context switches; but the socket filters still need a lot of work, apart from other minor and not-so-minor things.

Right now, we are in the process of getting the kernel part accepted and of getting the D-Bus branch into an upstreamable form. Apart from that, we will provide patches for all the D-Bus bindings we know about (GLib, QtDBus, Python, etc.).

Comments/suggestions/ideas welcome.

New beginning

I guess it is time to announce that since yesterday I am working at Collabora, a UK-based company very well known for its work in several free software projects, like Telepathy, Farstream, GStreamer and others.

I haven’t really had much time to transition (and relax) between Canonical and Collabora, apart from last week, which I spent skiing, but hey, new year, new life, as we say in Spain, so the sooner you start your new life, the better.