Netlink-based D-Bus

As stated in my last blog post, we have been looking at different options for optimizing D-Bus. After some internal discussion and a review of the feedback we got, we think the best solution is to take the best ideas from AF_DBUS without creating a new socket family, which was not well received by the Linux kernel developers. That meant choosing a transport that lets us do so, and we settled on Netlink, an IPC mechanism for kernel to user-space communication.

Below is a detailed description of the architecture we are planning.

Netlink sockets
The Netlink protocol is a family of socket-based IPC mechanisms that can be used to communicate between the kernel and user-space processes, and between user-space processes themselves. It was created as a replacement for ioctl and as a way to receive messages sent by the kernel. It is a datagram-oriented service, with both SOCK_RAW and SOCK_DGRAM being valid socket types. It is based on the Berkeley sockets API and uses the AF_NETLINK address family. Netlink supports different families such as NETLINK_ROUTE, NETLINK_FIREWALL and NETLINK_SELINUX, each of which is used to communicate with a specific kernel service.

Since Netlink is used as an IPC mechanism between processes (and the kernel) on the same machine, its address consists only of a port number that identifies each peer (nl_pid). Netlink supports both unicast and multicast communication, so a message can also be sent to a group (nl_groups), but only processes with uid 0 are allowed to send multicast messages from user-space. A Netlink address is represented using the sockaddr_nl data structure:

struct sockaddr_nl {
        __kernel_sa_family_t    nl_family;      /* AF_NETLINK   */
        unsigned short  nl_pad;         /* zero         */
        __u32           nl_pid;         /* port ID      */
        __u32           nl_groups;      /* multicast groups mask */
};
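
Later in this post a create_nl_socket() helper is used without being shown; a plausible sketch of such a helper, assuming its second argument is the multicast-group mask, simply creates an AF_NETLINK socket and binds it to an address like the one above:

#include <linux/netlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a Netlink socket for the given protocol and
 * bind it.  The second argument is assumed to be the multicast-group mask;
 * nl_pid is left as 0 so the kernel assigns a unique port ID for us. */
static int create_nl_socket(int protocol, unsigned int groups)
{
        struct sockaddr_nl local;
        int fd;

        fd = socket(AF_NETLINK, SOCK_RAW, protocol);
        if (fd < 0)
                return -1;

        memset(&local, 0, sizeof(local));
        local.nl_family = AF_NETLINK;
        local.nl_pid = 0;           /* 0: let the kernel pick the port ID */
        local.nl_groups = groups;   /* multicast groups to listen on */

        if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}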

A Netlink message header consists of the following fields:

struct nlmsghdr {
        __u32           nlmsg_len;      /* Length of message including header */
        __u16           nlmsg_type;     /* Message content */
        __u16           nlmsg_flags;    /* Additional flags */
        __u32           nlmsg_seq;      /* Sequence number */
        __u32           nlmsg_pid;      /* Sending process port ID */
};

The Netlink protocol is explained in detail in RFC 3549 and in the netlink(7) man page.

Generic Netlink subsystem
Every Netlink family is identified by an integer number that allows using different Netlink services. Currently there are 21 assigned Netlink families out of a maximum of 32. To avoid running out of Netlink families, the Generic Netlink subsystem was created.

The Generic Netlink subsystem can multiplex different communication channels over a single Netlink family, NETLINK_GENERIC. It not only simplifies Netlink usage; communication channels can also be registered at run-time without modifying core kernel code or headers.

The Generic Netlink subsystem is implemented as a service bus inside the kernel, over which users communicate with each other. The users can reside either in user-space or inside the kernel. The bus supports a number of Generic Netlink communication channels that are dynamically allocated by a Generic Netlink controller. This controller is itself a kernel Generic Netlink user that listens on a special pre-allocated Generic Netlink channel, “nlctrl” (GENL_ID_CTRL), where users send requests to create, remove and learn about available channels.

Communication channels are uniquely identified by a channel number that is allocated by the Generic Netlink controller. Users that want to provide services over the Generic Netlink bus have to ask the Generic Netlink controller to create a new communication channel. Likewise, users that want to access those services have to query the Generic Netlink controller to find out whether they are available and which channel number they are currently using.

Every channel is identified by a Generic Netlink family and defines a set of commands that users can trigger. Each command is associated with a function handler that gets executed when a user sends a message specifying this command.

A Generic Netlink message header consists of the following fields:

struct genlmsghdr {
        __u8    cmd;
        __u8    version;
        __u16   reserved;
};

Generic Netlink uses the standard Netlink system as a transport, so its message format is defined as follows:

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                Netlink message header (nlmsghdr)              |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |           Generic Netlink message header (genlmsghdr)         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |             Optional user specific message header             |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |           Optional Generic Netlink message payload            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The family (communication channel) is specified in the type field (nlmsg_type) of the Netlink message header (nlmsghdr).

Each Generic Netlink family can define a family-specific header for the service provided in that channel.
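
To make this more concrete, here is a minimal sketch (using the libnl 3 API, a library mentioned again later in this post) of how a user-space peer could ask the controller for the channel number of a family. The “dbus” family name is the one proposed in the next section, so the lookup only succeeds once such a kernel service is actually registered:

#include <netlink/netlink.h>
#include <netlink/genl/genl.h>
#include <netlink/genl/ctrl.h>

/* Sketch: ask the Generic Netlink controller ("nlctrl") which channel
 * number has been assigned to a given family name. */
static int resolve_dbus_family(void)
{
        struct nl_sock *sock;
        int family;

        sock = nl_socket_alloc();
        if (!sock)
                return -1;

        if (genl_connect(sock) < 0) {   /* opens a NETLINK_GENERIC socket */
                nl_socket_free(sock);
                return -1;
        }

        /* Sends CTRL_CMD_GETFAMILY to the controller and parses the reply */
        family = genl_ctrl_resolve(sock, "dbus");

        nl_socket_free(sock);
        return family;  /* channel number, or a negative libnl error code */
}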

D-Bus as a Generic Netlink service
D-Bus can be implemented as a Generic Netlink service by creating a new Generic Netlink family (communication channel) “dbus”. Applications will use this communication channel to send and receive D-Bus messages.

In this scenario, most of the work that is currently done by the dbus-daemon will take place in the D-Bus Netlink service, such as adding applications to the bus when they gain ownership of a name (NameAcquired signal), routing messages to their destination based on the application’s unique name, and maintaining match rules (AddMatch method).

The D-Bus daemon will only be a special user of the Generic Netlink D-Bus service, although it will still have some responsibilities, such as authentication and, of course, implementing the org.freedesktop.DBus service.

The other D-Bus users (apart from dbus-daemon itself) will just work as they do now, using the D-Bus wire protocol on top of the Netlink transport, although they will have to perform some extra steps, as explained below.

The Generic Netlink D-Bus service will provide the following to applications:
Mechanism to create and delete D-Bus buses: Since we need to separate the traffic of the different buses (system, user, etc.) in the kernel module, we need a way for dbus-daemon instances to register buses in the kernel module. To do so, we can define the D-Bus family commands DBUS_CMD_NEWBUS and DBUS_CMD_DELBUS. The process that creates a bus will be the D-Bus daemon implementing that bus, and all messages that have org.freedesktop.DBus as their destination will be routed to it.

Besides the commands, we have to define a way to specify the name and type of the bus to be added. We can either store that information in a user-defined header or define Generic Netlink family attributes to pass it to the D-Bus Generic Netlink service. In either case, the dbus-daemon will be responsible for choosing a unique name (in the form netlink:name=unique_name), so that the kernel doesn’t have to read any configuration at all and just has to associate the unique addresses with each bus.
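
As a rough illustration of the attribute-based variant, the following libnl sketch sends a DBUS_CMD_NEWBUS request carrying the bus name as a string attribute. The command and attribute numbers (DBUS_CMD_NEWBUS, DBUS_ATTR_BUS_NAME) are placeholders for this proposal, not an existing kernel API:

#include <netlink/netlink.h>
#include <netlink/genl/genl.h>
#include <netlink/msg.h>
#include <netlink/attr.h>

/* Placeholder numbers for the proposed command and attribute; nothing in
 * the kernel defines these yet. */
enum { DBUS_CMD_NEWBUS = 1 };
enum { DBUS_ATTR_BUS_NAME = 1 };

/* Sketch: ask the proposed "dbus" Generic Netlink service to create a new
 * bus, passing the unique bus name chosen by the dbus-daemon. */
static int request_new_bus(struct nl_sock *sock, int dbus_family_id,
                           const char *bus_name)
{
        struct nl_msg *msg;

        msg = nlmsg_alloc();
        if (!msg)
                return -1;

        /* Fills in the nlmsghdr (nlmsg_type = dbus_family_id) and genlmsghdr */
        if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, dbus_family_id,
                         0, 0, DBUS_CMD_NEWBUS, 1 /* version */)) {
                nlmsg_free(msg);
                return -1;
        }

        /* Bus name, e.g. "netlink:name=unique_name", as a string attribute */
        if (nla_put_string(msg, DBUS_ATTR_BUS_NAME, bus_name) < 0 ||
            nl_send_auto(sock, msg) < 0) {
                nlmsg_free(msg);
                return -1;
        }

        nlmsg_free(msg);
        return 0;
}

The connect and disconnect commands described below could be built the same way, only with different command and attribute numbers.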

Another option would be to map a D-Bus bus to a multicast group and use the Generic Netlink controller commands CTRL_CMD_NEWMCAST_GRP and CTRL_CMD_DELMCAST_GRP. But we need more fine-grained control over the routing of the messages: we can’t just use genlmsg_multicast() and send the message to every application on the bus, since a signal message sent to a bus is not received by all applications; AddMatch rules can prevent some applications from receiving it. So we have to maintain our own multicast groups based on match rules.

Connect and disconnect from buses: To allow applications to connect to a bus, we can define another set of D-Bus family commands, DBUS_CMD_CONN_BUS and DBUS_CMD_DISC_BUS. When an application wants to connect to a bus, the bus type is checked first: if it is a session bus, only processes running with the same uid as the D-Bus daemon are allowed to connect. This restriction does not apply to the system bus, which allows connections from processes running as any user. Connection requests are routed to the D-Bus daemon, which performs the authentication.

As in the bus creation and deletion case, we need to specify which bus we are trying to connect to. Again, we can either store that information in the user-defined header or define a set of Generic Netlink family attributes.

Transport to send and receive messages: Receiving messages is straightforward. You only have to create a socket with:

int sd = create_nl_socket(NETLINK_GENERIC, 0);

and then use the standard BSD socket API, such as recv().
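
As a sketch of the receiving path (assuming create_nl_socket() is a small helper wrapping socket() and bind(), as shown earlier), a peer would read a datagram, skip the Netlink and Generic Netlink headers and hand the remaining payload to the D-Bus message parser:

#include <linux/netlink.h>
#include <linux/genetlink.h>
#include <sys/socket.h>

/* Sketch: receive one datagram and locate the Generic Netlink payload.
 * A real implementation would walk the nlattr list (and any D-Bus
 * specific header) instead of treating the payload as opaque. */
static void receive_dbus_messages(int sd)
{
        char buf[8192];
        struct nlmsghdr *nlh;
        int len;

        len = recv(sd, buf, sizeof(buf), 0);
        if (len < 0)
                return;

        /* One datagram may carry several Netlink messages; walk them all */
        for (nlh = (struct nlmsghdr *) buf; NLMSG_OK(nlh, len);
             nlh = NLMSG_NEXT(nlh, len)) {
                struct genlmsghdr *genlh = NLMSG_DATA(nlh);
                char *payload = (char *) genlh + GENL_HDRLEN;
                int payload_len = nlh->nlmsg_len - NLMSG_LENGTH(GENL_HDRLEN);

                /* payload/payload_len would now be handed over to the
                 * D-Bus message parser */
                (void) payload;
                (void) payload_len;
        }
}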

To send messages to the bus, we have to define a Generic Netlink D-Bus family command, DBUS_SEND_MSG, and fill in the Netlink header, the Generic Netlink header and, if applicable, a D-Bus service-specific header:

/* GENLMSG_DATA and NLA_DATA are the usual example helper macros that point
 * just past the Generic Netlink header and the attribute header. */
struct sockaddr_nl nladdr;
struct nlattr *na;
struct {
	struct nlmsghdr n;
	struct genlmsghdr g;
	char buf[256];
} req;
char *dbus_message;	/* marshalled D-Bus message to send */
int ret;

memset(&req, 0, sizeof(req));
memset(&nladdr, 0, sizeof(nladdr));
nladdr.nl_family = AF_NETLINK;

req.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
req.n.nlmsg_type = dbus_family_id;
req.n.nlmsg_flags = NLM_F_REQUEST;
req.g.cmd = DBUS_SEND_MSG;

/* Wrap the D-Bus message in a payload attribute */
na = (struct nlattr *) GENLMSG_DATA(&req);
na->nla_type = DBUS_ATTR_PAYLOAD;
na->nla_len = NLA_HDRLEN + strlen(dbus_message);
memcpy(NLA_DATA(na), dbus_message, strlen(dbus_message));
req.n.nlmsg_len += NLMSG_ALIGN(na->nla_len);

ret = sendto(sd, (char *) &req, req.n.nlmsg_len, 0,
	     (struct sockaddr *) &nladdr, sizeof(nladdr));

All this is even easier when using libnl, a library that greatly simplifies the use of Netlink in user-space applications. This library is already used by other system services, like NetworkManager, so adding a dependency on it to D-Bus shouldn’t be a problem.

The Generic Netlink D-Bus service will parse the D-Bus message, add the sender field and, in the case of a unicast message, route it to the correct destination. If the message is a signal, the service will compute the recipient list according to the match rules.

It will also process the NameAcquired and NameLost signals as well as the AddMatch method calls, so that it can keep track of where messages need to go.

Security framework: In the previous sections, authentication was mentioned as one of the responsibilities of the dbus-daemon itself. This is indeed what it does right now, but with the kernel Netlink module doing the routing based on user id, as explained above, perhaps no authentication is needed on the dbus-daemon side. The question is whether the dbus-daemon should trust everything that comes from the kernel or do an extra check.

For some more fine-grained security, D-Bus services can use PolicyKit to prompt the user requesting the operation for extra authentication.

Support for sending large messages: Some D-Bus users complain about poor performance when sending large chunks of data over the bus; this is in fact the reason file descriptor passing was added to D-Bus. One can argue that those applications shouldn’t be sending that much data over the bus and that it is the application’s responsibility, but the fact remains that the problem exists.

Netlink provides the ability to send large messages by using multipart messages, so the data to be sent can be sliced into chunks (no bigger than the kernel socket buffer size), resulting in better performance.
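
As an illustration, a receiver of such a multipart reply keeps reading until it sees the terminating message; a hedged sketch using the standard Netlink flags:

#include <linux/netlink.h>
#include <sys/socket.h>

/* Sketch: consume a multipart Netlink reply.  Each part carries the
 * NLM_F_MULTI flag and the series is terminated by an NLMSG_DONE message. */
static void read_multipart_reply(int sd)
{
        char buf[8192];
        int done = 0;

        while (!done) {
                struct nlmsghdr *nlh;
                int len = recv(sd, buf, sizeof(buf), 0);

                if (len <= 0)
                        break;

                for (nlh = (struct nlmsghdr *) buf; NLMSG_OK(nlh, len);
                     nlh = NLMSG_NEXT(nlh, len)) {
                        if (nlh->nlmsg_type == NLMSG_DONE) {
                                done = 1;   /* end of the multipart series */
                                break;
                        }
                        if (!(nlh->nlmsg_flags & NLM_F_MULTI))
                                done = 1;   /* it was a single-part message */

                        /* ...append NLMSG_DATA(nlh) to a reassembly buffer... */
                }
        }
}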

Implementation details
All this needs a bigger change to libdbus and the bindings than in our initial plan, since Netlink messages, as explained before, carry an extra header that needs to be parsed before the real D-Bus message is processed.

So, for libdbus, we will implement a DBusServerNetlink object for the implementation of a Netlink-based D-Bus server, and DBusTransportNetlink for the actual implementation of the wire protocol used when Netlink is the transport. DBusTransportNetlink will be responsible for getting and parsing messages from the Netlink D-Bus kernel service.

For the bindings, similar work will be needed to add support for reading and writing Netlink messages, but the libnl library should make this easier, and in any case it is part of our plan to add whatever code is needed to the most popular bindings.

And that’s all for now, any comments/feedback is appreciated.

D-Bus optimizations II

As explained in my previous post, we are working on optimizing D-Bus for its usage on embedded systems (more precisely on GENIVI).

We started a discussion on the linux-netdev mailing list about getting our patch to add multicast to UNIX sockets accepted, but, unfortunately, the feedback hasn’t been very good. So, since one of the premises from GENIVI is to get all the work we are doing accepted upstream, we have spent the last few days thinking about what else we could use to fix, as stated in my previous post, the main issue we have found in D-Bus performance: the number of context switches needed when all traffic on the bus goes through the dbus-daemon. Here’s a summary of the options we have been looking at:

  • Use TIPC, a socket family already available in modern kernel versions and targeted at clustering environments, although it can also be used for communication inside a single node.
  • Use ZeroMQ, a library that, at first look, provides what we need for D-Bus, that is, multicast on local sockets.
  • Provide multicast on UNIX sockets as a new socket family (AF_MCAST), although this wasn’t well received in the linux-netdev discussion either. It would contain a trimmed-down version of AF_UNIX with only the parts needed for multicast.
  • Extend Alban’s AF_DBUS work to include what we have learnt from the multicast AF_UNIX architecture. This would mean having a kernel patch that, as with the AF_MCAST solution, would have to be maintained by distributors, since the linux-netdev people didn’t like this solution either.
  • Use Netlink, which has all that we need for D-Bus, that is, multicast and unicast, plus it is an established IPC mechanism in the kernel (from kernel space to user space) and is even used by other services similar to D-Bus. We would create a new Netlink subfamily for D-Bus that would contain the code to do the routing, since Netlink, for security reasons, does not allow direct connections between user-space applications.
  • Use KBUS, a lightweight messaging system provided as a kernel module.

Right now, we have working code for AF_MCAST and are looking at Netlink, TIPC and KBUS, so we will be blogging more details on what we find out in our experiments. Any feedback would be appreciated since, as stated before, we want to get all this work accepted upstream. So, comments, suggestions?