Dan Williams' blog

They’re moving into the street

The conversation around the Occupy Wall Street protests, lack of jobs, and the widening income gap in the US reminds me of Disturbed’s Land of Confusion video. Hopefully the future doesn’t come down to wars in the street, and hopefully some shadowy dark superhero doesn’t show up, but he’s probably a metaphor anyway. At some point society needs to agree that the capitalist drive for maximizing shareholder value also includes certain moral/ethical behavior instead of profits-at-all-costs. But the harder part is agreeing on how to enforce that behavior, since “The Market” doesn’t really encourage it. Which I think is one of the real issues in the world today, and has been for a long time.

When the Sun Shines We’ll Shine Together

Everyone loves NetworkManager 0.9 (Beverly & Pack, cc-by-2.0)

Hurray! It’s finally out: NetworkManager 0.9. Thanks to a ton of help from almost 150 contributors and countless testers we’ve reached a new level of awesome. Let’s drop some recap on y’all:

Fast User Switching

This release debuts full support for fast user switching, a long-requested feature that makes the networking experience on multi-user computers butter-smooth. As a result of the simplified 0.9 architecture, each user gets their own network applet and each applet can control networking independently, provided that user has permissions to do so. If you switch and the new active user doesn’t have permissions for a connection, it’s terminated. It’s as simple as that and works just like you’d expect.

Optimized WiFi Roaming

When connected to a large unified WiFi network, like a workplace, university, or hotel, NetworkManager 0.9 enhances roaming behavior as you move between locations. By using the background scanning and nl80211 features in wpa_supplicant 0.7 and later, you’ll notice fewer drops in connectivity and better signal quality in large networks. Most kernel drivers will now provide automatic updates of new access points and enhanced connection quality reporting, allowing wpa_supplicant to quickly roam to the best access point when the current access point’s quality degrades and not before. Yay! Fewer dropped frames when you’re watching the YouTube Top 100.

WiMAX

Are you one of the 70 million and growing WiMAX users? Got an Intel WiMAX card in your laptop? Great! NetworkManager 0.9 lets you jump on blazing fast WiMAX speeds while you’re on the go. Put that hardware to work: simply pick your provider from the menu, and you’ll be connected automatically when WiMAX is on.

Made of Easy (katieharbath, cc-by-nc-sa-2.0)

Flexible Permissions

Wait, you haven’t taught little Tommy the value of hard-earned cash? Well until you do, you can restrict your metered 3G to everyone but Tommy so he doesn’t run up the bill playing stupid Flash games or poke around with your work email over the VPN. Or if you’re a sysadmin, you can roll out the same network configuration to multiple users and be sure that unauthorized users can’t connect to networks they shouldn’t be able to. The combination of connection permissions and flexible PolicyKit-based authorization lets you manage your computer the way you want.
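To make the Tommy example concrete, here is a minimal sketch of how a permissions check could work. It assumes permission entries look like "user:<name>:" and that an empty list means "everyone", which is my reading of the connection permissions setting; the function name and the users are made up for illustration.

```python
def user_may_use(permissions, user):
    """Return True if `user` may use a connection with these permission entries.

    Assumption: entries have the form "user:<name>:" and an empty list
    means the connection is available to all users.
    """
    if not permissions:
        return True
    for entry in permissions:
        kind, _, rest = entry.partition(":")
        name = rest.split(":", 1)[0]
        if kind == "user" and name == user:
            return True
    return False


metered_3g = ["user:mom:", "user:dad:"]   # hypothetical users
print(user_may_use(metered_3g, "tommy"))  # False
print(user_may_use(metered_3g, "mom"))    # True
print(user_may_use([], "tommy"))          # True
```

The PolicyKit side (who may change system-wide settings at all) is a separate check and is not modeled here.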

Consolidated Configuration

No longer do we have multiple settings services storing information in different formats and locations. Instead, all network connection information is stored by NetworkManager itself leading to faster network connections and simpler configuration. Applications now have one place to look for network configuration instead of two; one place to update instead of two; one place to monitor for changes instead of two; you get the picture. More features in half the code, yo.

It'll be our little secret (katietegtmeyer, cc-by-2.0)

Flexible Secrets

Passwords for any connection can now be stored securely in each user’s session or in privileged system connection storage. If you’re a bit paranoid you can choose to enter any password every time you connect instead of saving it. For system administrators this means you can have one connection for all users even if each user’s password is different or they use One-Time-Password tokens. By default sensitive secrets (like VPN and 802.1x passwords) are stored in the user’s session while generic ones (like WiFi passphrases, etc) are stored system-wide to balance privacy and usability. If you’ve got a different idea of balance, it’s trivial to open it up or lock it down just as much as you like. If you’re on a mobile device and you just don’t care, you can leave everything to NetworkManager and ignore user sessions completely.

Simplified D-Bus API

Consolidation of the API makes it radically simpler for applications to respond to network changes, be smarter about what networks you’re connected to, and how you’re connected to them. It’s trivial to figure out if you’re at home or at work and to do the right thing, so now there’s really no excuse not to make your application do what your users expect. And it’s easier to write cool new network applets and configuration UI too. Go wild. Make your apps sing.

GObject Introspection

Want to use NetworkManager from applications that aren’t written in C or C++? With GObject Introspection it’s trivial to use the NetworkManager convenience libraries from Python or JavaScript or any other introspection-enabled language. Start writing lickable new applets or make your app network aware in the easiest way possible.

const NetworkManager = imports.gi.NetworkManager;
const NMClient = imports.gi.NMClient;

// Ask NetworkManager for its overall state via libnm-glib
let client = NMClient.Client.new();
if (client.state == NetworkManager.State.CONNECTED_GLOBAL)
    print("You're connected!");

Developers and Distros

Because it’s a change in D-Bus and libnm-glib API, we’ve prepared a migration guide for developers. If your app just cares about whether you’re connected to the network and how, here are some example patches. Distro packagers should check for the latest versions of chat, backup, browser, mail, etc programs since they probably have had NM 0.9 support for months. As always, there’s a bunch of developer information and API documentation on the website and wiki. If you don’t find what you’re looking for, tell us how to improve!

Networking is never done (j4hnb, cc-by-2.0)

120 Proof for the Future

But this much awesome just isn’t enough. Always looking forward, NetworkManager is primed for great new features like connectivity detection, captive portal auto-login, network zones, automatic firewall and proxy management, new hardware support, and more. As a result of the API cleanup done for 0.9, NM is ready for the next wave of great features that will actually make your life better. A faster, more robust release process will ensure these features get to you more quickly. If we’ve done our job, you won’t even notice that NetworkManager is there; but it will be, saving the planet one network at a time.

Yo Berlin!

Lock up your booze and your network cards (in that order), I’m hitting up Berlin for Desktop Summit. I’ll be talking about network and location awareness in your application on Sunday, which is thinly veiled code for how to make NetworkManager and ModemManager tell you where you are and how to get where you want to go. I’ll also be hosting a BOF (with Will Stephenson, hopefully) on Wednesday afternoon in which you can alternately deride and praise networking on Linux. If you don’t attend, I will be supremely disappointed and you can be assured WiFi access points will shun your feeble attempts at association. Otherwise let the summitting begin.

DIDO?

No, not her. But Distributed-In-Distributed-Out. I’ve often thought that current cellular-derived systems (CDMA, EVDO, UMTS, LTE, etc) were insanely complex at the radio/protocol level. WiMAX is less complex than the gigantic hairball of UMTS/LTE that all the telcos coughed up since it comes from the IEEE instead of the ETSI/3GPP groups, but it’s certainly not simple. I mean, look at the AT command specifications for UMTS or LTE; there’s just so much there for setting up bearers for this and bearers for that, QoS for whatever, latency requirements, etc. I can’t imagine having to program a radio protocol stack like the team at OpenBTS is doing. It’s all there because the radio channel is shared spectrum and voice calls are still the most important part. If you can’t make a voice call because some douchebag is watching YouTube, you’d be pissed. And for whatever reason they still haven’t figured out how to reliably do VoIP over 4G networks, leading to stuff like Circuit Switched Fallback and (for Verizon) using the CDMA 1x network for voice and the LTE network for data. Wouldn’t it be great to keep things simple?

And that’s where Rearden and OnLive come in. Over the past 10 years they decided to throw out everything the ETSI, 3GPP, and 3GPP2 think they know about wireless, and rebuild it from the ground up. All because they need a really low-latency, cheap, reliable wireless medium to play games over. And I hope they make it work because it would really disrupt the existing wireless incumbents with their layers upon layers of protocols and complexity and crap and eye-bulging prices for wireless data. And the fact that it appears so freakishly simple on the client side makes my life easier since we don’t have to do all sorts of stupid setup just to send a single IP packet over the network. Here’s to the future…

PSA: GtkBuilder, toplevels, and gtk_widget_destroy()

So this has nailed me twice and maybe this time I’ll remember. If you have a toplevel (GtkWindow, GtkDialog, etc) in a GtkBuilder file, and you load that file into a GtkBuilder object, you need to remember to explicitly call gtk_widget_destroy() on it. GtkBuilder will sink the initial GTK floating ref for you, but that means you now have a widget with 2 references (object creation and the ref sink) and getting rid of the GtkBuilder will only remove one of those references for you. You then need to remember to call gtk_widget_destroy() to get rid of the other one. Not g_object_unref() apparently, as that’ll cause segfaults somewhere later on during widget destruction when something tries to disconnect some signal handlers somewhere, but gtk_widget_destroy(). This also removes the toplevels from GTK’s “toplevel_list” which, if you’re not careful and forget to destroy it, can lead to segfaults later when GTK tries to issue grabs when you’re scrolling. Those are always entertaining to track down. And when I say entertaining I don’t actually mean it.

GtkBuilder even has documentation about this:

For toplevel windows constructed by a builder, it is the responsibility of the user to call gtk_widget_destroy() to get rid of them and all the widgets they contain.

but somehow I keep forgetting. And then I waste an hour figuring out WTF is going wrong.

The Incredible Magical Pantech UML290

The Basics

Along with the LG VL600 this modem was the launch device for the Verizon 4G LTE network late last year. Despite being quite large (over twice the size of a normal 3G modem) it’s not a bad device and performs quite well in speed tests. Inside is a Qualcomm MDM9600 chipset providing both CDMA 1xRTT and EVDO on the standard North American 850 MHz Cellular and 1900 MHz PCS bands, and LTE on Verizon’s Upper 700 MHz C-block band. This device cannot roam internationally.

Linux Support

The UML290 exposes four USB interfaces: a standard CDC-ACM AT command port which supports PPP, a QCDM port, a WMC port, and a raw IP network port. Of these, only the AT command and the QCDM ports are really usable in Linux. You can connect to the LTE network using standard ETSI 27.007 GSM-style AT commands like AT+CGDCONT and ATD*99# and such. Connections to the 3G EVDO network can be made with the standard ATD#777 command. Unfortunately, the PPP functionality does not support data connection handoff between the EVDO and LTE networks, so you have to break the connection and reconnect with the appropriate ATD command when necessary. Why is that?

To allow seamless operation between the EVDO and LTE networks Verizon upgraded parts of their core network to eHRPD. HRPD (High Rate Packet Data) is the new name for HDR (High Data Rate) which was the old name for the IS-856 standard developed by Qualcomm ten years ago for high speed 3G packet data. EVDO (Evolution Data Only) is just the marketing name for all that. eHRPD stands for “evolved” or “enhanced” HRPD and essentially drops in pieces of the LTE core network modified to work with older EVDO protocols. Normally your device uses the eHRPD protocol when starting a data session since both the network and the modem support it. But when you use traditional CDMA PPP via ATD#777 the session is between pppd on your computer and the packet data gateway in the network, in contrast to GSM/WCDMA/LTE where the PPP session is only between pppd and the modem itself, not over the air. My theory here is that to maintain backwards compatibility or for some other reason, PPP data sessions using ATD#777 only allow HRPD, and thus handoffs between EVDO and LTE don’t work because the LTE side doesn’t like the older HRPD.

This leads to the problem where you, as the user, have to poke values into the NV_HDRSCP_FORCE_AT_CONFIG_I NVRAM item to manually switch between HRPD and eHRPD just to get connected. Why does this matter? Because the only way to connect to the EVDO network on Linux is with a direct PPP data session using ATD#777. That sucks.

All Hail WMC (wait, what?)

Hardware often makes me want to dress all in black, sit at the end of the bar, drink, and cry. Often Matthew Garrett is right there with me so at least I have company on my trip to black, black oblivion. The hope is that talking to the UML290 on the WMC port and using the modem’s native network interface makes this stupid handoff problem just go away because the modem firmware takes care of the data session protocols and handoffs when you’re not using direct PPP. But that means that we need to reverse engineer both the WMC protocol and the network interface. I’ll drink to that.

It turns out the network interface appears to just be passing raw IP packets over USB. At least that’s what the Windows USB traces tell me unless I’ve had too much Jacky D in which case they just look like Care Bears and rainbows. Qualcomm posted some driver patches for the “smd_rmnet” driver for Android devices that describe a “raw IP” mode for RMNET interfaces that lead me to believe I’m on the right track here. We’ll see.

The WMC bits are the best part though. This Pantech-specific (as far as I can tell) protocol has been around since at least 2005: I’ve got an Audiovox PC5740 that uses it and a Pantech PX-500 on Sprint that looks similar yet different. WMC is just another binary protocol; essentially encoding structs on the wire but with a bunch of stupid at the front and some idiot at the end. It’s got a frame start marker of 0xC8, except when there’s more shit at the front. It’s got a frame terminator of 0x7E, except when it doesn’t. It gets HDLC escaped, except when even control characters get escaped instead of just the escape characters. It’s got standard command numbers, except when it doesn’t.

The basic WMC frame starts with 0xC8. The PC5740 and the PX-500 both accept plain WMC requests like this. The UML290 on the other hand uses just about the most convoluted format I can think of. I’d really love to know why. I hope there’s a good reason. Instead the Verizon connection manager sends the WMC packet prefixed with “AT*WMC=”, then 0xC8, and then a bunch of binary data. And not only are the HDLC escape characters escaped, all control characters under 0x20 are escaped too. Even better, the request terminates with a 0x0D instead of the standard 0x7E. So you end up with something looking like this:

41542a574d433dc87d2a87b80d

and when all the framing and shit is removed, it comes down to a single byte: 0x0A. That’s it. Really. Why is this so hard? It’s USB for crying out loud. We’re not on serial links anymore where if somebody picks up the telephone downstairs you get a bunch of garbage in your XMODEM transfer.

It gets better. There’s a CRC-16 at the end, which is pretty standard with these sorts of binary modem protocols. Qualcomm writes the original firmware for all these modems anyway and they all include a Qualcomm DIAG port which speaks a protocol using the standard HDLC framing with CRC-16 (polynomial 0x8408 and seed of 0xFFFF) and a frame terminator of 0x7E. So you’d think they’d re-use those bits. THINK AGAIN. Perhaps because they woke up one day and decided to make life hard for everyone on the planet, the Pantech engineers working on the UML290 decided to use a CRC-16 initial seed of 0xAAFE. What the fuck? Even the PC5740 and the PX-500 use a standard HDLC CRC-16 seed of 0xFFFF like just about everything else on the planet.
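Here is a rough sketch that takes the example request frame from above apart and checks its CRC. The "AT*WMC=" prefix, the 0xC8 marker, the 0x0D terminator, the escaping, the 0x8408 polynomial and the 0xAAFE seed all come from the description above. What I am assuming is that the CRC covers the marker plus the payload, that it is complemented, and that it is sent low byte first. Those choices are inferred from the one example frame, not from any spec, and the function names are invented for this sketch.

```python
AT_PREFIX = b"AT*WMC="
FRAME_MARKER = 0xC8
HDLC_ESCAPE = 0x7D
UML290_CRC_SEED = 0xAAFE


def crc16(data, seed):
    """Bitwise CRC-16 with reflected polynomial 0x8408, result complemented."""
    crc = seed
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return ~crc & 0xFFFF


def unescape(data):
    """Undo HDLC-style escaping: 0x7D means 'XOR the next byte with 0x20'."""
    out = bytearray()
    stream = iter(data)
    for byte in stream:
        if byte == HDLC_ESCAPE:
            byte = next(stream) ^ 0x20
        out.append(byte)
    return bytes(out)


def decode_uml290_request(frame):
    """Return (payload, crc_ok) for one AT*WMC= request frame."""
    if not (frame.startswith(AT_PREFIX) and frame.endswith(b"\r")):
        raise ValueError("not an AT*WMC= frame")
    body = unescape(frame[len(AT_PREFIX):-1])
    if len(body) < 3 or body[0] != FRAME_MARKER:
        raise ValueError("missing 0xC8 frame marker")
    received = int.from_bytes(body[-2:], "little")
    # Assumption: the CRC is computed over the marker plus the payload.
    crc_ok = crc16(body[:-2], UML290_CRC_SEED) == received
    return body[1:-2], crc_ok


frame = bytes.fromhex("41542a574d433dc87d2a87b80d")
payload, crc_ok = decode_uml290_request(frame)
print(payload.hex(), crc_ok)  # 0a True
```

Passing 0xFFFF as the seed instead would be the starting point for the PC5740 and PX-500 style frames, though their framing differs (no AT prefix, 0x7E terminator), so the decode function above would need adjusting.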

But it gets better. The responses from the UML290 don’t bother to include a valid CRC-16; instead it’s just 0x3030. Wow, class work guys. I’m sure there’s good reason for that. Or not. At least the PC5740 and PX-500 get points for valid CRCs.

Which begs the question: why do people still use these serial protocols? Every other piece of USB-connected wireless hardware I’ve seen, from WiFi devices to WiMAX cards, doesn’t bother with this serial framing shit at all. Even for firmware uploads. They just push packed structs up and down the wire. USB already has a 16-bit CRC check for data packets. Let’s re-invent the wheel for no good reason just because it’s fun.

Why do mobile broadband modems have to be different? Why all the framing and escaping and general eye gouging with shards of broken glass? Why duplicate what USB already does? If your modem doesn’t use USB, doesn’t that protocol already have integrity protection and error checking? Cause if it doesn’t you’re already in for a world of hurt.

As an embedded engineer you just have to wake up one morning and say “This is fucking stupid.” But I suppose that’s not something a 6-month product cycle allows. Which is why, as open-source engineers that have to talk to hardware, we tend to drink. And then cry a lot.

NetworkManager and Dual-stack Addressing

Dodge the pig! (via the|G|™ under CC BY-NC-ND 2.0)

The big reason that NetworkManager 0.9 is slower to connect than NM 0.8 is that we flipped IPv6 addressing on by default. That means that when you connect to a new network and that network supports IPv6 autoconfiguration via router advertisements you’ll get IPv6 connectivity. But if that network doesn’t support IPv6 then you’ll spin for 60 seconds or so waiting for a router advertisement because there’s nothing on the network that listens to the IPv6 autoconf solicitations that the kernel puts out when the link comes up. You can fix that by changing the IPv6 addressing method to “Ignore” in nm-connection-editor if you know your network doesn’t support IPv6.

Why don’t we bring up IPv4 and just wait for IPv6 to happen in the background? That’s a great question; I’m glad I asked it. First, it requires some small changes in NetworkManager’s D-Bus interface to add connected states for both IPv4 and IPv6 simultaneously so that applications can listen for when each stack’s connectivity is available. That’s trivial. It could be done tomorrow. It’s not a technical problem at all.

But second, it requires applications to be smarter about what resources they require and to do smart things when those resources aren’t available. And that apparently happens when solid gold pigs start dropping out of the sky. I hope you have falling-gold-pig insurance for your car. But app authors often don’t make their applications smarter and more network aware because hey, that’s more work for them, and hey, people haven’t requested this yet, and hey, that’s one more D-Bus API I need to depend on and I don’t know what else.

NetworkManager says it’s connected via a global “State” property. That property is a logical OR of both IPv4 and IPv6 connectivity. If one is connected then the State property is NM_STATE_CONNECTED. Great, right? But if NM flips the state to CONNECTED when IPv4 completes but IPv6 is still waiting, then your favorite IRC application will try to connect to your IPv6-enabled IRC server. Except IPv6 isn’t up yet so it fails. And you get mad because shit doesn’t magically work.

And then what happens if IPv6 fails? Do we fail the entire connection? Or do we just keep listening for IPv6 router advertisements and when one comes in configure the interface? Currently there’s a setting called ‘failure fatal’ for both IPv4 and IPv6 that lets you determine that behavior; it defaults to TRUE for IPv4 and FALSE for IPv6 since so many networks don’t yet have IPv6 enabled. But this really is something we shouldn’t have to care much about.

And that brings us back to applications. When NetworkManager adds dual-stack connected state, which is actually pretty trivial to do, the applications have to listen to that and care so that your life is better. If the app has an IPv6 address and NM indicates that IPv6 isn’t yet available, the app needs to wait until NM says it is available. Same for IPv4. The problem is that nobody ever seems to bother with this sort of intelligence at the application level, but that’s where it’s really needed, since the connection manager has no idea what servers you’re connecting to and whether or not they are IPv4 or IPv6.
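A minimal sketch of the kind of application-side check I mean. The per-family flags `ipv4_up` and `ipv6_up` are hypothetical: they stand in for the dual-stack connected state that NetworkManager does not expose yet, and the function names are made up for illustration.

```python
import ipaddress


def global_state(ipv4_up, ipv6_up):
    """Today's single State property: a logical OR of both stacks."""
    return ipv4_up or ipv6_up


def ready_to_connect(server_address, ipv4_up, ipv6_up):
    """True only if the stack needed to reach server_address is available."""
    version = ipaddress.ip_address(server_address).version
    return ipv6_up if version == 6 else ipv4_up


# IPv4 has finished, IPv6 is still waiting for a router advertisement.
print(global_state(True, False))                     # True
print(ready_to_connect("2001:db8::1", True, False))  # False
print(ready_to_connect("192.0.2.10", True, False))   # True
```

The global state says "connected" in both cases, but only the application knows that its server is IPv6-only and that it should wait.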

As a side rant about application intelligence, apps should also allow you to associate resources (like internal VPN-only mail servers) with NetworkManager VPN connection UUIDs so that they only check the mail on your corporate VPN when NM says your VPN connection is up. You can do that now. It’s been there for years. But nobody bothers to write that sort of useful support into applications either. Where does the application’s responsibility for intelligence begin? Useful insights on where that line gets drawn are most welcome. So are comments about how hot Colin Walter’s mom is.

gnote performance

I’ve been using gnote as my daily job status tool for a few years now, and it’s great. I love it. I have 900+ notes. But every day when I create a new note it hangs for 10 seconds, and again after typing the note’s title and hitting return. This machine isn’t slow (Core 2 Duo 1.86GHz) so it’s got to be gnote.

So we fire up sysprof. And for both operations (creating a new note, changing the title) we find the culprit to be the add_keyword() function, called from gnote::TrieController::update(). It appears to be mostly add_match_at_state() checking for equality of something. Full sysprof data available upon request.

I like gnote a lot; this is a minor annoyance but one I hit every day. If anyone optimizes this I will owe you something, and I’m a great person to have owing you something.

Fedora 15 Throws a Party

In case you missed it, Fedora 15 got released today. It’s packed with tons of cutting-edge features but most of all, it includes GNOME 3, KDE 4.6, XFCE 4.8, NetworkManager 0.9, btrfs integration, better power management, LibreOffice, Firefox 4, systemd, and a ton of other stuff you’ll love. Read the release notes, download it, and start living the life you always dreamed of.

NetworkManager 0.9, Pidgin, and tinc

Pidgin

As a reply to Andrew’s comments about NM 0.9 and Pidgin, I wrote patches a while back of which one got committed and a second is pending.

tinc and VPN plugins

Andrew also talked about tinc and how he’d love if it had NetworkManager integration.

NetworkManager expects quite a bit out of VPN services; they cannot simply be dumb services that expect everything to be statically configured for every user on the system. Why? Because NetworkManager allows many different configurations of VPN settings; you might have one VPN for your cover-story workplace and one for your Secret Three Letter Agency that you only use in secure locations. That configuration is stored in NM config files in /etc and includes not just VPN-specific configuration, but also IPv4 and IPv6 configuration, static routes, DNS and search domain information, and a human-readable name and connection UUID. This allows the user to override configuration the VPN might automatically return. In the future we’ll add proxy configuration and firewall rules to that list. Because all these things are highly specific to a single network connection (be that VPN, wifi, wired, 3G, whatever), they need to be kept together, changed together, and applied together. No existing VPN configuration file format supports all this. But NetworkManager does.

This means that we cannot simply use /etc/openvpn.conf or /etc/tinc/tinc.conf because:

standard config files often contain only one network: they are essentially “public” configuration files and the concept falls apart if you have ever configured more than one VPN; while some VPN daemons do have formats that allow defining more than one network, many do not.
config files cannot encode related connection information: there is often no facility for expanded network-specific configuration like proxies, firewall rules, additional IP addresses, static routes, DNS search domains, etc that should be associated with a VPN connection.
secrets should be stored securely: if the user wants secure password storage in the GNOME Keyring or KWallet or whatever, they should be able to do so. The user should be able to keep the password in their session or even provide it on-demand and not require it to be stored in system configuration files.
secrets can change periodically: at Red Hat we use RSA SecurID tokens that generate a new PIN code every 30 seconds which is entered every time we connect. Many VPN daemons will ask for passwords too, but that requires a terminal. Fail. We want to ask for secrets in a generic manner which is appropriate to each desktop environment (or lack thereof), and existing VPN secret request mechanisms (stdin, TCP management socket, static config files, etc) simply do not allow this.

To work around these limitations of configuration files, NetworkManager dynamically generates configuration for each VPN daemon and inserts your password when required, retrieved from secure GNOME Keyring/KWallet storage or from a PIN entry dialog or other mechanism. The VPN daemon is then executed and handed that configuration, either as a path to a private, root-owned, transient configuration file or, even better, cleanly written to stdin if the VPN daemon supports it.
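The stdin approach can be sketched roughly like this. Everything here is illustrative rather than how any real plugin is written: the option names ("gateway", "username", "password") are invented, and the "daemon" is just a child Python process that counts the lines it reads, standing in for a VPN daemon that accepts its configuration on stdin.

```python
import subprocess
import sys


def build_config(settings, secrets):
    """Render one 'key value' line per option, secrets included."""
    merged = {**settings, **secrets}
    return "".join(f"{key} {value}\n" for key, value in merged.items())


def run_daemon(argv, config_text):
    """Start the daemon and hand it the config on stdin; nothing hits disk."""
    return subprocess.run(argv, input=config_text, text=True,
                          capture_output=True, check=True)


# Stand-in "daemon": reads its configuration from stdin and reports the line count.
fake_daemon = [sys.executable, "-c",
               "import sys; print(len(sys.stdin.readlines()))"]

config = build_config({"gateway": "vpn.example.com", "username": "dan"},
                      {"password": "123456"})  # e.g. a PIN typed into a dialog
result = run_daemon(fake_daemon, config)
print(result.stdout.strip())  # 3
```

Because the secret only ever lives in memory and in the pipe, there is no transient file to clean up if something crashes halfway through.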

Which leads me to tinc. Nothing appears to preclude creation of a NetworkManager VPN plugin for tinc, but there are some complications that it would be great to get fixed upstream:

quite a few configuration files are required for each VPN network, and a plugin would have to create all these files dynamically before executing tincd; it appears that tinc 1.0.14 allows arbitrary config options on the command-line, which helps somewhat, but even better would be accepting configuration on stdin as a single unit instead of a bunch of separate files. This way no config files (possibly including secrets) might mistakenly get left lying around due to segfaults or programming errors.
configuration appears to require an explicit device name (like “tun0”) which is a huge no-no; if the program can’t dynamically determine a suitable device name and return that to the caller, it gets a F- grade from me. If the user configures more than one VPN that they might use concurrently, they shouldn’t have to manually plan out interface names. At least it appears that tinc sends the interface name to the “up” script in the INTERFACE environment variable.
like OpenVPN, it appears that many attributes of the VPN connection cannot be auto-detected, which requires the user to know a priori what the VPN configuration will be. Stuff like “Cipher”, “Compression”, “Digest”, etc. This never helps users and apparently everybody writing VPN software thinks the user of their software is already a system administrator. I hope I’m wrong about this. If I’m not, hopefully tinc emits status information indicating that the parameters set in configuration are incompatible with the peers it’s trying to connect to such that we can notify the user about it.
it’s unclear to me how tinc reports status and progress in a usable manner; it appears that one can send signals to tincd, but they dump information to syslog. Ideally tincd would include an option to dump this information to stdout as well, because screen-scraping syslog is just completely evil.
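On the interface name point in the list above: if tinc really does export the name in the INTERFACE environment variable, an "up" script helper could pick it up along these lines. This is a sketch only; the function name is made up and the example value "tun3" is arbitrary.

```python
import os


def interface_from_environment(environ=None):
    """Return the interface name the daemon chose, taken from INTERFACE."""
    environ = os.environ if environ is None else environ
    name = environ.get("INTERFACE", "")
    if not name:
        raise RuntimeError("INTERFACE was not set by the VPN daemon")
    return name


# In a real "up" script the environment would come from tincd itself.
print(interface_from_environment({"INTERFACE": "tun3"}))  # tun3
```

A plugin would then report that name back to NetworkManager instead of requiring the user to plan out device names ahead of time.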

None of these issues are killers; but they simply result in a degraded experience for the user of tincd if that user is not a system administrator. At this point vpnc is the best-behaved VPN daemon because it (a) accepts configuration on stdin, (b) can request secrets dynamically via stdin, (c) automatically negotiates most options with the peer, and (d) doesn’t have 50,000 configuration options with complex interdependencies. I hope tinc can get there too.

If anyone wants to write a NetworkManager VPN plugin for tinc, definitely let me know or jump onto the mailing list and we’d be glad to help out with suggestions and advice.