NetworkManager – Page 3 – Dan Williams’ blog

Yo Berlin!

Lock up your booze and your network cards (in that order): I’m hitting up Berlin for Desktop Summit. I’ll be talking about network and location awareness in your application on Sunday, which is thinly veiled code for how to make NetworkManager and ModemManager tell you where you are and how to get where you want to go. I’ll also be hosting a BOF (with Will Stephenson, hopefully) on Wednesday afternoon in which you can alternately deride and praise networking on Linux. If you don’t attend, I will be supremely disappointed and you can be assured WiFi access points will shun your feeble attempts at association. Otherwise, let the summitting begin.

DIDO?

No, not her. But Distributed-In-Distributed-Out. I’ve often thought that current cellular-derived systems (CDMA, EVDO, UMTS, LTE, etc.) were insanely complex at the radio/protocol level. WiMAX is less complex than the gigantic hairball of UMTS/LTE that all the telcos coughed up since it comes from the IEEE instead of the ETSI/3GPP groups, but it’s certainly not simple. I mean, look at the AT command specifications for UMTS or LTE; there’s just so much there for setting up bearers for this and bearers for that, QoS for whatever, latency requirements, etc. I can’t imagine having to program a radio protocol stack like the team at OpenBTS is doing. It’s all there because the radio channel is shared spectrum and voice calls are still the most important part. If you can’t make a voice call because some douchebag is watching YouTube, you’d be pissed. And for whatever reason they still haven’t figured out how to reliably do VoIP over 4G networks, leading to stuff like Circuit Switched Fallback and (for Verizon) using the CDMA 1x network for voice and the LTE network for data. Wouldn’t it be great to keep things simple?

And that’s where Rearden and OnLive come in. Over the past 10 years they decided to throw out everything the ETSI, 3GPP, and 3GPP2 think they know about wireless, and rebuild it from the ground up. All because they need a really low-latency, cheap, reliable wireless medium to play games over. And I hope they make it work because it would really disrupt the existing wireless incumbents with their layers upon layers of protocols and complexity and crap and eye-bulging prices for wireless data. And the fact that it appears so freakishly simple on the client side makes my life easier since we don’t have to do all sorts of stupid setup just to send a single IP packet over the network. Here’s to the future…

PSA: GtkBuilder, toplevels, and gtk_widget_destroy()

So this has nailed me twice and maybe this time I’ll remember. If you have a toplevel (GtkWindow, GtkDialog, etc.) in a GtkBuilder file, and you load that file into a GtkBuilder object, you need to remember to explicitly call gtk_widget_destroy() on it. GtkBuilder will sink the initial GTK floating ref for you, but that means you now have a widget with two references (object creation and the ref sink), and getting rid of the GtkBuilder will only remove one of those references for you. You then need to remember to call gtk_widget_destroy() to get rid of the other one. Not g_object_unref(), apparently, as that’ll cause segfaults later on during widget destruction when something tries to disconnect some signal handlers, but gtk_widget_destroy(). This also removes the toplevel from GTK’s “toplevel_list” which, if you’re not careful and forget to destroy it, can lead to segfaults later when GTK tries to issue grabs when you’re scrolling. Those are always entertaining to track down. And when I say entertaining I don’t actually mean it.

GtkBuilder even has documentation about this:

For toplevel windows constructed by a builder, it is the responsibility of the user to call gtk_widget_destroy() to get rid of them and all the widgets they contain.

but somehow I keep forgetting. And then I waste an hour figuring out WTF is going wrong.

The Incredible Magical Pantech UML290

The Basics

Along with the LG VL600, this modem was a launch device for the Verizon 4G LTE network late last year. Despite being quite large (over twice the size of a normal 3G modem) it’s not a bad device and performs quite well in speed tests. Inside is a Qualcomm MDM9600 chipset providing both CDMA 1xRTT and EVDO on the standard North American 850 MHz Cellular and 1900 MHz PCS bands, and LTE on Verizon’s Upper 700 MHz C-block band. This device cannot roam internationally.

Linux Support

The UML290 exposes four USB interfaces: a standard CDC-ACM AT command port which supports PPP, a QCDM port, a WMC port, and a raw IP network port. Of these, only the AT command and the QCDM ports are really usable in Linux. You can connect to the LTE network using standard ETSI 27.007 GSM-style AT commands like AT+CGDCONT and ATD*99# and such. Connections to the 3G EVDO network can be made with the standard ATD#777 command. Unfortunately, the PPP functionality does not support data connection handoff between the EVDO and LTE networks, so you have to break the connection and reconnect with the appropriate ATD command when necessary. Why is that?
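Since PPP can’t hand the session over between networks, the best a connection manager can do is tear the session down and redial with whichever command matches the network the modem is currently on. A minimal sketch of picking those commands, assuming packet data context 1 is the one being defined. The enum, the function names, and the APN are made up for illustration; the *99***1# form is just the 27.007 dial string with context 1 selected explicitly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: which network the modem says it is registered on. */
enum access_tech { TECH_LTE, TECH_EVDO };

/* Define packet data context 1 (27.007 +CGDCONT) for the given APN.
 * Returns the command length, or -1 if it does not fit in out. */
static int build_cgdcont(char *out, size_t outlen, const char *apn)
{
    int n;

    assert(out != NULL && apn != NULL);
    n = snprintf(out, outlen, "AT+CGDCONT=1,\"IP\",\"%s\"\r", apn);
    return (n < 0 || (size_t) n >= outlen) ? -1 : n;
}

/* No PPP handoff: after tearing down the session, redial with the
 * command that matches the network the modem is on now. */
static const char *dial_command_for(enum access_tech tech)
{
    return tech == TECH_LTE ? "ATD*99***1#\r"  /* 27.007 packet dial, context 1 */
                            : "ATD#777\r";     /* CDMA packet data dial string */
}
```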

To allow seamless operation between the EVDO and LTE networks Verizon upgraded parts of their core network to eHRPD. HRPD (High Rate Packet Data) is the new name for HDR (High Data Rate) which was the old name for the IS-856 standard developed by Qualcomm ten years ago for high speed 3G packet data. EVDO (Evolution Data Only) is just the marketing name for all that. eHRPD stands for “evolved” or “enhanced” HRPD and essentially drops in pieces of the LTE core network modified to work with older EVDO protocols. Normally your device uses the eHRPD protocol when starting a data session since both the network and the modem support it. But when you use traditional CDMA PPP via ATD#777 the session is between pppd on your computer and the packet data gateway in the network, in contrast to GSM/WCDMA/LTE where the PPP session is only between pppd and the modem itself, not over the air. My theory here is that to maintain backwards compatibility or for some other reason, PPP data sessions using ATD#777 only allow HRPD, and thus handoffs between EVDO and LTE don’t work because the LTE side doesn’t like the older HRPD.

This leads to the problem where you, as the user, have to poke values into the NV_HDRSCP_FORCE_AT_CONFIG_I NVRAM item to manually switch between HRPD and eHRPD just to get connected. Why does this matter? Because the only way to connect to the EVDO network on Linux is with a direct PPP data session using ATD#777. That sucks.

All Hail WMC (wait, what?)

Hardware often makes me want to dress all in black, sit at the end of the bar, drink, and cry. Often Matthew Garrett is right there with me so at least I have company on my trip to black, black oblivion. The hope is that talking to the UML290 on the WMC port and using the modem’s native network interface makes this stupid handoff problem just go away because the modem firmware takes care of the data session protocols and handoffs when you’re not using direct PPP. But that means that we need to reverse engineer both the WMC protocol and the network interface. I’ll drink to that.

It turns out the network interface appears to just be passing raw IP packets over USB. At least that’s what the Windows USB traces tell me, unless I’ve had too much Jacky D, in which case they just look like Care Bears and rainbows. Qualcomm posted some driver patches for the “smd_rmnet” driver for Android devices that describe a “raw IP” mode for RMNET interfaces, which leads me to believe I’m on the right track here. We’ll see.
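If the network port really is passing bare IP packets, there’s no Ethernet header telling the host what’s inside, so the driver has to look at the IP version nibble in the first byte of each packet. A minimal sketch of that check, assuming frames carry nothing but the IP packet itself (the function name is hypothetical and not taken from the smd_rmnet patches):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With no link-layer header, the only hint about the payload protocol is
 * the version field in the top four bits of the first byte.
 * Returns 4, 6, or -1 if the frame doesn't look like IP. */
static int raw_ip_version(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL || len == 0);
    if (len == 0)
        return -1;
    switch (pkt[0] >> 4) {
    case 4:
        return 4;
    case 6:
        return 6;
    default:
        return -1;    /* not an IP packet we understand */
    }
}
```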

The WMC bits are the best part though. This Pantech-specific (as far as I can tell) protocol has been around since at least 2005: I’ve got an Audiovox PC5740 that uses it and a Pantech PX-500 on Sprint that speaks something similar yet different. WMC is just another binary protocol; essentially encoding structs on the wire but with a bunch of stupid at the front and some idiot at the end. It’s got a frame start marker of 0xC8, except when there’s more shit at the front. It’s got a frame terminator of 0x7E, except when it doesn’t. It gets HDLC escaped, except when control characters get escaped too, not just the HDLC special characters. It’s got standard command numbers, except when it doesn’t.

The basic WMC frame starts with 0xC8. The PC5740 and the PX-500 both accept plain WMC requests like this. The UML290, on the other hand, uses just about the most convoluted format I can think of. I’d really love to know why; I hope there’s a good reason. The Verizon connection manager sends the WMC packet prefixed with “AT*WMC=”, then 0xC8, and then a bunch of binary data. And not only are the HDLC special characters (0x7D and 0x7E) escaped, all control characters under 0x20 are escaped too. Even better, the request terminates with 0x0D instead of the standard 0x7E. So you end up with something looking like this:

41542a574d433dc87d2a87b80d

and when all the framing and shit is removed, it comes down to a single byte: 0x0A. That’s it. Really. Why is this so hard? It’s USB for crying out loud. We’re not on serial links anymore where if somebody picks up the telephone downstairs you get a bunch of garbage in your XMODEM transfer.
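To make the layering concrete, here is a sketch that peels the UML290 request framing off the captured bytes above: drop the “AT*WMC=” prefix and the trailing 0x0D, then undo the escaping (0x7D followed by the original byte XOR 0x20). What’s left is the 0xC8 marker, the one-byte 0x0A command, and two CRC bytes. The function name is hypothetical, and it only handles requests in this format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WMC_AT_PREFIX "AT*WMC="

/* Undo the UML290 request framing. Writes the unescaped bytes
 * (0xC8 marker, payload, 2-byte CRC) to out and returns their count,
 * or -1 if the input doesn't look like such a frame. */
static long uml290_unframe(const uint8_t *in, size_t len,
                           uint8_t *out, size_t outlen)
{
    size_t plen = strlen(WMC_AT_PREFIX);
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    if (len < plen + 1 || memcmp(in, WMC_AT_PREFIX, plen) != 0 ||
        in[len - 1] != 0x0D)
        return -1;

    for (i = plen; i < len - 1; i++) {
        uint8_t c = in[i];
        if (c == 0x7D) {              /* escape: next byte XOR 0x20 */
            if (++i >= len - 1)
                return -1;
            c = (uint8_t) (in[i] ^ 0x20);
        }
        if (n >= outlen)
            return -1;
        out[n++] = c;
    }
    /* marker + at least one payload byte + 2 CRC bytes */
    if (n < 4 || out[0] != 0xC8)
        return -1;
    return (long) n;
}
```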

It gets better. There’s a CRC-16 at the end, which is pretty standard with these sorts of binary modem protocols. Qualcomm writes the original firmware for all these modems anyway and they all include a Qualcomm DIAG port which speaks a protocol using the standard HDLC framing with CRC-16 (polynomial 0x8408 and seed of 0xFFFF) and a frame terminator of 0x7E. So you’d think they’d re-use those bits. THINK AGAIN. Perhaps because they woke up one day and decided to make life hard for everyone on the planet, the Pantech engineers working on the UML290 decided to use a CRC-16 initial seed of 0xAAFE. What the fuck? Even the PC5740 and the PX-500 use a standard HDLC CRC-16 seed of 0xFFFF like just about everything else on the planet.
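Putting the escaping and this CRC together, here’s a sketch of an encoder that reproduces the capture above byte for byte. The CRC is a plain bitwise CRC-16 with the reflected 0x8408 polynomial, a caller-supplied seed, and a final complement. Two details are inferred from the capture rather than documented anywhere: the CRC covers the 0xC8 marker as well as the payload, and it’s appended low byte first; that combination is what yields the 0x87 0xB8 seen there. With the standard 0xFFFF seed the same function gives 0x906E for the string “123456789”, the usual CRC-16/X-25 check value. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-16 as used with HDLC framing: reflected polynomial 0x8408,
 * caller-supplied seed, result complemented at the end. */
static uint16_t crc16_hdlc(const uint8_t *buf, size_t len, uint16_t seed)
{
    uint16_t crc = seed;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (uint16_t) ((crc >> 1) ^ 0x8408)
                            : (uint16_t) (crc >> 1);
    }
    return (uint16_t) ~crc;
}

/* Append one byte, escaping 0x7D/0x7E and, UML290-style, everything
 * below 0x20 as 0x7D followed by the byte XOR 0x20. */
static size_t put_escaped(uint8_t *out, size_t n, uint8_t c)
{
    if (c < 0x20 || c == 0x7D || c == 0x7E) {
        out[n++] = 0x7D;
        out[n++] = (uint8_t) (c ^ 0x20);
    } else {
        out[n++] = c;
    }
    return n;
}

/* Hypothetical UML290 WMC request encoder. out needs room for at least
 * 7 + 2 * (payload_len + 3) + 1 bytes. Returns the number written. */
static size_t uml290_frame(const uint8_t *payload, size_t payload_len,
                           uint8_t *out)
{
    uint8_t body[256];
    size_t blen = 0, n, i;
    uint16_t crc;

    assert(payload != NULL && out != NULL);
    assert(payload_len + 3 <= sizeof body);
    body[blen++] = 0xC8;                          /* WMC start marker */
    memcpy(body + blen, payload, payload_len);
    blen += payload_len;
    crc = crc16_hdlc(body, blen, 0xAAFE);         /* not the usual 0xFFFF */
    body[blen++] = (uint8_t) (crc & 0xFF);        /* low byte first */
    body[blen++] = (uint8_t) (crc >> 8);

    memcpy(out, "AT*WMC=", 7);                    /* unescaped AT prefix */
    n = 7;
    for (i = 0; i < blen; i++)
        n = put_escaped(out, n, body[i]);
    out[n++] = 0x0D;                              /* not 0x7E */
    return n;
}
```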

But it gets better. The responses from the UML290 don’t bother to include a valid CRC-16; instead it’s just 0x3030. Wow, class work guys. I’m sure there’s good reason for that. Or not. At least the PC5740 and PX-500 get points for valid CRCs.

Which raises the question: why do people still use these serial protocols? Every other piece of USB-connected wireless hardware I’ve seen, from WiFi devices to WiMAX cards, doesn’t bother with this serial framing shit at all. Even for firmware uploads. They just push packed structs up and down the wire. USB already has a 16-bit CRC check for data packets. Let’s re-invent the wheel for no good reason just because it’s fun.

Why do mobile broadband modems have to be different? Why all the framing and escaping and general eye gouging with shards of broken glass? Why duplicate what USB already does? If your modem doesn’t use USB, doesn’t that protocol already have integrity protection and error checking? ’Cause if it doesn’t, you’re already in for a world of hurt.

As an embedded engineer you just have to wake up one morning and say “This is fucking stupid.” But I suppose that’s not something a 6-month product cycle allows. Which is why, as open-source engineers that have to talk to hardware, we tend to drink. And then cry a lot.

NetworkManager and Dual-stack Addressing

Dodge the pig! (via the|G|™ under CC BY-NC-ND 2.0)

The big reason that NetworkManager 0.9 is slower to connect than NM 0.8 is that we flipped IPv6 addressing on by default. That means that when you connect to a new network and that network supports IPv6 autoconfiguration via router advertisements, you’ll get IPv6 connectivity. But if that network doesn’t support IPv6 then you’ll spin for 60 seconds or so waiting for a router advertisement, because there’s nothing on the network that listens to the IPv6 autoconf solicitations that the kernel puts out when the link comes up. You can fix that by changing the IPv6 addressing method to “Ignore” in nm-connection-editor if you know your network doesn’t support IPv6.

Why don’t we bring up IPv4 and just wait for IPv6 to happen in the background? That’s a great question; I’m glad I asked it. First, it requires some small changes in NetworkManager’s D-Bus interface to add connected states for both IPv4 and IPv6 simultaneously so that applications can listen for when each stack’s connectivity is available. That’s trivial. It could be done tomorrow. It’s not a technical problem at all.

But second, it requires applications to be smarter about what resources they require and to do smart things when those resources aren’t available. And that apparently happens when solid gold pigs start dropping out of the sky. I hope you have falling-gold-pig insurance for your car. But app authors often don’t make their applications smarter and more network aware because hey, that’s more work for them, and hey, people haven’t requested this yet, and hey, that’s one more D-Bus API I need to depend on and I don’t know what else.

NetworkManager says it’s connected via a global “State” property. That property is a logical OR of both IPv4 and IPv6 connectivity. If one is connected then the State property is NM_STATE_CONNECTED. Great, right? But if NM flips the state to CONNECTED when IPv4 completes but IPv6 is still waiting, then your favorite IRC application will try to connect to your IPv6-enabled IRC server. Except IPv6 isn’t up yet so it fails. And you get mad because shit doesn’t magically work.

And then what happens if IPv6 fails? Do we fail the entire connection? Or do we just keep listening for IPv6 router advertisements and when one comes in configure the interface? Currently there’s a setting called ‘failure fatal’ for both IPv4 and IPv6 that lets you determine that behavior; it defaults to TRUE for IPv4 and FALSE for IPv6 since so many networks don’t yet have IPv6 enabled. But this really is something we shouldn’t have to care much about.

And that brings us back to applications. When NetworkManager adds dual-stack connected state, which is actually pretty trivial to do, the applications have to listen to that and care so that your life is better. If the app has an IPv6 address and NM indicates that IPv6 isn’t yet available, the app needs to wait until NM says it is available. Same for IPv4. The problem is that nobody ever seems to bother with this sort of intelligence at the application level, but that’s where it’s really needed, since the connection manager has no idea what servers you’re connecting to and whether or not they are IPv4 or IPv6.
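On the application side the check itself is small. A sketch assuming a hypothetical per-family connectivity struct (NetworkManager’s D-Bus interface doesn’t expose per-family connected states yet, as noted above): walk the getaddrinfo() results and only try addresses whose stack NM says is up; if none qualify, wait for the next state change instead of failing.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <netdb.h>        /* struct addrinfo */
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Hypothetical per-family state; not an existing NetworkManager API. */
struct nm_connectivity {
    bool ip4_connected;
    bool ip6_connected;
};

/* Is the stack needed for this address family up right now? */
static bool family_ready(int family, const struct nm_connectivity *c)
{
    assert(c != NULL);
    switch (family) {
    case AF_INET:
        return c->ip4_connected;
    case AF_INET6:
        return c->ip6_connected;
    default:
        return false;
    }
}

/* Return the first getaddrinfo() result we can try now, or NULL meaning
 * "don't fail yet, wait for NM to announce the other stack". */
static const struct addrinfo *first_ready(const struct addrinfo *list,
                                          const struct nm_connectivity *c)
{
    for (; list != NULL; list = list->ai_next)
        if (family_ready(list->ai_family, c))
            return list;
    return NULL;
}
```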

As a side rant about application intelligence, apps should also allow you to associate resources (like internal VPN-only mail servers) with NetworkManager VPN connection UUIDs so that they only check the mail on your corporate VPN when NM says your VPN connection is up. You can do that now. It’s been there for years. But nobody bothers to write that sort of useful support into applications either. Where does the application’s responsibility for intelligence begin? Useful insights on where that line gets drawn are most welcome. So are comments about how hot Colin Walters’s mom is.
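A sketch of that app-side logic, assuming the app already knows which VPN connection UUIDs NetworkManager reports as active (fetching them over D-Bus is left out). The struct, the function name, and the UUIDs in the usage are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical app setting: a resource (e.g. an internal mail server)
 * tied to the UUID of a NetworkManager VPN connection. */
struct resource {
    const char *name;
    const char *required_vpn_uuid;   /* NULL = reachable on any network */
};

/* Poll the resource only if it needs no VPN, or its VPN is active. */
static bool should_poll(const struct resource *r,
                        const char *const *active_vpn_uuids, size_t n_active)
{
    size_t i;

    assert(r != NULL);
    if (r->required_vpn_uuid == NULL)
        return true;
    for (i = 0; i < n_active; i++)
        if (strcmp(active_vpn_uuids[i], r->required_vpn_uuid) == 0)
            return true;
    return false;
}
```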

gnote performance

I’ve been using gnote as my daily job status tool for a few years now, and it’s great. I love it. I have 900+ notes. But every day when I create a new note it hangs for 10 seconds, and again after typing the note’s title and hitting return. This machine isn’t slow (Core 2 Duo 1.86GHz) so it’s got to be gnote.

So we fire up sysprof. And for both operations (creating a new note, changing the title) we find the culprit to be the add_keyword() function, called from gnote::TrieController::update(). It appears to be mostly add_match_at_state() checking for equality of something. Full sysprof data available upon request.

I like gnote a lot; this is a minor annoyance but one I hit every day. If anyone optimizes this I will owe you something, and I’m a great person to have owing you something.

Fedora 15 Throws a Party

In case you missed it, Fedora 15 got released today. It’s packed with tons of cutting-edge features but most of all, it includes GNOME 3, KDE 4.6, XFCE 4.8, NetworkManager 0.9, btrfs integration, better power management, LibreOffice, Firefox 4, systemd, and a ton of other stuff you’ll love. Read the release notes, download it, and start living the life you always dreamed of.

NetworkManager 0.9, Pidgin, and tinc

Pidgin

As a reply to Andrew’s comments about NM 0.9 and Pidgin: I wrote patches a while back, one of which got committed and a second of which is pending.

tinc and VPN plugins

Andrew also talked about tinc and how he’d love if it had NetworkManager integration.

NetworkManager expects quite a bit out of VPN services; they cannot simply be dumb services that expect everything to be statically configured for every user on the system. Why? Because NetworkManager allows many different configurations of VPN settings; you might have one VPN for your cover-story workplace and one for your Secret Three Letter Agency that you only use in secure locations. That configuration is stored in NM config files in /etc and includes not just VPN-specific configuration, but also IPv4 and IPv6 configuration, static routes, DNS and search domain information, and a human-readable name and connection UUID. This allows the user to override configuration the VPN might automatically return. In the future we’ll add proxy configuration and firewall rules to that list. Because all these things are highly specific to a single network connection (be that VPN, wifi, wired, 3G, whatever), they need to be kept together, changed together, and applied together. No existing VPN configuration file format supports all this. But NetworkManager does.

This means that we cannot simply use /etc/openvpn.conf or /etc/tinc/tinc.conf because

standard config files often contain only one network: they are essentially “public” configuration files and the concept falls apart if you have ever configured more than one VPN; while some VPN daemons do have formats that allow defining more than one network, many do not.
config files cannot encode related connection information: there is often no facility for expanded network-specific configuration like proxies, firewall rules, additional IP addresses, static routes, DNS search domains, etc. that should be associated with the VPN connection.
secrets should be stored securely: if the user wants secure password storage in the GNOME Keyring or KWallet or whatever, they should be able to do so. The user should be able to keep the password in their session or even provide it on-demand and not require it to be stored in system configuration files.
secrets can change periodically: at Red Hat we use RSA SecurID tokens that generate a new PIN code every 30 seconds which is entered every time we connect. Many VPN daemons will ask for passwords too, but that requires a terminal. Fail. We want to ask for secrets in a generic manner which is appropriate to each desktop environment (or lack thereof), and existing VPN secret request mechanisms (stdin, TCP management socket, static config files, etc) simply do not allow this.

To work around these limitations of configuration files, NetworkManager dynamically generates configuration for each VPN daemon and inserts your password when required, retrieved from secure GNOME Keyring/KWallet storage or from a PIN entry dialog or other mechanism. The VPN daemon is then executed and handed that configuration, either as a path to a private, root-owned, transient configuration file or, even better, written cleanly to its stdin if the VPN daemon supports it.
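For daemons that can only read a file, the “private, transient” part can be sketched with mkstemp(), which creates the file exclusively; the explicit fchmod() just pins it to owner read/write. This is an illustrative sketch, not NetworkManager’s code: the template path and config text are hypothetical, the file is only root-owned if the caller runs as root, and the caller has to unlink it once the daemon has read it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>      /* mkstemp */
#include <string.h>
#include <sys/stat.h>    /* fchmod, S_IRUSR, S_IWUSR */
#include <sys/types.h>
#include <unistd.h>      /* write, close, unlink */

/* Write generated daemon config (secrets included) to a private temp file.
 * path is a mkstemp() template ending in "XXXXXX"; on success it holds the
 * real file name, which the caller hands to the daemon and unlinks as soon
 * as the daemon has read it. Returns 0 on success, -1 on error. */
static int write_transient_config(char *path, const char *config)
{
    size_t len, off = 0;
    int fd;

    assert(path != NULL && config != NULL);
    len = strlen(config);
    fd = mkstemp(path);                          /* created exclusively */
    if (fd < 0)
        return -1;
    if (fchmod(fd, S_IRUSR | S_IWUSR) < 0)       /* owner read/write only */
        goto fail;
    while (off < len) {
        ssize_t w = write(fd, config + off, len - off);
        if (w < 0)
            goto fail;
        off += (size_t) w;
    }
    if (close(fd) < 0) {
        unlink(path);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(path);                                /* never leave secrets behind */
    return -1;
}
```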

Which leads me to tinc. Nothing appears to preclude creation of a NetworkManager VPN plugin for tinc, but there are some complications that it would be great to get fixed upstream:

quite a few configuration files are required for each VPN network, and a plugin would have to create all these files dynamically before executing tincd; it appears that tinc 1.0.14 allows arbitrary config options on the command line, which helps somewhat, but even better would be accepting configuration on stdin as a single unit instead of a bunch of separate files. This way no config files (possibly including secrets) could mistakenly get left lying around due to segfaults or programming errors.
configuration appears to require an explicit device name (like “tun0”) which is a huge no-no; if the program can’t dynamically determine a suitable device name and return that to the caller, it gets a F- grade from me. If the user configures more than one VPN that they might use concurrently, they shouldn’t have to manually plan out interface names. At least it appears that tinc sends the interface name to the “up” script in the INTERFACE environment variable.
like OpenVPN, it appears that many attributes of the VPN connection cannot be auto-detected, which requires the user to know a priori what the VPN configuration will be: stuff like “Cipher”, “Compression”, “Digest”, etc. This never helps users, and apparently everybody writing VPN software thinks the user of their software is already a system administrator. I hope I’m wrong about this. If I’m not, hopefully tinc emits status information indicating that the parameters set in configuration are incompatible with the peers it’s trying to connect to, so that we can notify the user about it.
it’s unclear to me how tinc reports status and progress in a usable manner; it appears that one can send signals to tincd, but they dump information to syslog. Ideally tincd would include an option to dump this information to stdout as well, because screen-scraping syslog is just completely evil.
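For what a plugin could do with tinc today: keep tincd in the foreground under the plugin’s control and hand it settings on the command line rather than writing tinc.conf. A sketch, assuming tincd’s -n NETNAME, -D (don’t detach), and the -o KEY=VALUE command-line option mentioned above for 1.0.14 (check tincd(8) for your version). The helper is hypothetical, the option values are placeholders, and per-host files such as keys would still have to exist on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TINCD_ARGS 32

/* Hypothetical plugin helper: build a tincd argv that keeps tincd in the
 * foreground and passes settings as -o KEY=VALUE instead of writing them
 * to tinc.conf. opts is a NULL-terminated list of "KEY=VALUE" strings.
 * Returns the argument count (argv[count] is NULL), or -1 if max is too small. */
static int build_tincd_argv(const char *netname, const char *const *opts,
                            const char **argv, size_t max)
{
    size_t n = 0;

    assert(netname != NULL && argv != NULL);
    if (max < 5)                      /* 4 fixed args + terminating NULL */
        return -1;
    argv[n++] = "tincd";
    argv[n++] = "-n";                 /* network name */
    argv[n++] = netname;
    argv[n++] = "-D";                 /* don't detach; the plugin supervises it */
    for (; opts != NULL && *opts != NULL; opts++) {
        if (n + 3 > max)              /* "-o", its value, and the NULL */
            return -1;
        argv[n++] = "-o";
        argv[n++] = *opts;
    }
    argv[n] = NULL;
    return (int) n;
}
```

The resulting array would then go to execvp() (with a cast, since execvp() takes char *const[]); note that nothing here names an interface, leaving tinc to pick one and report it via INTERFACE.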

None of these issues are killers, but they do result in a degraded experience for the user of tincd if that user is not a system administrator. At this point vpnc is the best-behaved VPN daemon because it (a) accepts configuration on stdin, (b) can request secrets dynamically via stdin, (c) automatically negotiates most options with the peer, and (d) doesn’t have 50,000 configuration options with complex interdependencies. I hope tinc can get there too.

If anyone wants to write a NetworkManager VPN plugin for tinc, definitely let me know or jump onto the mailing list and we’d be glad to help out with suggestions and advice.

NetworkManager 0.8.4 Knows How To Party

Get on up, get on up and DANCE (CC BY 2.0 via Robert Bejil)

Next in the long line of illustrious NetworkManager releases, out comes 0.8.4. It loves to party, and if you look closely enough it’s busting a move just to the left of the dude with a star on his hat at its own release party. Hell yeah. It’s a great release and packs a lot of fixes and features, including but not limited to fixes for IPv6, DHCP, DNS, no longer touching /etc/hosts, WWAN, and VPN. Hot tarballs are here:

http://ftp.gnome.org/pub/GNOME/sources/NetworkManager/0.8/NetworkManager-0.8.4.0.tar.bz2
http://ftp.gnome.org/pub/GNOME/sources/network-manager-applet/0.8/network-manager-applet-0.8.4.tar.bz2
http://ftp.gnome.org/pub/GNOME/sources/NetworkManager-openconnect/0.8/NetworkManager-openconnect-0.8.4.tar.bz2
http://ftp.gnome.org/pub/GNOME/sources/NetworkManager-openvpn/0.8/NetworkManager-openvpn-0.8.4.tar.bz2
http://ftp.gnome.org/pub/GNOME/sources/NetworkManager-pptp/0.8/NetworkManager-pptp-0.8.4.tar.bz2
http://ftp.gnome.org/pub/GNOME/sources/NetworkManager-vpnc/0.8/NetworkManager-vpnc-0.8.4.tar.bz2

A huge thanks to everyone who helped out with this release: Jiří Klimeš, Michael Biebl, Mathieu Trudel-Lapierre, Ozan Çağlayan, Robby Workman, Mu Qiao, Pierre Ossman, Mikhail Efremov, Andrey Borzenkov, Karsten Hopp, Ionut Biru, Robert Piasek, Torsten Spindler, Richard Hughes, Radek Vykydal, Canek Peláez Valdés, Wulf C. Krueger, and anyone I’ve forgotten.

Now back to the 0.9 train. Shake it hard and keep it rockin’ people.

Read This: RAE Vodafone Lecture Series

If you’re at all interested in wireless technology, you should read this. The first lecture (from 2005) starts with a very approachable history of [W]CDMA and cellular technology, including some of the process that went into the technology. About the only way you can get a record of the decisions, trade-offs, and design choices that went into technology these days is by being part of the working group or reading comments in IEEE discussions. That’s incredibly boring. The first lecture, on the other hand, is straight from the source.

Choice quote from Dr. Jacobs, and remember that at the time (2006) George W. Bush was president:

I was with the President of China about three years ago, we sat next to each other to do the informal chatting before the formal part started, and his first question was how many more generations did I think that Moore’s Law had to run. I don’t think the President of the US would have asked that!

– Dr. Irwin Jacobs (co-founder of Qualcomm)

(Would Sarah Palin ask that if she were President? Hell no. She doesn’t know what actual laws are, let alone Moore’s Law…) But forget about Palin for a moment; this quote reminds me of a Forbes article I saw a while ago:

In China, eight of the top nine political posts are held by engineers. In the U.S., almost no engineers or scientists are engaged in high-level politics, and there is a virtual absence of engineers in our public policy debates.

Yay for America. Our politicians and our aspiring politicians suck. But this post isn’t about China or the US; it’s about a Royal Academy of Engineering lecture. Go read that thing.