NetworkManager and WiFi Scans

Recently Dave Täht wrote a blog post investigating latency and WiFi scanning and came across NetworkManager’s periodic scan behavior.  When a WiFi device scans, it must leave its current radio channel, hop through the other channels, and listen briefly on each one for beacons from access points.  While it is away from the associated channel, it’s not passing your traffic.

With a bad driver, a scan can sometimes take 20+ seconds, and all your traffic during that time gets dropped on the floor.

With a good driver, scanning takes only a few seconds and the driver breaks the scan into chunks, returning to the associated access point’s channel periodically to handle pending traffic.  Even then, latency-critical applications like VoIP or gaming will suffer while the WiFi device is listening on another channel.

So why does NetworkManager periodically scan for WiFi access points?

Roaming

Whenever your WiFi network has multiple access points with the same SSID (or a dual-band AP with a single SSID) you need roaming to maintain optimal connectivity and speed.  Jumping to a better AP requires that the device know what access points are available, which means doing a periodic scan like NetworkManager does every 2 minutes.  Without periodic scans, the driver must scan at precisely the worst moment: when the signal quality is bad, and data rates are low, and the risk of disconnecting is higher.

Enterprise WiFi setups make the roaming problem much worse because they often have tens or hundreds of access points in the network and because they typically use high-security 802.1X authentication with EAP.  Roaming with 802.1X adds many more steps to the roaming process, and a failure in any one of them fails the whole roaming attempt.  Strategies like pre-authentication and periodic scanning greatly reduce roaming errors and latency.

User Responsiveness and Location Awareness

The second reason for periodic scanning is to maintain a list of access points around you for presentation in user interfaces and for geolocation in browsers that support it.  Up until a couple of years ago, most Linux WiFi applets displayed a drop-down list of access points that you could click on at any time.  Waiting 5 to 15 seconds for a menu to populate or for ‘nmcli dev wifi list’ to return would be annoying.

But with the proliferation of WiFi access points (often more than 30 or 40 visible at once if you live in a flat) those lists became less and less useful, so UIs like GNOME Shell moved to a separate window for WiFi lists.  This reduces the need for a constantly up-to-date WiFi list and thus for periodic scanning.

To help support these interaction models and click-to-scan behaviors like those of Mac OS X or Maemo, NetworkManager long ago added a D-Bus API method to request an out-of-band WiFi scan.  While it’s pretty trivial to use this API to initiate geolocation or to refresh the WiFi list based on specific user actions, I’m not aware of any clients using it well.  GNOME Shell only requests scans when the network list is empty, and plasma-nm only does so when the user clicks a button.  Instead, UIs should simply request scans periodically while the WiFi list is shown, removing the need for yet another click.
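As a rough sketch of that last suggestion, here is how a UI might request scans only while its WiFi list is visible.  The D-Bus service, interface and method names are NetworkManager’s real ones, but ListScanPolicy, its 15-second interval, and the idea of shelling out to busctl are assumptions made for illustration, not how any existing client works.

```python
import time
from typing import Callable, List, Optional

# Real NetworkManager D-Bus names.
NM_BUS = "org.freedesktop.NetworkManager"
NM_WIFI_IFACE = "org.freedesktop.NetworkManager.Device.Wireless"


def busctl_request_scan_argv(device_path: str) -> List[str]:
    """Build a busctl command that calls RequestScan with an empty a{sv} options dict.

    device_path is system-specific, for example
    /org/freedesktop/NetworkManager/Devices/3 (an assumed value here).
    """
    return ["busctl", "--system", "call", NM_BUS, device_path,
            NM_WIFI_IFACE, "RequestScan", "a{sv}", "0"]


class ListScanPolicy:
    """Hypothetical helper: scan at most once per `interval` seconds while the list is shown."""

    def __init__(self, request_scan: Callable[[], None], interval: float = 15.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.request_scan = request_scan
        self.interval = interval
        self.clock = clock
        self.visible = False
        self._last: Optional[float] = None

    def show(self) -> None:
        """Call when the WiFi list becomes visible; may trigger a scan right away."""
        self.visible = True
        self.tick()

    def hide(self) -> None:
        """Call when the WiFi list is closed; no scans are requested after this."""
        self.visible = False

    def tick(self) -> bool:
        """Call from a UI timer. Returns True if a scan was requested."""
        if not self.visible:
            return False
        now = self.clock()
        if self._last is not None and now - self._last < self.interval:
            return False
        self._last = now
        self.request_scan()
        return True
```

A client would pass something like `lambda: subprocess.run(busctl_request_scan_argv(path), check=True)` as request_scan, or better, make the call through its own D-Bus bindings.  NetworkManager may reject the request, for instance when a scan is already running or the caller is not authorized, so a real client should handle errors.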

What To Do

If you don’t care about roaming, and I’m assuming Dave doesn’t, then NetworkManager offers a simple solution: lock your WiFi connection profile to the BSSID of your access point.  When you do this, NetworkManager understands that you do not want to roam and will disable the periodic scanning behavior.  Explicitly requested scans are still allowed.
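One way to set the BSSID lock is through nmcli’s 802-11-wireless.bssid property.  The sketch below only builds and validates the command; the profile name and MAC address in the usage note are placeholders, and lock_profile_argv is a hypothetical helper, not part of NetworkManager.

```python
import re
from typing import List

# Six colon-separated hex octets, e.g. 00:11:22:AA:BB:CC.
_BSSID_RE = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")


def lock_profile_argv(profile: str, bssid: str) -> List[str]:
    """Build the nmcli command that pins a WiFi profile to one access point.

    Raises ValueError if bssid is not a well-formed MAC address.
    """
    if not _BSSID_RE.fullmatch(bssid):
        raise ValueError(f"not a valid BSSID: {bssid!r}")
    return ["nmcli", "connection", "modify", profile,
            "802-11-wireless.bssid", bssid.upper()]
```

Passing the result of `lock_profile_argv("Home WiFi", "00:11:22:aa:bb:cc")` to `subprocess.run(..., check=True)` would apply it.  The change takes effect the next time the profile is activated.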

You can also advocate that your favorite WiFi interface add support for NetworkManager’s RequestScan() API method and begin requesting periodic scans when WiFi lists are shown or when your browser uses geolocation.  Once most interfaces do this, perhaps NetworkManager could be less aggressive with its own periodic scans, or perhaps remove them altogether in favor of a more general solution.

That general solution might involve disabling periodic scanning when the signal strength is extremely good and scanning more aggressively when signal strength drops below a threshold.  But signal strength drops for many reasons like turning on a microwave, closing doors, turning on Bluetooth, or even walking to the next room, and triggering a scan then still interrupts your VoIP call or low-ping headshot.  This also doesn’t help people who aren’t close to their access point, leading to the same scanning problem Dave talks about if you’re in the basement but not if you’re in the bedroom.
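To make the threshold idea concrete, here is a minimal sketch of such a policy.  The 120-second normal interval matches the 2-minute periodic scan mentioned earlier; the 80% and 40% thresholds, the 30-second fast interval, and the function itself are assumptions for illustration and not anything NetworkManager implements.

```python
from typing import Optional


def scan_interval(signal_pct: int, good: int = 80, poor: int = 40,
                  normal_interval: float = 120.0,
                  fast_interval: float = 30.0) -> Optional[float]:
    """Return seconds between periodic scans, or None to skip them entirely.

    signal_pct is the current signal strength as a percentage (0-100).
    """
    if signal_pct >= good:
        return None             # very strong signal: no periodic scanning
    if signal_pct < poor:
        return fast_interval    # weak signal: look for a better AP more often
    return normal_interval      # in between: keep the usual cadence
```

The weakness described above shows up immediately: a microwave that briefly pushes the signal from 85% to 35% flips the result from None to 30.0, so the scan lands in the middle of the call anyway.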

Another idea would be to disable periodic scanning when latency-critical applications are active, but this requires that these applications consistently set the IPv4 TOS field or use the SO_PRIORITY socket option.  Few do so.  This also requires visibility into kernel mac80211 queue depths and would not work for proprietary or non-mac80211-based drivers.  But if all the pieces fell into place on the kernel side, NetworkManager could definitely do this while waiting for applications and drivers to catch up.
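For reference, marking a socket this way takes only a couple of calls on the application side.  The sketch below assumes a Linux host; choosing DSCP Expedited Forwarding (46) and priority 6 is an illustrative assumption, and mark_latency_critical is a hypothetical helper name.

```python
import socket

DSCP_EF = 46            # Expedited Forwarding code point
TOS_EF = DSCP_EF << 2   # DSCP occupies the upper six bits of the TOS byte: 0xB8


def mark_latency_critical(sock: socket.socket, tos: int = TOS_EF,
                          priority: int = 6) -> None:
    """Tag an IPv4 socket's traffic as latency-critical.

    Sets the IPv4 TOS byte, then SO_PRIORITY where the platform has it.
    On Linux, priorities above 6 need CAP_NET_ADMIN, hence the default of 6.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    so_priority = getattr(socket, "SO_PRIORITY", None)  # Linux-specific option
    if so_priority is not None:
        sock.setsockopt(socket.SOL_SOCKET, so_priority, priority)
```

A VoIP client would call this once on its RTP socket right after creating it.  Whether anything between the application and the access point honors the marking is a separate question.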

If you’ve got other ideas, feel free to propose them.