dhclient infinite lease follow-up

Further analysis clarifies some of the actual bugs in the dhclient/dhcp code.  The ‘infinite lease’ problem only happens on 64-bit hosts, due to the size of time_t.  And the problem isn’t just an ‘infinite’ lease issue; it’s an ‘infinite lease minus seconds since the epoch’ problem, due to how the dhclient code internally handles timeout calculation.  In other words, any lease longer than UINT_MAX minus the current epoch time runs into the same check.  Remember, an ‘infinite’ lease is 0xFFFFFFFF, or UINT_MAX.

On IA-32, where time_t is 32-bit:

  • after the lease is ACK-ed, some code adds the lease seconds to gettimeofday() seconds to determine the lease expiry time, which of course is now wrapped
  • this same code then checks if the lease expiry is less than gettimeofday() seconds (which of course it is, since it just wrapped around to gettimeofday() - 1) and helpfully resets the lease expiry to INT_MAX
  • checks in isc_time_nowplusinterval() which guard against 32-bit overflow (by checking expiry seconds plus gettimeofday() against UINT_MAX) succeed and everything works

On x64, where time_t is 64-bit:

  • adding gettimeofday() seconds to the lease seconds does not wrap, thus the lease expiry is larger than UINT_MAX
  • since the value did not wrap, the checks for negative lease expiry do not trigger, and the expiry is still UINT_MAX + gettimeofday()
  • the overflow checks in isc_time_nowplusinterval() trigger because our values are already larger than UINT_MAX, and the lease fails
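
To make the two cases concrete, here's a minimal sketch that models the expiry arithmetic with fixed-width integers.  The function names are made up for illustration and aren't from the dhclient source; the INT_MAX reset just mirrors the behaviour described above.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define INFINITE_LEASE 0xFFFFFFFFu  /* lease length handed out by the server */

/* Hypothetical model of the IA-32 case (32-bit signed time_t).
 * Assumes 'now' is a positive epoch time. */
int64_t expiry_time32(int32_t now)
{
    /* Add in unsigned arithmetic to model the wrap without relying on
     * signed overflow; for positive 'now' the result is now - 1. */
    int32_t expiry = (int32_t)((uint32_t)now + INFINITE_LEASE);

    /* The wrapped expiry is in the past, so it gets reset. */
    if (expiry < now)
        expiry = INT32_MAX;
    return expiry;
}

/* Hypothetical model of the x64 case (64-bit time_t): nothing wraps,
 * so an "expiry < now" check never fires and the sum is kept as-is. */
int64_t expiry_time64(int64_t now)
{
    return now + (int64_t)INFINITE_LEASE;
}
```

With a current time of, say, 1300000000, the 32-bit model ends up at INT_MAX, while the 64-bit model ends up at 1300000000 + UINT_MAX, which is already past what isc_time_nowplusinterval() will accept.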

The core problem is that most of the dhcp/dhclient code simply doesn’t deal with large time values that may occur on 64-bit platforms.  A few suggestions for dhclient:

  1. Change the TIME define to ‘u_int64_t’ instead of ‘time_t’; that makes the relevant client_lease structure fields large enough to survive a wrap
  2. Change the fields of struct isc_time_t to ‘u_int64_t’ instead of ‘unsigned int’; maybe even change ‘struct isc_interval_t’ to use 64-bit values just to ensure wrapping doesn’t happen there too
  3. Stop using ‘struct timeval’; instead use the fixed ‘struct isc_time_t’
  4. Wrap gettimeofday() in a utility function that returns a ‘struct isc_time_t’ instead of ‘struct timeval’
  5. Then you don’t need the 32-bit checks in isc_time_nowplusinterval() or at least you could change them to be 64-bit safe instead.
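
As a rough sketch of what suggestions 2, 4 and 5 might look like together: the struct and function names below (time64, time64_now, time64_add) are hypothetical and not part of the ISC library or dhclient; only gettimeofday() is real.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical stand-in for an isc_time_t with 64-bit fields. */
struct time64 {
    uint64_t seconds;
    uint64_t nanoseconds;
};

/* Wrap gettimeofday() so callers never handle struct timeval directly.
 * Returns false if gettimeofday() fails. */
bool time64_now(struct time64 *out)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) == -1)
        return false;
    out->seconds = (uint64_t)tv.tv_sec;
    out->nanoseconds = (uint64_t)tv.tv_usec * 1000u;
    return true;
}

/* 64-bit-safe "time plus interval": fails only when the sum would
 * overflow uint64_t, not when it exceeds UINT_MAX. */
bool time64_add(const struct time64 *t, uint64_t interval_sec,
                struct time64 *out)
{
    if (interval_sec > UINT64_MAX - t->seconds)
        return false;
    out->seconds = t->seconds + interval_sec;
    out->nanoseconds = t->nanoseconds;
    return true;
}
```

With something like this, adding an ‘infinite’ 0xFFFFFFFF-second lease to the current time succeeds, because the range check is against the 64-bit limit instead of UINT_MAX.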

For Fedora, fixed test packages are available here.  Not sure about other distros.

GUADEC PSA: WiFi fails due to dhclient lease miscalculation

The DHCP server at GUADEC apparently gives out infinite leases, which have a value of 0xFFFFFFFF (in seconds).  Leaving aside the question of whether infinite leases are actually a good idea at a conference where many people come and go (hint: they usually aren’t),  why won’t it just @#%@#%!&&* connect?

By default NetworkManager uses dhclient for DHCP, and that usually works fairly well.  It’s a well-understood program developed by the ISC that’s used by millions every day.  But dhclient apparently fails with infinite leases.  Here’s the dhclient code:

    #define DHCP_SEC_MAX  0xFFFFFFFF

    /*
     * The ISC timer library doesn't seem to like negative values
     * and can't accept any values above 4G-1 seconds so we limit
     * the values to 0 <= value < 4G-1.  We do it before
     * checking the trace option so that both the trace code and
     * the working code use the same values.
     */

    sec  = when->tv_sec - cur_tv.tv_sec;
    usec = when->tv_usec - cur_tv.tv_usec;

    <…>
    } else if (sec > DHCP_SEC_MAX) {
        log_error("Timeout requested too large "
                  "reducing to 2^^32-1");
        sec = DHCP_SEC_MAX;
    <…>
    }

    isc_interval_set(&interval, sec & DHCP_SEC_MAX, usec * 1000);
    status = isc_time_nowplusinterval(&expires, &interval);
    if (status != ISC_R_SUCCESS) {
        log_fatal("Unable to set up timer: %s",
                  isc_result_totext(status));
    }

The code attempts to add a timeout that triggers when the lease expires; “when” is the lease expiry time, computed from the lease interval coming from the DHCP server (which is 0xFFFFFFFF, remember).  “cur_tv” is just gettimeofday().  Let’s enumerate the fail purely for pedagogic purposes:

  • Despite the comment and logged error, the code makes no attempt to limit the value to UINT_MAX - 1.  Even if it did, this wouldn’t help; see below.  So now we’re passing UINT_MAX into isc_time_nowplusinterval() as interval->seconds.  Condensed code for that function:

    isc_result_t isc_time_nowplusinterval(isc_time_t *t, const isc_interval_t *i) {
        struct timeval tv;

        if (gettimeofday(&tv, NULL) == -1)
            return (ISC_R_UNEXPECTED);

        /* Ensure the resulting seconds value fits in the size of an
         * unsigned int.  (It is written this way as a slight optimization;
         * note that even if both values == INT_MAX, then when added
         * and getting another 1 added below the result is UINT_MAX.)
         */
        if ((tv.tv_sec > INT_MAX || i->seconds > INT_MAX) &&
            ((long long)tv.tv_sec + i->seconds > UINT_MAX))
            return (ISC_R_RANGE);

        <…>

  • tv.tv_sec could be greater than INT_MAX since time_t is often 8 bytes wide.  So this code is much more likely to fail in early 2038.
  • Oops!  i->seconds is UINT_MAX, and clearly that’s larger than INT_MAX.  So onward to the next bit.
  • Oops I did it again!  I played with your timer!  The second condition is true as well (unless you have a time machine and you like 1970) because tv.tv_sec is the current time (clearly a large value) and i->seconds is already UINT_MAX, so the sum is bigger than UINT_MAX.
  • An error is returned, and your lease fails.
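
If you want to poke at the numbers yourself, here’s a small sketch that lifts just the range check out of the condensed function above.  The helper name range_check_rejects() is made up for this example; the condition is copied from the quoted code, with the current time passed in as a parameter instead of read from gettimeofday().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when the quoted range check would
 * make isc_time_nowplusinterval() return ISC_R_RANGE.
 * now_sec stands in for tv.tv_sec, interval_sec for i->seconds. */
bool range_check_rejects(int64_t now_sec, unsigned int interval_sec)
{
    return (now_sec > INT_MAX || interval_sec > INT_MAX) &&
           ((long long)now_sec + interval_sec > UINT_MAX);
}
```

Calling it with a current time around 1300000000 and an interval of UINT_MAX returns true (the lease fails); a one-day interval of 86400 returns false; and with a current time of 0, even UINT_MAX gets through, which is the time machine case.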

Looks like this code just doesn’t expect to deal with large values of Unix time; hopefully that’ll get corrected soon.  The Red Hat bug report for this is bug 662254.

What can I do?

NetworkManager supports two different DHCP clients, selectable at runtime.  Install dhcpcd, set dhcp=dhcpcd in the [main] section of /etc/NetworkManager/NetworkManager.conf, restart NM, and voila, DHCP.