burning cpu and battery on the gnome desktop

last night seb128 asked me for advice on how he could tell if gnome-vfs was using inotify or gamin as its backend for file monitoring. i didn’t really know, so i suggested stracing gnomevfs-monitor.

this is when i found out that the inotify code in dapper’s gnome-vfs wakes up 10 times a second. i understand this to be because it does some polling for non-existant files and also has an internal timer to match move_from and move_to events.

this is a problem. 9 applications on my system right now are using file monitoring and suffering this problem. list is as follows:

  • panel
  • nautilus
  • settings-daemon
  • ephy
  • update-notify
  • vfs-daemon
  • seahorse-agent
  • muine
  • deskbar

this alone causes my system to wake up 90 times per second.

at guadec i also talked to mclasen about another bug. gtk in dapper wakes up whenever a mouse button is clicked or a modifier key is pressed. this means that every single application on your desktop wakes up every time you press or release shift. ow. mclasen fixed this bug within a very impressive amount of time and the fix will be in gtk 2.10. dapper users are left burning cpu for the time being.

i’ve been going around stracing random pid’s of desktop processes. i’ve found some other offenders.

  • gnome-power-manager wakes up twice per second to do something
  • battery applet wakes up once per second to do something
  • clock-applet wakes up once per second to update the time even when seconds aren’t shown
  • gajim wakes up 10+ times per second for some unknown reason
  • at-spi-registryd wakes up more like 20+ times a second (?!?)
  • gss seems to talk to x11 once per second (presumably to ask if anything has happened). i don’t understand why it has to do this so often.

so all in all gnome is causing my computer to wake up about 200 times a second when totally idle. since this is happening in a lot of different processes, that’s one heck of a lot of context switches.

i did a check. in a loop (with sleep 10s) i catted the contents of the acpi battery state in /proc. i did this with gnome running and with a failsafe x session with xterm. gnome causes my laptop to use a couple of watts more power.

i also was talking to mjg59 about this earlier today. according to him, circa 2.6.19 we’ll see a new kernel feature by which the ‘hz’ timer that runs at a constant rate will be disabled. the idea is that we can use the new hightech timers we have to schedule timer interrupts only when needed. this means that except when in userspace (for preemption purposes) or waiting for some specific event the timer will be disabled and the cpu will be totally idle until the next irq comes in.

if we can reduce the number of wakes-up the desktop does (to say, zero times per second) then we can reduce the number of times the kernel has to wake the cpu up at all. even without the new timer tech there’s a noticable (albiet small) change in battery consumption. with the new timers, not waking up will become increasingly important.

this post is a plea. please fix your apps to never wake up unless the user does something to them. my laptop’s battery will thank you.

ps: the best way to find out how often your app is running is to use strace. you have two options here. you can either run your application directly under strace as follows:

strace yourapp

or you can attach strace to a running application by its pid:

strace -p `pidof yourapp`

if you see poll( or select( and a long wait then your app is sleeping (good!). if you see lots of activity when you’re not otherwise interacting with your application then you’ve got trouble. fix it!

23 thoughts on “burning cpu and battery on the gnome desktop”

  1. Ryan, one of my best discoveries of this GUADEC!

    I’m no performance expert at all, just another user suggering the effects of non-optimized performance in my laptop. I like to read posts like this one of yours, though. Like I enjoy reading Ben Maurer’s or Federico’s.

    Compared to releasing cool new features, optimization could be seen as a pretty boring and unnoticed effort. But you’re my heroes! Keep the good work going.

    A happier user.

  2. this work is appreciated. performance is the thing were GNOME needs the most love right now… it’s great that GNOME came that far!

  3. I haven’t looked at the code but clock-applet is probably using the same ~1s timer to reduce delay in updating the minutes. Perhaps it could sync up to the next minute update then switch to a 60s sleep?

  4. Thanks for this post. I shall keep this list and look out for the patches :)

    I tried stracing a very trivial GLADE-powered application I wrote (in ruby). It is going through an ioctl/gettimeofday/select loop when idle. The code I have written is so trivial, I’m thinking that there must be some bad stuff in either ruby or glade.

  5. Excellent work! This kind of info and optimization is also probably going to be vital to the olpc effort.

  6. For debugging these things, you should aybe give dtrace on opensolaris a try.

    Nice work!

  7. Great analysis!, I hope this post becomes a future GnomeGoal or SoC project, because these small(in appearance) details do really make a difference in the long term and wrt other desktops.

  8. I’m amazed to see such good work going into optimization for GNOME, keep it up! :)

    Out of curiosity, I tried stracing specto, and I don’t know how I should interpret the result. This application uses gobject timers to determine when to check for updates. I have thousands of lines like this scrolling up, because it seems to calculate the unix time all the time:

    ioctl(15, FIONREAD, [0]) = 0
    gettimeofday({1154014114, 834950}, NULL) = 0

    Should I be worried, or this is just plain normal? I noticed gaim has those, but MUCH less frequently.

  9. kiddo; if you see lines after lines of those two things (without a poll, sleep, select, etc) in between them then you have trouble indeed. this means that your application is running in an infinite loop and never sleeping at all (ie: is pegging the cpu at 100%).

    unless your app is actively doing work it shouldn’t be doing this.

  10. Since specto is a python app, it might not behave the same way using strace as other applications might. I tried running the python shell through strace and it behaved in a similar manner.

  11. major, major +1. fwiw, there’s another easy way to spot this — check the CPU time each process has accumulated in “top”. If you see massive CPU time counts for long-running desktop processes that should not be CPU-intensive, there’s probably a problem and an strace is called for.

    in KDE, “kicker” looks like it has a similar problem — on my laptop, kicker has nearly as many CPU seconds racked up as Xorg!

  12. Since so many of our useful apps are written in Python (and Mono), we should probably get those to stop waking up all the time, too.

    Does a Python shell (or other program) that’s not doing anything need to keep calling these? And more importantly, if it needs to be fixed in Python itself, and we submit a patch, how much will they love us? :-)

  13. Feel free to try Totem out with that. I’m pretty sure there are some low-hanging fruits in terms of optimisations in the front-end code.

  14. Hey Ryan,

    g-p-m has to poll the DPMS extension to detect changes. Yeah, it sucks but there doesn’t seem to be another way outside of hacking Xorg and possibly the DPMS ext. spec. Whether we need to poll twice per second or not is debatable I suppose.

    g-s-s has to poll the X pointer position to see if it changes. The timeout used to be higher but we reduced it so that state changes are executed closer to the action that triggers them. On the todo list is making the timeout variable – making is shorter only when higher resolution is required. This will be done before 2.16.

  15. I agreee that this stuff is very important and it’s also an area where quite a bit of expertise is required.

    Would there be any chance of the GNOME Foundation sponsoring a few plane tickets for Ryan, Federico and Ben to get together and hack for a week?

  16. Some further analysis of my todo program, and some thinking, are leading me to believe that it’s ruby in my case, not glade. I tried stracing irb (ruby’s interactive shell) and it repeats ‘select(1, [0], NULL, [0], {0, 100000}) = 0 (Timeout)’ on sub-second intervals.

  17. I was taking a look at Gnome’s trash-applet in Ubuntu Dapper. (Gnome 2.14.2). It’s doing similar to what kiddo saw, non-stop scrolling of:

    poll([{fd=17, events=POLLIN}, {fd=3, events=POLLIN}, {fd=26, events=POLLIN POLLPRI}, {fd=40, events=POLLIN}], 4, 121) = 0
    gettimeofday({1154027422, 791699}, NULL) = 0
    ioctl(3, FIONREAD, [0]) = 0

    I went into the code and wrapped all the function’s in printf’s, it doesn’t look like it’s getting stuck in any of trash-applet’s functions. Am I seeing something from GTK or maybe inotify polling?

  18. Please keep on with this performance work for Gnome – it’s vital for the people who can’t afford to buy the latest CPU and bigger RAM.

  19. #!/usr/bin/env python
    #small demo prog to demonstrate pygtk’s
    #brokeness in this regard.

    import os,gtk
    print “run this in another terminal: strace -p”, os.getpid()

  20. Pádraig: I tried your sample app. The syscall loop looks like this:

    1154237742.279312 gettimeofday({1154237742, 279631}, NULL) = 0
    1154237742.280230 poll([{fd=5, events=POLLIN}], 1, 94) = 0
    1154237742.374099 gettimeofday({1154237742, 374572}, NULL) = 0
    1154237742.377848 ioctl(5, FIONREAD, [0]) = 0
    1154237742.378800 gettimeofday({1154237742, 378867}, NULL) = 0
    1154237742.378949 poll([{fd=5, events=POLLIN}], 1, 95) = 0

    ad nauseam.

    I checked file descriptor 5 on /proc:
    $ ls -l /proc/29329/fd/5
    lrwx—— 1 alvherre alvherre 64 2006-07-30 01:36 /proc/29329/fd/5 -> socket:[438302]

    So the problem is being caused by repeated polls to socket 438302. What’s that anyway? Netstat tells me

    unix 3 [ ] STREAM CONNECTED 438303 /tmp/.X11-unix/X0
    unix 3 [ ] STREAM CONNECTED 438302

    While I don’t really know how to correlate the two in a scientific manner, I very strongly suspect that socket 438303 is the remote end of the socket.

    So the problem here seems to be that X is setting that socket to “readable” or whatever, many times per second. Is this an X bug? I don’t know.

    I checked something else. I ran the strace with the -T switch. This allowed me to notice that the gettimeofday() call sometimes is incredibly slow. I knew this already due the PostgreSQL-related experiments, so it didn’t surprise me. But it may be useful information for you:


  21. alvaro: this isn’t socket 5’s fault. it is normal for a process to be polling on a socket like this. notice:

    poll([{fd=5, events=POLLIN}], 1, 95) = 0

    the ’95’ is the problem. it says that poll should wait for a maximum of 95 milliseconds before waking up again. therefore the program is waking up quite often.

Comments are closed.