Yesterday, I was playing with software that eats memory. Lots of memory. It’s the 3D analysis software Volocity. I loaded a pretty big image (basically several MB per Z slice, and several hundred Z slices), and made the mistake of rotating it along multiple axes and then letting the software have a go at the result.
In the old days, when Linux was a shiny, new and hip OS, the OOM killer would come in and kill your app – if you were lucky. It might first kill OO.o with that very important spreadsheet that you were working on. Otherwise decent, in a way. Nowadays, your system usually thrashes beyond any reasonable repair and a reboot is the only option. If you’re patient enough, there’s a ~30% chance that the app actually gets killed within 5 minutes (based on the ~10 times that it happened to me over the past ~2 years; since then I’ve given up and just reboot, as a reboot takes less than 5 minutes anyway).
<Advertisement>Meet the Mac</Advertisement>. It popped up a warning saying that my system was low on memory (and later on it complained about disk space as well). The operation in the software, which had been getting kind of sluggish up to this point (swapping?), eventually aborted with a nice error dialog. It actually told me that I was out of memory. In addition, the OS gave me suggestions on applications to close so I could retry the operation. No data was lost at any time during the +/- 10 times that I retried this. For a geek, there is no way to describe the feeling when you see this. In short: when will GNOME have this? [*]
In the end, I had to quit Photoshop and free up 5 GB of hard disk space so it could complete the operation. Of course, by that time I had moved to our graphics workstation, since it would do that in a few seconds.
[*] glib actually has provisions for this, such as g_try_malloc() instead of g_malloc(), but I doubt that any OS-to-desktop interaction (through HAL?) exists to tell me that I’m OOM and suggest apps to close when it happens. If it does exist, they’ve done a good job hiding it, because I’ve never seen it.
I don’t think try_malloc is really the best answer… the program could fail on try_malloc… but then how would it let the user know? It has no memory left to open a dialog.
A better option would be for gnome-session, or some similar program, to keep a global watch on memory and swap. When an application gets to the point where it is doing nothing but swapping, gnome-session could notice and either nice it to the maximum possible value or send it SIGSTOP. That would freeze the program until gnome-session’s UI could handle it. Yes, it would swap, but there would still be swap available for gnome-session’s UI to work. That UI could identify the running application and ask you what you want to do.
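The freeze-and-ask idea above can be sketched with plain POSIX signals. This is only an illustration, not anything gnome-session actually does: the helper names `pause`, `resume` and `renice_to_lowest` are made up for this example, and deciding which PID to pick is left out entirely.

```python
import os
import signal


def renice_to_lowest(pid: int) -> None:
    """Give the process the lowest scheduling priority (nice value 19)."""
    os.setpriority(os.PRIO_PROCESS, pid, 19)


def pause(pid: int) -> None:
    """Freeze the process. SIGSTOP cannot be caught or ignored."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a previously stopped process continue running."""
    os.kill(pid, signal.SIGCONT)


# Hypothetical usage, with `hog_pid` being the PID of the memory hog:
#   pause(hog_pid)    # stop it while the dialog is shown
#   ...ask the user what to do...
#   resume(hog_pid)   # or os.kill(hog_pid, signal.SIGTERM) to end it
```

A stopped process keeps all the memory it already holds, so this only halts further growth; it buys time for the UI rather than freeing anything.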
The problem isn’t the use (or lack thereof) of g_try_malloc(). The problem is Linux, which won’t return an error from sbrk(2) and related APIs until all of physical RAM + swap has been completely exhausted (by default).
So even if you used g_try_malloc(), you’d see the identical behavior as your entire system slid into swap and became nigh unusable.
In theory, setting /proc/sys/vm/swappiness should control Linux’s ability to use swap (i.e. you could force everything to be in RAM), in which case g_try_malloc() should return NULL, but I haven’t heard of anyone actually trying this.
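For anyone who wants to check their own settings, here is a small sketch that reads the relevant knobs. One caveat on the above: as far as I know, swappiness only tunes how eagerly the kernel swaps, while the setting that decides whether allocations can fail up front is /proc/sys/vm/overcommit_memory (0 = heuristic, 1 = always overcommit, 2 = strict accounting). The function name `read_vm_setting` and the `root` parameter are made up for this example.

```python
import os

# Meaning of the three vm.overcommit_memory modes.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting, allocations can fail",
}


def read_vm_setting(name: str, root: str = "/proc/sys/vm") -> int:
    """Return the integer value of one VM sysctl, e.g. 'swappiness'."""
    with open(os.path.join(root, name)) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    mode = read_vm_setting("overcommit_memory")
    print("overcommit_memory:", mode, "-", OVERCOMMIT_MODES.get(mode, "unknown"))
    print("swappiness:", read_vm_setting("swappiness"))
```

With mode 2, a g_try_malloc() call has a realistic chance of returning NULL before the machine starts swapping itself to death, but I have not measured how well a desktop behaves under that setting.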
When OS X notices that an application has crashed three times within a short period, it asks the user whether it should remove the user preferences for that application and start with the default settings.
It can be very useful when a shiny new version of an application does not like the old settings.
Although it would be nice to have a GUI for this, note that you can manually trigger the OOM killer in Linux (Alt + SysRq + F).
I agree completely with this post. The behaviour in OOM situations is what annoys me the most about Linux. I cannot even remember seeing Windows become so unstable that you have to do a hard reset because a (buggy) application started eating up all your memory. Yet in Linux this happens to me now and then, and it really pisses me off. Kernel hackers, are you reading this? Please fix this stupid behaviour. Actually, I would even prefer applications to be killed instead of my system thrashing for half an hour, after which I still have to reboot because nothing responds anymore 🙁
For IceWM on embedded devices I once created a kernel patch that asks IceWM which application to kill in an OOM situation. So that hook is possible. Unfortunately, the kernel hackers did not pick it up.
I imagine we’re seeing two different things at work here:
1) The application requiring a large amount of memory and doing something similar to g_try_malloc and handling this failing gracefully. This is up to the application authors, but I bet they only check it when they’re allocating large amounts of memory like you required for your operation, and it wouldn’t have been so graceful if it was a few K of memory that it couldn’t obtain. (*)
2) The OS having Low memory handling as well as out of memory handling. Low memory situations would be triggered before being completely out of memory and an application starting to deal with it. I imagine if you’d continued you’d have hit the OOM handler and things would have been killed.
FWIW the N770/800 handles low memory, and software can listen for the D-Bus signal and act accordingly by disabling features that would be memory intensive. So it is possible on Linux; I guess we just need some sort of desktop integration.
(*) Actually, I’ve just put it in my Marlin TODO to use g_try_malloc to allocate the temporary buffers that it uses, since these can be up to 2 MB each.
Would HAL even be necessary? I think libgtop should be able to do everything you want (as gnome-system-monitor has everything which would be needed).
All it would take is a program (with a dialog already loaded, so that it doesn’t get squeezed out of memory and freeze when memory is low) which monitors memory use and its rate of change, and estimates the time until total consumption. When only a certain small amount of memory is left, the program pauses processes that have been consuming memory or that have very large memory consumption, using a blacklist or whitelist to avoid locking up Xorg or something. (In gsm this is “Stop process”.) The dialog pops up telling you that you’re low on memory, that these programs are suspect, and that you could kill them or pray they terminate nicely. It does what the user requests, and then resumes the processes.
It doesn’t seem like it would be too terribly challenging but I would love to see this daemon.
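The estimation part of such a daemon could look roughly like the sketch below. It assumes a Linux /proc/meminfo that reports MemAvailable (newer kernels) and falls back to MemFree otherwise; the function names and the one-second polling interval are made up for this example, and the pausing and dialog steps are left out.

```python
import time


def parse_meminfo(text: str) -> dict:
    """Turn the contents of /proc/meminfo into {field: value}, sizes in kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def headroom_kb(info: dict) -> int:
    """Memory that can still be handed out: available RAM plus free swap."""
    return info.get("MemAvailable", info.get("MemFree", 0)) + info.get("SwapFree", 0)


def seconds_until_exhausted(prev_kb: int, curr_kb: int, interval_s: float):
    """Linear estimate of time left; None if headroom is not shrinking."""
    consumed = prev_kb - curr_kb
    if consumed <= 0:
        return None
    return curr_kb / (consumed / interval_s)


if __name__ == "__main__":
    interval = 1.0  # polling interval in seconds, an arbitrary choice
    with open("/proc/meminfo") as handle:
        previous = headroom_kb(parse_meminfo(handle.read()))
    while True:
        time.sleep(interval)
        with open("/proc/meminfo") as handle:
            current = headroom_kb(parse_meminfo(handle.read()))
        print("estimated seconds left:",
              seconds_until_exhausted(previous, current, interval))
        previous = current
```

A real daemon would smooth the estimate over several samples, since a single noisy interval can swing the prediction wildly.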
Iain, I actually remember seeing this behaviour on the Nokia 770 as well, and I was impressed by it. That should definitely be brought back upstream; it is too useful to ignore.
And for the record, GNOME does already alert you about low disk space.
I’ve recently investigated this problem for Swfdec, since I don’t like the idea that a file downloaded from the web can kill your desktop. However, it’s just not possible with current memory design. As Jonathan said, g_try_malloc will return a valid address even if you allocate 2 gigs of memory, the kernel will just not allocate it until you use it. That will generate a page fault and that page fault will cause oom. And in case of Swfdec, the loads of memory can also be lots of Pixmaps in the X server, which would be another problem…
So – as others have said – the only solution is to monitor real memory usage in the kernel and have it emit signals via dbus to make apps free caches or allow the user to kill some apps. (Don’t forget to make the kernel interrupt the memory-eating apps or that popup will take 3 minutes to appear because of swapping.)
But I guess the reason why no one has worked on this yet is that OOM is a case of one app misbehaving in 90% of desktop cases. So you just nuke that app from the OOM killer and be happy. The problem with that is that in recent times the kernel tries everything to keep processes alive, which seems like a good thing but is very annoying when desktop users have to wait 10 minutes until the kernel kills an out-of-memory thumbnailer.
If having limited memory available at notification time is a real issue, perhaps the notification program could pre-allocate what was necessary to achieve the notification? I imagine it would be very little VM. Of course, it would probably have been swapped out when it was needed.