I spent some time last week and this week on fixing some performance issues in gobject. It started out with the patches in bug 557100, which seemed very useful. I cleaned up those patches a bit, wrote a serious performance test and did some additional optimizations.
These changes focus on speeding up creation of “simple” gobjects, i.e. things that have no properties and don't implement any interfaces, etc. They are still important because being able to use gobject gives us lots of advantages like threadsafe refcounting, runtime type introspection, user-data, etc. Sometimes people avoid using gobjects for small things just because they are a bit more expensive than some homebrew struct, which is very sad. With these fixes we can get rid of some of that.
Another thing about gobject that has bothered me for some time is the handling of interfaces. GIO and other modern APIs are starting to use interfaces more and more, so it's important that they work well. However, interfaces in gobject have a feature that most people are unaware of, namely that you can add interfaces to a class after the class/type has been initialized. This means that the list of interfaces a class implements must be protected by a lock, and this lock must be taken each time we e.g. check if an object implements an interface or cast to the interface to do a method call on it.
Additionally the interface lookup algorithm used in gobject uses a binary search on the sorted list of interfaces a class implements. Better approaches are possible, like the one used in gcj (described here) which allows constant time (O(1)) interface lookup.
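To make the difference concrete, here is a minimal sketch of the two lookup styles. It is not the code from the patches and not the exact gcj scheme; the types and function names (IfaceEntry, lookup_binary, lookup_direct) are made up for illustration, and the constant-time variant is the simplest possible one, a table indexed directly by interface id.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; none of these names are GObject API. */
typedef struct {
    unsigned iface_id;   /* numeric id of the interface */
    const void *vtable;  /* that interface's method table for this class */
} IfaceEntry;

/* Binary search over entries sorted by iface_id: O(log n) per lookup. */
const void *
lookup_binary(const IfaceEntry *entries, size_t n, unsigned iface_id)
{
    size_t lo = 0, hi = n;

    assert(entries != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (entries[mid].iface_id == iface_id)
            return entries[mid].vtable;
        if (entries[mid].iface_id < iface_id)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL; /* class does not implement this interface */
}

/* Direct-indexed table: slot i holds the vtable for interface id i, or NULL.
 * One bounds check and one load, so O(1), at the cost of a larger table. */
const void *
lookup_direct(const void *const *table, size_t table_len, unsigned iface_id)
{
    return iface_id < table_len ? table[iface_id] : NULL;
}
```

The direct table trades memory for speed. A real implementation would need something more compact than one slot per interface id in every class, which is the kind of problem the gcj design addresses.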
In bugs 594525 and 594650 I described these issues and posted patches that fix them.
I added all these patches to the gobject-performance branch in glib git, including the performance test I wrote. The performance improvements are pretty good:
- Construction speed for simple objects more than doubled, while the construction speed for complex objects is not much affected (within one percent).
- Interface typechecks go from 52 to 95 million per second in the non-threaded case and from 12 to 95 million per second if g_thread_init() has been called.
- Additionally the contention for typechecks in multiple threads goes to zero, as you can see in the tests done by Benjamin in bug 594525.
Awesome!
Great work! Any tests yet on how much impact these optimizations will have for an end user in a GNOME environment?
Are these performance issues in gtk responsible for the considerable delay we see when changing the title of a window in glade-3?
WOW! When will this land in a stable release?
it would be also interesting if you could compare gobject performance to other OO languages like c++ or objective C.
> it would be also interesting if you could compare gobject performance to other OO
> languages like c++ or objective C.
Very good idea !
@marco: I have no idea what you’re talking about. have you filed a bug against glade?
@pavel: no, it really wouldn’t be interesting.
@jegHegy: shaving off milliseconds from a common operation like a type check is going to affect every gobject-based library and application; in the case of lock-free interface type checks this would benefit threaded applications and libraries using threads.
oh, and by the way: great work, Alex. you’re a real hero 🙂
I think it’s amazing that people give a damn that GObject uses memory at all. We live in a world where the predominant desktop OS is covered in .Net. Seriously. My Android phone is Java. Every little thing probably uses 2 times more memory than GObject. Sure, would be nice if it was in assembly, but at the end of the day, having the product sure beats not.
@Jerome: Right. Each GObject costs you exactly 2 pointers and one integer. That’s 20 bytes per instance on 64 bit. Every empty Java string is larger than this!
Well, and except for the qdata pointer you really cannot get smaller if you want polymorphic, reference counted objects.
struct _GObject
{
  GTypeInstance g_type_instance;
  /*< private >*/
  volatile guint ref_count;
  GData *qdata;
};
struct _GTypeInstance
{
  /*< private >*/
  GTypeClass *g_class;
};
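A quick way to check the per-instance cost is to mirror the layout with plain C types and ask the compiler. This is only a sketch: MockTypeInstance and MockObject are stand-ins invented here so it builds without GLib, and the exact numbers depend on the ABI. With 8-byte pointers and a 4-byte int the fields add up to 20 bytes, but the compiler will typically pad the struct to 24 so that qdata is pointer-aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the GLib types, so the sketch compiles without GLib. */
typedef struct { void *g_class; } MockTypeInstance;

typedef struct {
    MockTypeInstance g_type_instance; /* one pointer */
    volatile unsigned int ref_count;  /* one integer */
    void *qdata;                      /* one pointer */
} MockObject;

/* Sum of the field sizes, ignoring any padding the compiler inserts. */
size_t mock_object_payload_size(void)
{
    return sizeof(void *) + sizeof(unsigned int) + sizeof(void *);
}

/* What an instance actually occupies, padding included. */
size_t mock_object_real_size(void)
{
    assert(sizeof(MockObject) >= mock_object_payload_size());
    return sizeof(MockObject);
}
```

Printing both values from a small main() shows how much of the instance is padding on a given platform.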
On behalf of the GStreamer community : THANKYOU for caring about and improving glib’s threading/speed/memory !
Edward: Hey, we all write threaded apps by now – I don’t think there’s a lot of apps left that don’t call g_thread_init(). But I certainly think perf tests will get faster for you now. 🙂
Also, can we make that the default for gtk/glib 3? Lots of code behaves stupid without it, in particular code interacting with gio like the file chooser.
Pavel: I’d love to have a comparison of GObject with C++, Objective C(++) or Qt C++ just to see where we could do better. But the problem is that a lot of people end up comparing apples to oranges because they don’t understand all the object systems and that doesn’t help at all. We don’t need a Phoronix for object systems.
jegHegy: In a typical GNOME desktop, you will not see a lot of benefits, because developers usually avoid GObject when they think it’s limiting them. This gave us developers travesties like GstMiniObject or CamelObject. Making GObject better is mostly about making life easier for developers that didn’t want to use it before.
The overhead for typechecks in a random program with debugging support is usually <5%, rarely does it go up to 10%. So if we make those operations twice as fast, all your stuff will run less than 3% faster, which is not something you'll usually notice.
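The arithmetic behind that estimate can be sketched with Amdahl's law. The function name overall_speedup is made up for this example, and the 5% and 10% inputs are simply the figures quoted above, not new measurements.

```c
#include <assert.h>

/* Overall speedup when a fraction of the runtime is made `factor` times
 * faster (Amdahl's law): 1 / ((1 - fraction) + fraction / factor). */
double overall_speedup(double fraction, double factor)
{
    assert(fraction >= 0.0 && fraction <= 1.0 && factor > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

With typechecks at 5% of the runtime and made twice as fast, overall_speedup(0.05, 2.0) is 1/0.975, roughly 1.026, so a bit under 3% faster. In the rarer 10% case, overall_speedup(0.10, 2.0) is 1/0.95, roughly 1.053.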
Alex: \o/
@otte: you’re right it might only be 3% but that’s worth it I think. If only every release Gnome could get even half this speed-up, it would be awesome ;-P
Thanks Alex.
Thanks for the replies! Keep up the good work, everyone.
Cool. So what about getting rid of the global paramspec pool next 🙂
Woot! Thank you Alex! This is awesome! GObject creation performance was the one thing that really hurt GMime’s performance in the move from 2.2 to 2.4 (because I made everything into a GObject instead of a custom struct for a lot of things).
Forgot to mention the bugs:
* bug #536939 has a test application and some discussion
* bug #418970 has some more discussion with the outcome that the notify queue list should use g_slice
Thanks for picking up the patches. Sad that it had to take so much time for somebody to pick it up, the bug is almost a year old, and the same patch was proposed immediately after opening the bug. Nice that you further finetuned by taking a look at the interface-lookup problems too.
Nice, that was about 9 years overdue 😉
Might be fun if the performance suite had a simple gobject vs. hand-rolled struct comparison in it, just to quantify that (people will still think the hand-rolled struct is much faster until they see numbers I bet)
“Construction speed for simple objects more than doubled” is bad English.
Essentially if construction speed was 1/2 a second, this sentence implies that it’s now 1 second.
What you should have said is:
“Construction time for simple objects more than halved” – which makes more sense.
Karl:
It's not that wrong. The construction speed is not the construction time. The construction speed is the number of constructed objects per second, i.e. what is printed out by the performance test app, and this doubled.
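The two readings are just reciprocals of each other, as this small sketch shows. The helper names and the example numbers are made up for illustration and are not taken from the performance test.

```c
#include <assert.h>

/* Construction speed: how many objects are created per second. */
double objects_per_second(double n_objects, double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    return n_objects / elapsed_seconds;
}

/* Construction time: seconds spent per object, the reciprocal of the speed. */
double seconds_per_object(double rate)
{
    assert(rate > 0.0);
    return 1.0 / rate;
}
```

If a million objects took 0.5 seconds before and 0.25 seconds after, the speed goes from 2 million to 4 million objects per second (doubled) while the time per object is halved.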