I’ve been meaning to write this for a couple of months, but I never found the time to do so. But as I promised Federico, here we go.
In this blog post I will share some of the experience gained in unit testing a large Gtk application written in Python. Some of it only applies to Python, but most of the concepts can be implemented in other languages.
First a bit of background. Like most software projects, Stoq started out with no unit tests at all. That changed some 5-6 years ago when unit tests for the core business logic were added. Almost all application state of Stoq is stored in PostgreSQL, accessed via the psycopg2 database adapter and Storm, an excellent ORM written by Gustavo Niemeyer.
The current (late March 2013) code coverage of the domain classes is above 90%.
We’ve been trying to increase this gradually, but it’s a lot of work to close the
remaining gap and it has never really been the focus.
The business logic, which we refer to as the domain classes, is conceptually
easy to test, and testing it has been documented elsewhere, so I will not go into great detail here.
While the domain classes are arguably the most important parts to test in Stoq’s case (getting payment-related calculations wrong, for instance, would be disastrous), we’ve always had problems with errors creeping in one level up, in the UI layer on top of them.
Sometime around August last year I decided to investigate whether anything could be done to
easily increase the coverage of code in the UI layer, which involves Gtk+/PyGTK.
My first approach was something like:
dialog = PaymentEditor()
assertEqual(dialog.value_widget.get_text(), "0")
assertEqual(dialog.description_widget.get_text(), "")
PaymentEditor() constructs the widget tree, either manually or via a glade file,
creates an empty database domain object (Payment) and attaches it to the dialog.
Now, let’s test with real values, create a Payment and attach it to the form:
payment = Payment(value=10.0, description="New Payment")
Create another dialog showing the domain object:
dialog = PaymentEditor(payment)
And verify that the dialog has the right values set:
assertEqual(dialog.value_widget.get_text(), "10.0")
assertEqual(dialog.description_widget.get_text(), "New Payment")
Okay, so far so good.
This has increased the code coverage: we test opening an empty dialog,
which is used when creating a new payment, and we’re testing opening it with an existing payment.
There are a couple of problems with this approach though:
- While working fine for 1 dialog and 2 fields, it’s not really scalable to huge assistants (~8 pages, 50+ widgets)
- We’re only testing a limited subset, but we want to test everything that’s visible: labels, entries, packing, selection, columns and kiwi extensions such as validation, mandatory state and input masks.
So instead of checking each widget individually, let’s serialize the whole widget tree
into what we call a UI-test. It looks like this:
GtkDialog(main_dialog.toplevel): title='Edit Details of "New Payment"', hidden
  GtkVBox(main_dialog._main_vbox):
    GtkVBox(main_dialog.vbox, expand=True, fill=True):
      GtkEventBox(main_dialog.main, expand=True, fill=True):
        GtkAlignment(alignment):
          GtkTable():
            ProxyLabel(description_lbl): 'Description:'
            ProxyEntry(description): 'New payment'
            ProxyLabel(value_lbl): 'Value:'
            ProxyEntry(value): '10.00', insensitive
Okay, the UI-test format has a couple of improvements:
- basic widget hierarchy is tested (g_type_name, g_type_parent)
- content of GtkEntry/GtkLabel are included
- widget sensitivity (via insensitive) and visibility (via hidden)
- packing options (expand, fill)
We also do some Python magic to include the variable names of the widgets, so it’s easier to read and understand which widget is which. This is the value in parentheses after the type name. The dot notation means that the widget reference is stored in a sub-instance of the test dialog.
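To make the idea concrete, here is a minimal sketch of such a serializer. It is not Stoq’s implementation: FakeWidget, find_names and dump_widget are hypothetical names invented for this example, and FakeWidget stands in for real Gtk widgets so that the snippet runs without Gtk installed. The variable names are recovered by scanning the attributes of the object that owns the widgets.

```python
class FakeWidget:
    """Hypothetical stand-in for a Gtk widget; not a real Gtk or kiwi class."""

    def __init__(self, type_name, text=None, children=(), sensitive=True,
                 visible=True, packing=None):
        self.type_name = type_name
        self.text = text
        self.children = list(children)
        self.sensitive = sensitive
        self.visible = visible
        self.packing = packing or {}


def find_names(owner):
    """Map id(widget) -> attribute name for every widget stored on owner."""
    return {id(value): name
            for name, value in vars(owner).items()
            if isinstance(value, FakeWidget)}


def dump_widget(widget, names, indent=0):
    """Return the lines describing widget and its children, depth first."""
    # Inside the parentheses: the variable name (if known), then packing options.
    args = [names.get(id(widget), "")]
    args += ["%s=%r" % item for item in sorted(widget.packing.items())]
    line = "%s%s(%s):" % ("  " * indent, widget.type_name,
                          ", ".join(arg for arg in args if arg))
    # After the colon: the text content and any non-default state.
    props = []
    if widget.text is not None:
        props.append(repr(widget.text))
    if not widget.sensitive:
        props.append("insensitive")
    if not widget.visible:
        props.append("hidden")
    if props:
        line += " " + ", ".join(props)
    lines = [line]
    for child in widget.children:
        lines.extend(dump_widget(child, names, indent + 1))
    return lines


class FakeEditor:
    """Hypothetical dialog that keeps references to its widgets."""

    def __init__(self):
        self.description = FakeWidget("ProxyEntry", text="New payment")
        self.value = FakeWidget("ProxyEntry", text="10.00", sensitive=False)
        self.table = FakeWidget("GtkTable",
                                children=[self.description, self.value],
                                packing={"expand": True, "fill": True})


editor = FakeEditor()
output = "\n".join(dump_widget(editor.table, find_names(editor)))
print(output)
```

Only non-default state (insensitive, hidden) is written out, which keeps the serialized tree short and the diffs focused on what actually changed.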
This is similar to HTML page testing, where you save the rendered content of your web application to disk. We plug this into our testing infrastructure so that we have a call like:
self.check_dialog(dialog, "test.uitest")
This serializes the widget tree to a string and compares it to the result of the previous run, which is stored in the source code repository. If any changes are made to the widget tree, we show a diff between the current and the last known state, e.g.:
======================================================================
FAIL: stoqlib.gui.test.test_missingitemsdialog:TestMissingItemsDialog.test_confirm
----------------------------------------------------------------------
Traceback (most recent call last):
[snip]
failed: test.uitest
--- test.uitest
+++ test.uitest
@@ -1,9 +1,9 @@
test.uitest -dialog: ConfirmSaleMissingDialog
test.uitest +dialog: MissingItemsDialog
test.uitest  GtkDialog(toplevel): title='Missing items', hidden
test.uitest    GtkVBox(_main_vbox):
test.uitest      GtkVBox(vbox, expand=True, fill=True):
test.uitest        GtkEventBox(header):
test.uitest -        GtkLabel(): '<b>The following items don't have enough stock to confirm the sale</b>'
test.uitest +        GtkLabel(): '<b>The following items don't have enough stock to confirm.</b>'
test.uitest        GtkEventBox(main, expand=True, fill=True):
test.uitest          ObjectList(_klist):
test.uitest            column: title='Product', expand
The failure above is a real error that happened yesterday: a GtkLabel() changed and someone forgot to update the UI-test. Updating it is simple: rm the stored file and rerun the test.
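The record-and-compare workflow can be sketched in a few lines. This is an illustration only, not the code behind check_dialog: check_against_file is a hypothetical helper that takes an already serialized string, records it on the first run and fails with a unified diff on later runs.

```python
import difflib
import os
import tempfile


def check_against_file(actual, filename):
    """Compare actual with the text stored in filename.

    If the file does not exist yet, it is created and the check passes.
    Otherwise a mismatch raises AssertionError with a unified diff.
    """
    if not os.path.exists(filename):
        with open(filename, "w") as f:
            f.write(actual)
        return
    with open(filename) as f:
        expected = f.read()
    if expected == actual:
        return
    diff = difflib.unified_diff(expected.splitlines(), actual.splitlines(),
                                fromfile=filename, tofile=filename,
                                lineterm="")
    raise AssertionError("failed: %s\n%s" % (filename, "\n".join(diff)))


path = os.path.join(tempfile.mkdtemp(), "test.uitest")

# First run: nothing is stored yet, so the current state is recorded.
check_against_file("GtkLabel(): 'old text'\n", path)

# Later run: the label changed, so the check fails and shows a diff.
try:
    check_against_file("GtkLabel(): 'new text'\n", path)
except AssertionError as error:
    message = str(error)
    print(message)

# Accepting the change: remove the stored file and run the check again.
os.remove(path)
check_against_file("GtkLabel(): 'new text'\n", path)
```

Creating the file when it is missing is what makes updating a test as cheap as deleting it and rerunning. The trade-off is that a newly recorded file has to be reviewed before it is committed.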
A real UI-test for our PaymentEditor can be found here.
In the format above, we’re also including the complete domain objects, which may contain attributes that are not shown in the interface but are nevertheless important to test.
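As a rough illustration of dumping a domain object next to the widget tree, the sketch below writes one line per public attribute. dump_object is a hypothetical helper, and the Payment class here is a plain Python stand-in rather than the Storm-backed domain class.

```python
def dump_object(obj):
    """Return a header line plus one line per public attribute of obj."""
    lines = ["%s:" % type(obj).__name__]
    # Sorting keeps the output stable between runs, so diffs stay meaningful.
    for name, value in sorted(vars(obj).items()):
        if not name.startswith("_"):
            lines.append("  %s: %r" % (name, value))
    return lines


class Payment:
    """Hypothetical plain class standing in for the real domain class."""

    def __init__(self, value, description):
        self.value = value
        self.description = description
        self._cache = None  # private attributes are left out of the dump


print("\n".join(dump_object(Payment(10.0, "New Payment"))))
```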
The implementation for this, which is pretty generic and could be used for any Gtk application, currently lives inside Stoq and can be found here:
https://github.com/stoq/stoq/blob/master/stoqlib/gui/uitestutils.py
At some point I need to sit down and move it out of Stoq and put it in an external library that can easily be used by other Python applications, as it is currently a bit tied to kiwi/Stoq.
The real PaymentEditor test, which shows how to use the internal API, can be found here.
One important aspect of successful UI-testing involves mocking, as we need to be able to fake state to be able to test all code. But that’s for a separate posting as this is already getting a bit long.
So in summary, our total coverage before starting to do UI testing was around 35%, and essentially no interfaces were tested. Eight months after UI-tests were first introduced, we have written 419 different UI-tests and our coverage is currently at 78%. Remember that this is a project with 80k+ lines of Python code.
The main effect of this is that we’ve reduced the amount of QA needed before doing new releases, and we have a lot more confidence that things keep working when doing large refactorings.