On Usability Testing

March 11, 2004

Usability testing (perhaps more aptly called “learnability testing”) is all the rage. If you look on the web for information on usability you’ll be bombarded with pages from everybody from Jakob Nielsen to Microsoft performing, advocating, and advising on usability testing. It’s unsurprising that people who have been educated about HCI primarily by reading web sites (and books recommended on those web sites) equate usability with usability testing. Don’t get me wrong, usability testing is a useful tool. But in the context of software design it’s only one of many techniques, isn’t applicable in many situations, and even when it is applicable it is often not the most effective.

  • Why is usability testing lauded all over the internet?
    The most visible and fastest-growing area of HCI is web site usability, because it has received broader corporate adoption than usability work applied to other things (e.g. desktop software). In other words: most usability discussed on the internet today is in the context of web page usability, and web page usability is profoundly improved by usability testing. Thus it is not surprising that much of the usability discussion on the internet deals with usability testing.

    Desktop software usually presents a substantially different problem space from web pages. Desktop software involves more complex and varied operations where long-term usability is crucial, whereas a web site involves a simple operation (very similar to 100 other websites users have used) where “walk up and use perfectly” is crucial. Design of infrequently used software, like tax software, is much more similar to web site design. One simple example… In most web pages, learnability is paramount: if the first time users visit a web site they don’t get what they want almost instantly and without making mistakes, they will just leave. Learnability is the single most important aspect of web page design, and usability tests (aka learnability tests) do a marvelous job at finding learning problems. In a file open dialog learnability is still important, but how convenient the dialog is to use after the 30th use matters more.

  • A good designer will get you much farther than a bad design that’s gone through lots of testing. (A good design that has had testing applied to it is even better, but more on this later.) Usability testing tends to see the trees instead of the forest. You tend to figure out “that button’s label is confusing”, not “movie and music players represent fundamentally different use cases”. Because of this, usability testing tends to get stuck at a local maximum rather than moving toward the global optimum. You get all the rough edges sanded, but the product is still not very good at the high level. Microsoft is a poster child for this principle: they spend more money on usability than anyone else (by far), but they tend to spend it post-development (or at least late in development). It’s not an efficient use of resources, and even after many iterations (even over multiple versions) the software often still sucks. A good designer will also predict and address a strong majority of the “that button’s label is confusing” type issues, so if you do perform usability testing you’ll be starting with 3 problems to find instead of 30. That’s especially important because a single usability test can only find several of the most serious issues: you can’t find the smaller issues until the big ones are fixed. In summary: with a designer you’re a lot more likely to end up optimizing toward the global maximum rather than a local one, AND if you do testing it will require far less usability testing to get the little kinks out.

  • Usability testing is not the best technique for figuring out the big picture. Sometimes you will get an “aha” experience triggered by watching people use your software in a usability test, but typically you can get the same experience by watching people use your competitor’s software too. Also, a lot of these broad observations are contextual: they require an understanding of goals, and of how products fit into people’s lives, that is absent from typical usability tests. Ethnographic research is typically a much more rewarding technique for gaining this sort of insight.

  • Producing a good design requires more art than method. I think a lot of people are more comfortable with usability testing because it seems like a science. It’s methodical, it produces numbers, it’s verifiable, etc. Many designers advocate usability testing less because it improves the design, and more because it’s a useful tool for convincing reluctant engineers that they need to listen: usability testing sounds all scientific. Usability testing can be a very useful technique for trying to get improvements implemented in a “design hostile environment”. This is part of why I pushed for/did more usability testing early on in GNOME usability. Companies would love it if there were a magic series of steps you could follow to produce genuine, guaranteed, ultra-usable software. Alas, just as with programming, there isn’t. A creative, insightful, informed human designing the software will do much better than any method.

  • Usability tests can’t, in general, be used to find out “which interface is better”. I mention this because people periodically propose a usability test to resolve a dispute over which way of doing things is right. Firstly, you’ll only be comparing learnability; many other important factors will be totally ignored. Secondly, usability tests usually don’t contain a sufficiently large sample of users to allow rigorous comparison. Sure, if 10 people used interface A without trouble, and 10 people used interface B and 40 serious problems were observed, you can confidently say that interface A is far more learnable than interface B (and at this sort of extreme you can probably even assert it’s much better overall). But it’s rarely like that.

    Example: We test interface A on 10 people and find one problem that affects 8 of the people but only causes serious trouble for 2 of them, plus 3 serious problems that affect one person each. We test interface B on 10 people and find one serious problem that affects 3 people, another serious problem that affects 2 people, and 3 serious problems that affect one person each. Which interface is better? It’s a little harder to tell. So let’s say we argue it out and agree that interface A does better on usability tests. But we’ve only agreed that interface A is more learnable! Let’s say our designer asserts that interface B promotes a more useful conceptual model, and that the conceptual model is more important than learnability here. How do we weigh this evidence against the usability test? We’re a little better off than we were before the test, but not a lot, because we still have to weigh the majority of the evidence, which is not directly comparable. If we always accept “hard data!” as the final authority (which people often, somewhat erroneously, do in cases of uncertainty), even when the data only covers a subset of the problem, then we are worse off than before the test. The small sketch below makes the sample-size problem concrete.
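    To see why 10-person samples rarely support a rigorous “A vs. B” verdict, here’s a minimal sketch (the counts are hypothetical, not from any real test, and Fisher’s exact test is my choice of illustration, not something the testing methodology itself prescribes). Suppose 2 of 10 users hit a serious problem on interface A versus 5 of 10 on interface B:

        # Hypothetical counts: [hit a serious problem, didn't] per interface.
        from scipy.stats import fisher_exact

        table = [[2, 8],   # interface A: 2 of 10 users hit the problem
                 [5, 5]]   # interface B: 5 of 10 users hit the problem

        _, p_value = fisher_exact(table)
        print(f"p = {p_value:.2f}")  # prints p = 0.35

    Even though B looked more than twice as bad, a p-value around 0.35 is nowhere near the conventional 0.05 threshold: with samples this small, a difference like this is entirely consistent with chance.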

So am I saying that usability testing is bad or doesn’t improve software? No! If you take a good design, usability test it to learn about major problems, and use that data and experience to improve your design (remembering that design is still about compromise, and sometimes you compromise learnability for other things)… you will end up with a better design. Every design a designer pulls out of their head, even a very good one, has some mistakes and problems. Usability testing will find many of them.

So why don’t I advocate usability testing everything? If you don’t have oodles of usability people, up-front design by a good designer provides a lot more bang for the buck than using that same designer to do usability tests. You get diminishing returns (in terms of the average seriousness of problems discovered) as you do more and more fine-grained tests. It’s all about tradeoffs: given n person-hours across q interface elements (assuming all people involved are equally skilled at testing and design, which is obviously untrue), what is the optimum ratio of hours spent on design vs. hours spent on testing? For small numbers of person-hours across large numbers of interface elements, I believe in shotgun testing, and spending the rest of the time on design. Shotgun testing is testing the interface in huge chunks, typically by taking several large high-level tasks that span many interface aspects and observing people trying to perform them.
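As a rough illustration of those diminishing returns, here’s a back-of-the-envelope sketch using the well-known Nielsen–Landauer model of problem discovery, problems_found(n) = N · (1 − (1 − L)^n). The values N = 30 and L = 0.31 are illustrative assumptions (L ≈ 0.31 is the commonly cited per-user discovery rate), not measurements of any particular interface:

    # Nielsen-Landauer model: each test user independently uncovers a
    # fraction L of the N problems present in the interface.
    N, L = 30, 0.31  # illustrative values, not measured data

    for n in (1, 3, 5, 10, 15):
        found = N * (1 - (1 - L) ** n)
        print(f"{n:2d} users -> ~{found:4.1f} of {N} problems found")

Roughly: the first user uncovers ~9 problems, five users ~25, and fifteen users ~30. Each additional hour of testing buys fewer new findings than the hour before it, which is exactly why I’d rather spend the marginal hours on design.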

An example high-level task might be to give somebody a fresh desktop and say: “Here’s a digital camera, an e-mail address, and a computer. Take a picture with this camera and e-mail it to this address”. You aim at a huge swath of the desktop and *BLAM* you find its top 10 usability problems.

Anyway, like practically everything I write, this is already too long, but I have a million more things to say. Oh well 😉