On Usability Testing

March 11, 2004

Usability testing (perhaps more aptly called “learnability testing”) is all the rage. If you look on the web for information on usability you’ll be bombarded with pages from everybody from Jakob Nielsen to Microsoft performing, advocating, and advising on usability testing. It’s unsurprising that people who have been educated about HCI primarily by reading web sites (and books recommended on those web sites) equate usability with usability testing. Don’t get me wrong, usability testing is a useful tool. But in the context of software design it’s only one of many techniques, isn’t applicable in many situations, and even when it is applicable it’s often not the most effective.

  • Why is usability testing lauded all over the internet?
    The most visible and growing area of HCI is web site usability, because it has seen broader corporate adoption than usability applied to other things (e.g. desktop software). In other words: most usability discussed on the internet today is in the context of web page usability, and web page usability is profoundly improved by usability testing. Thus it is not surprising that much of the usability discussion on the internet today deals with usability testing.

    Desktop software usually presents a substantially different problem space from web pages. Desktop software involves more complex and varied operations where long-term usability is crucial, whereas a web site involves a simple operation (very similar to a hundred other web sites users have used) where “walk up and use perfectly” is crucial. Design of infrequently used software, like tax software, is much more similar to web site design. One simple example… In most web pages, learnability is paramount: if the first time users visit a web site they don’t get what they want almost instantly and without making mistakes, they will just leave. Learnability is the single most important aspect of web page design, and usability tests (aka learnability tests) do a marvelous job of finding learning problems. In a file open dialog learnability is still important, but how convenient the dialog is to use after the 30th use is more important.

  • A good designer will get you much farther than a bad design that’s gone through lots of testing. (A good design that has had testing applied to it is even better, but more comments on this later.) Usability testing tends to see the trees instead of the forest. You tend to figure out “that button’s label is confusing”, not “movie and music players represent fundamentally different use cases”. Because of this, usability testing tends to get stuck on local maxima rather than moving toward global optimization. You get all the rough edges sanded, but the product is still not very good at the high level. Microsoft is a poster child for this principle: they spend more money on usability than anyone else (by far), but they tend to spend it post-development (or at least late in development). It’s not an efficient use of resources, and even after many iterations (even over multiple versions) the software often still sucks. A good designer will also predict and address a strong majority of the “that button’s label is confusing” type issues, so if you do perform usability testing you’ll be starting with 3 problems to find instead of 30. That’s especially important because a single usability test can only find several of the most serious issues: you can’t find the smaller issues until the biggies are fixed. In summary: with a designer you’re a lot more likely to end up optimizing toward a global maximum rather than a local maximum, AND if you do testing it will take far less usability testing to get the little kinks out.

  • Usability testing is not the best technique for figuring out the big picture. Sometimes you will get an “aha” experience triggered by watching people use your software in a usability test, but typically you can get the same experience by watching people use your competitor’s software too. Also, a lot of these broad observations are contextual: they require an understanding of goals and of how products fit into people’s lives that is absent in typical usability tests. Ethnographic research is typically a much more rewarding technique for gaining this sort of insight.

  • Producing a good design requires more art than method. I think a lot of people are more comfortable with usability testing because it seems like a science. It’s methodical, it produces numbers, it’s verifiable, etc. Many designers advocate usability testing less because it improves the design, and more because it’s a useful tool for convincing reluctant engineers that they need to listen: usability testing sounds all scientific. Usability testing can be a very useful technique for trying to get improvements implemented in a “design hostile environment”. This is part of why I pushed/did more usability testing early on in GNOME usability. Companies would love it if there were a magic series of steps you could follow to produce genuine, guaranteed, ultra-usable software. Alas, just like with programming, there isn’t. A creative, insightful, informed human designing the software will do much better than any method.

  • Usability tests can’t, in general, be used to find out “which interface is better”. I mention this because people periodically propose a usability test to resolve a dispute over which way of doing things is right. Firstly, you’ll only be comparing learnability. There are many other important factors that will be totally ignored by this. Secondly, usability tests usually don’t contain a sufficiently large sample of users to allow rigorous comparison (a rough numeric sketch of this point follows the list below). Sure, if 10 people used interface A without trouble, and 10 people used interface B and 40 serious problems were reported, you can confidently say that interface A was way more learnable than interface B (and at these sorts of extremes you can probably even assert it’s much better overall). But it’s rarely like that.

    Example: We test interface A on 10 people and we find one problem that affects 8 of the people, but only causes serious problems for 2 people, and 3 serious problems that affect one person each. We test interface B on 10 people and we find one serious problem that affects 3 people, another serious problem that affects 2 people, and 3 serious problems that affect one person each. Which interface is better? It’s a little harder to tell. So let’s say we argue it out and agree that interface A does better on usability tests. But we’ve only agreed that interface A is more learnable! Let’s say our designer asserts that interface B promotes a more useful conceptual model, and that the conceptual model is more important than learnability here. How do we weigh this evidence against the usability test? We’re a little better off than we were before the test, but not a lot, because we still have to weigh the majority of evidence that’s not directly comparable. If we always accept “hard data!” as the final authority (which people often, somewhat erroneously, do in cases of uncertainty), even when the data only covers a subset of the problem under consideration, then we are worse off than before the test.
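To put a number on how weak this kind of evidence is, here’s a quick sketch. It is entirely my own illustration with hypothetical counts; the choice of Python/SciPy and of a Fisher exact test is just for the example, nothing about real usability tests prescribes it. Even a fairly lopsided 10-versus-10 result is nowhere near statistically rigorous:

```python
# Hypothetical counts: 2 of 10 users hit a serious problem on interface A,
# 5 of 10 on interface B.  (SciPy is assumed; any stats package would do.)
from scipy.stats import fisher_exact

table = [[2, 8],   # interface A: problem / no problem
         [5, 5]]   # interface B: problem / no problem

_, p_value = fisher_exact(table)
print(f"p = {p_value:.2f}")  # prints roughly 0.35: nowhere near significance
```

So even before you get to the “it only measures learnability” problem, the raw numbers from a typical test can’t carry the weight people want to put on them.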

So am I saying that usability testing is bad or doesn’t improve software? No! If you take a good design, usability test it to learn about major problems, and use that data and experience to improve your design (remembering that design is still about compromise, and sometimes you compromise learnability for other things)… you will end up with a better design. Every design a designer pulls out of their head, even a very good one, has some mistakes and problems. Usability testing will find many of them.

So why don’t I advocate usability testing everything? If you don’t have oodles of usability people, up-front design by a good designer provides a lot more bang for the buck than using that same designer to do usability tests. You get diminishing returns (in terms of the average seriousness of problems discovered) as you do more and more fine-grained tests. It’s all about tradeoffs: given n person-hours across q interface elements (assuming all people involved are equally skilled at testing and design, which is obviously untrue), what is the optimum ratio of hours spent on design vs. hours spent on testing? For small numbers of person-hours across large numbers of interface elements, I believe in shotgun testing, and spending the rest of the time on design. Shotgun testing is testing the interface in huge chunks, typically by taking several large high-level tasks that span many interface aspects and observing people trying to perform them.

An example high-level task might be to give somebody a fresh desktop and say: “Here’s a digital camera, an e-mail address, and a computer. Take a picture with this camera and e-mail it to this address”. You aim at a huge swath of the desktop and *BLAM* you find its top 10 usability problems.

Anyway, like practically everything I write, this is already too long, but I have a million more things to say. Oh well 😉

A Listy Dilemma

February 17, 2004

GNOME’s desktop-devel-list today is just what the gnome-hackers list used to be. It’s not like this is a new problem. Lists start out good, but then too many people get on them, so we eventually restrict who can be on the list, and then some people think we are too elitist and start a new list. Which is non-elitist and has a high signal-to-noise ratio… until the effects of non-elitism creep in, and we have these problems all over again.

  1. A central desktop list seems to be a thing that happens naturally, and it is also the list I’m most likely to read. Thus I personally, at least, consider it good.
  2. Restricting access removes much of the cluelessness, but at the cost of a greater administrative burden, and of locking out valuable potential contributors.
  3. Restricting access does not typically make lists regain their old “high signal to noise ratio” status. For example, gnome-hackers was periodically prone to extended technical discussions (by clueful people) that became tiresome for most people on the list and ideally would have jumped list. They were often good discussions to have, but not everybody needed to be party to them.
  4. Fragmented lists tend to be ignored, even by the people they are most relevant to (often including the relevant maintainers).

In short, it is best to have fewer lists, but we need to alleviate the problems that make a few central lists occasionally painful.

It seems that the real problem is not the variety of the threads, but that some threads we’d really rather not have on the list-that-everyone-reads (or at least used-to-read ;-) just don’t die. Flags and the recent release-name discussions come to mind. What if there were a way to create quick, temporary break-out discussion lists? Something that required no admin maintenance. That way, rather than fragmenting general discussion, we could create immediate outlets for in-depth (sometimes important, other times not) discussions that most people don’t want to read (or, in my case, Mark As Read).

Rather than fragmenting lists by “general topic”, which seems not to work, why don’t we fragment list traffic on a per-discussion basis? Very few discussions will need this, but for the few that do, we avoid destroying the public list’s readability for the week-plus it takes them to run their course.

Say we have 10 or so responsible people who can create a breakout discussion “list”. There’s a little web form one of these people can use to break out a discussion. The person enters the subject of the discussion into the web form, and an “End of Discussion” ultimatum/message gets automatically posted to d-d-l. In this ultimatum is a link. When the message is received, clicking on the link pops up a form where you can enter your e-mail address, and *poof* you’re in on the discussion in the breakout list. Every post to the breakout list has a link appended for leaving the list. Basically, people interested in the conversation can keep at it, but the conversation moves off-list in a convenient manner. (A rough sketch of what this might look like follows the list below.) Some considerations for the breakout lists:

  • We don’t try to do any security or passwords or confirmation e-mails for adding/removing people from lists, because these things are supposed to be cheap, dirty, and ephemeral. They need to have a ridiculously low barrier to entry.
  • We don’t want too many people who can create breakout lists, or any discussion that generates a dozen messages will get somebody trying to break out the discussion. When breakouts happen too often, people will start ignoring the breakout ultimatum and will keep posting on d-d-l, destroying the efficacy of the technique when it is really needed. On the other hand, we need enough people who can create lists that at least a few of them are active on the list every day. That way it doesn’t become some onerous task for which an “admin” has to be tracked down and coaxed into wasting their precious time (for example, it shouldn’t be like trying to get somebody to do CVS surgery for you).
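To make the mechanics a bit more concrete, here is a minimal sketch of what the breakout web form might look like. This is purely illustrative: it assumes Flask and an in-memory store, all the endpoint names are made up, and the actual mail plumbing (posting the ultimatum to d-d-l, relaying messages, appending the leave link to each post) is omitted.

```python
# Minimal sketch of the breakout-list idea: cheap, dirty, and ephemeral.
# Assumes Flask; names and endpoints are hypothetical, and real mail delivery
# (the d-d-l ultimatum, message relaying) is left out entirely.
import uuid
from flask import Flask, request

app = Flask(__name__)

# breakout_id -> {"subject": ..., "subscribers": set of e-mail addresses}
breakouts = {}

@app.route("/breakout/new", methods=["POST"])
def create_breakout():
    """One of the ~10 trusted people submits a subject; the returned join
    link would go into the End-of-Discussion message posted to d-d-l."""
    breakout_id = uuid.uuid4().hex[:8]
    breakouts[breakout_id] = {"subject": request.form["subject"],
                              "subscribers": set()}
    return f"Join link for the ultimatum: /breakout/{breakout_id}/join\n"

@app.route("/breakout/<breakout_id>/join", methods=["POST"])
def join_breakout(breakout_id):
    """No passwords, no confirmation e-mail: joining is deliberately cheap."""
    breakouts[breakout_id]["subscribers"].add(request.form["email"])
    return "*poof* you're in on the discussion.\n"

@app.route("/breakout/<breakout_id>/leave", methods=["POST"])
def leave_breakout(breakout_id):
    """Every post to the breakout list would carry a link to this endpoint."""
    breakouts[breakout_id]["subscribers"].discard(request.form["email"])
    return "You've left the breakout list.\n"

if __name__ == "__main__":
    app.run()
```

Keeping it this dumb (no passwords, no archives, no moderation) is exactly what gives it the ridiculously low barrier to entry described above.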

It is interesting to compare the “list problem” to how discussions work in the “real world”. In the real world we would have serious trouble if everybody had to listen to every discussion involving more than four participants. The fragmented-lists suggestion is somewhat akin to having 25 separate rooms, each devoted to a particular topic. This is a sort of weird division, and people are probably going to drift into larger rooms (or have off-topic conversations). Naturally, people control conversation and topic interest pretty well by drifting in and out of groups. Basically, breakout discussion lists are a way to try and accommodate that sort of ephemeral shift.

January 22, 1984: the Apple Macintosh is unleashed on the world. The world blinks and keeps on turning.

The release of the Macintosh wasn’t the revolution, it was a symbol of the revolution. It wasn’t merely the introduction of an “insanely great” product line; it was the debutante ball of the process that birthed it. And at the heart of that process (human-centered design) was a paradigm shift. The question was no longer “What will this computer’s specs be?” but “What will people do with this product?”. That question is as relevant (and almost as frequently overlooked) today as it was twenty years ago. The importance of the revolution lay less in Windows, Icons, Menus, and Pointer and more in approaching product development from the right direction. Until widespread development and design in the computer industry is focused on a question like that, the Macintosh revolution is far from over.


The Star desktop, circa 1981

There is widespread disagreement as to when and where this revolution began, but it is not contentious that the ideas took root in the feracious ground of Xerox PARC in the 70s. The end result was the Xerox 8010 (aka Star) desktop, released in 1981. To a large extent the Star interface is extant in modern desktops, but this belies the importance of the Star: it was the result of human-centered design. Engineers and researchers at Xerox tried to create a computer that could be used to “do people things” rather than just crunch numbers. Focus was not on specs and technology but on what Star could accomplish.


The Alto’s “Executive”, circa mid 1970s

It is interesting to compare the Star interface with the interface of the Executive program from the equally famous Xerox Alto (from the mid ’70s). The Alto was a technical marvel, with a bitmapped display, windows, a mouse, and ethernet. But while the Star really adds nothing to this impressive list of technology, the difference between the two, in terms of user experience, is like night and day. Technological invention can enable real improvement, but it’s not enough (usually it’s not even necessary). Anyway, enough historical meandering. The story of the Macintosh, Star, and Alto is very interesting, and there are a lot of period documents dealing with that subject… maybe I’ll post a list of links another day. But back to my agenda: 🙂

At best, I think most people ask “What could people do with this computer?”. That’s a very different question from “What will people do with this computer?”… there are so many nifty features that people could use if they pushed themselves, but that have a high enough barrier to entry that people don’t bother.

Example: I have a nice thermostat in my apartment. It’s fairly well designed and has quick push buttons for “Daytime”, “Night”, and “Vacation”. It was even straightforward to set the first two to my preferred temperatures for “in the apartment, awake” and “out of the apartment or asleep”; I haven’t bothered with the vacation button. Now, I have noticed that I don’t like to get out of bed in the morning because it is sort of cold. In fact, sometimes I’ll lie in bed for 30+ minutes because it’s cold, which is a big waste of time (I’m not very rational when I’m waking up). I have noticed that my thermostat supports scheduling changes between day and night temperature. I even looked at the instructions beneath the faceplate, and it looks like it’d be fairly easy to program. But I haven’t done it. The device is usable in the sense that if I wanted to, I could program it, and probably get it right on the first or second try. It’s not hard to use. But it’s a little too inconvenient: I’d have to special-case my weekend schedule, and I’d have to set several different times using the fairly slow “up”, “down”, “next item” interface for setting time (found on most alarm clocks, etc.). The point is, it’s not hard to figure out, but it’s still too much hassle. So while I could program the thermostat, I won’t. There’s always something that seems better to do with my time, and I can’t be bothered (even though rationally I know it’d be better overall if I just programmed the silly thing).

The Macintosh revolution, at least how I see it, was about conceiving your (computer related) product in terms of what people will do with it. Sometimes we need to “get back to the basics”…

Can you face the sign of…

January 28, 2004

 


The GEGL!!!

 

More lessons in frigidity

January 27, 2004

Lesson: Do not leave soda cans in the car. It might seem obvious to those of you accustomed to cold climates, but I finally realized I have to think of my car as a freezer. So I left 12 cans of Mountain Dew in my car right behind the driver’s seat. What happened? Of course about half the cans exploded when they froze. So that’s not good… unfortunately, I didn’t notice this for three weeks. Well, it turns out one of those days must have gotten above freezing, because the Mountain Dew in the exploded cans melted. Of course by now it’s all frozen into my carpet. I hate this place.

Lesson: Microfiber pants really help. Thanks to a suggestion from Carl-Christian Salvesen in Norway, I’ve swapped out jeans for microfiber slacks if I’m walking around in the cold. They’re much lighter, so I sort of assumed they wouldn’t work as well as jeans. Not so, they appear to trap a lot more heat. When it’s really windy I have windbreaker-pant things I can pull over them.

Lesson: If you want to buy gloves and scarves you have to do it before winter. People in this place exhibit extraordinary wishful thinking during the winter, it seems. The stores have already had their “get rid of everything” winter clearance sales. Sears had not a single scarf or pair of gloves left. The stores are filled with people bundled up to the rafters in coats, gloves, scarves, etc. buying… swim suits and light skirts. It’s ridiculous. I mean, I know you start selling before the season starts, but it’s not even February yet! What are you supposed to do if you lose your gloves? (Anyway, I finally found some nice black leather gloves, but I tried a bunch of stores that used to have them but don’nomo’.)

Bad day for banking

January 21, 2004

Today is a bad day for banking.

So this morning (not 20 minutes ago) I pulled up to a bank’s drive-up ATM with the intent of withdrawing $40. I ended up with $400. I also managed to lose my bank card.

Despite the fact that most ATMs only handle money in multiples of $20, they still require you to enter the “cents”. So asking for $40 entails the button sequence 4-0-0-0. I almost always withdraw $40, so I perform this series of presses without a lot of higher brain involvement. Unfortunately, this ATM fixed a “bug” in the way 95% of ATMs work: it only lets you enter whole dollar values.
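Just to spell out the arithmetic, here’s a hypothetical sketch (not any real ATM’s code) of how the same muscle-memory keypresses come out under the two entry conventions:

```python
# Hypothetical illustration of the two ATM entry conventions.
def amount_cents_mode(keys: str) -> float:
    """Most ATMs: the last two digits are cents, so '4000' means $40.00."""
    return int(keys) / 100

def amount_whole_dollar_mode(keys: str) -> float:
    """This ATM: every digit is whole dollars, so '400' means $400."""
    return float(int(keys))

print(amount_cents_mode("4000"))        # 40.0  (the habitual 4-0-0-0)
print(amount_whole_dollar_mode("400"))  # 400.0 (the same habit, minus the final 0)
```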

So I pressed 4-0-0… and caught myself before pressing the final 0, based on the on-screen feedback (actually, if I’d pressed the final 0, things would have turned out better, because ATMs won’t give you $4000 in a single transaction).

Most ATMs have on-screen commands and buttons along the sides of the screen that are supposed to line up with the commands (press the button and it executes the command “next” to it). The problem is that the buttons are often far enough away from the edge of the screen, and raised enough, that at different viewing heights they line up with different commands. Additionally, even when there are only two options, they tend to put them on buttons that are right next to each other. Even given the flawed physical design, the chance of error could be dramatically reduced if the options were always kept as far from each other as possible.

Well, I was flustered, because it’s disturbing to know that one button press will dump $400 in cash on you: I wanted that $400 off the screen pronto! In my haste I did not account for the button line-up (my car is really small and hence low… the buttons must have been designed for an SUV), and pressed the accept option instead of cancel. Thirty seconds later I’m flush with Benjamins.

So my first reaction is “put this cash somewhere safe”, so I find a place to stow it temporarily. Then I glance over and see the receipt and grab it, because I sure want a record of this transaction until I count the loot. Then I realize that because I’m getting an apartment soon, I really want the money in the bank ASAP, so I back up and grab a deposit envelope. What did I forget? Oh yes, I forgot to take my card.

So I pull away from the ATM into the bank’s parking lot to fill out the deposit envelope, stuff the cash into it, and head back to the machine. I fumble around in my wallet for my card. Can’t find it. Then I realize that I might have put the card loose on the seat next to me (which I sometimes do if I’ve already stowed my wallet in my pocket). So I drive back to the parking lot and dismantle my car looking for the card. Then it hits me 🙁

So I head into the bank, and the nice man at the desk gets the manager and they go to check the ATM. Oh sorry, your card was with another bank, so we can’t give it back to you. Unfortunately my credit union has no branches within a thousand miles of here, so it’s going to take time to get a new card.

$(#*&(&*(*&!!! On the upside, I got $400 out of the account before losing my card, and I guess I can still write checks from that account to get an apartment, so it’s not the end of the world.

I can’t believe I did this, because I’ve always been grumpy about and conscious of the button-line-up usability problem present in many ATMs. Good ATMs have the buttons close to the screen and at the same height as the screen, so it all lines up no matter what angle you look at it from… or they’re touch screens (which have other downsides at times, but overall I think are an improvement).

Cold in Cambridge

January 16, 2004

Well… I have learned some valuable lessons. This is my first experience with True Cold[TM]. So last night I decided to walk 2 miles from my friend’s place back to the bed & breakfast… at 1 am. At -30F (with windchill). It was… very cold. Lesson one was that jeans cannot, in fact, be worn in any weather. By the time I got home my legs were very, very cold. Lesson two is that a scarf is a worthwhile thing to have (at least, I’m guessing it would be), because by the time I got home sans scarf I could no longer feel my nose. Lesson three is that you look funny the day after exposing yourself to cold. My skin is all red and is flaking.

On the upside, my jacket held up well, and I learned that socks can actually work better than gloves because they keep all your fingers together (thanks Josh!).

Settling down

January 13, 2004

Still trying to develop a rhythm for what I’ll be doing. It’s weird: while I know how to change whatever in GNOME (who to talk to, who to avoid, etc.), when it comes to messing with things outside of GNOME within Red Hat I really have no clue where to go. So I’m slowly figuring that sort of stuff out.

In other news, been looking for an apartment. I made the mistake of agreeing to stay a month at the Bed and Breakfast. The downside is that they really don’t have enough parking, and I’m double-parked in their driveway, which means I have to get up early to make sure that my car doesn’t hem somebody else’s in. Oh well.

Currently I’m planning to live somewhere “close to Red Hat”. I have looked at a number of apartments… currently leaning toward living in Nashua, NH, though that is notably farther from Boston than, say, the Kennsington apartments (which are absolutely stellar… except they’re $300/mo more than I want to pay).

The Wandering Nomad

January 5, 2004

Yes, my weblog has grown silent. Yes, important messages clamour for my attention amidst the congestion that is my inbox. What has been going on you ask?

Last week I spent on the road driving ~12 hours a day. My brother foolishly agreed to go along and had to put up with a week of me whinging about his speeding and swerving (sorry about that, Drew!). I think I would have gone nuts without the company. Along the way I stayed with a good HS friend (Kenny Martens) whom I hadn’t seen in 4 years, a close Stanford friend whom I’d never gotten a chance to say goodbye to (Jamie Fitz), and another good Stanford friend who graduated early and went east (Brian Shieh).

We spent two nights sleeping in rest areas… in AZ the temperature dropped to -15C. Drew was wise enough to sleep in the car… I, on the other hand, was huddled on the cement next to a picnic table in my sleeping bag (can’t stand sleeping in cars). Arrived at my destination on Friday. In a fit of compunction I flew my brother home instead of selling him to a passing ship as I had originally intended.

Where was I driving to, you ask? San Francisco to Boston by way of Dallas. “Boston? Boston?!?” you exclaim, “Shan’t you perish in a blizzard of ice and bad driving?” Yes! But sacrifices must be made. “But why Boston???”

Funny you should ask. That brings me to the next tidbit of news. As of today I’m now working at Red Hat in Westford, MA (“Boston”, MA for some definitions of Boston) as an interaction designer.

The original plan was to arrive in Boston on the 1st, stay with my friend Maisy before she disappeared back to Stanford, and find a place to live by the 5th, when RH threw me on a plane to Raleigh for employee orientation (which is overall inconvenient and disorienting, but c’est la vie). This plan failed. It’s been considerably harder to find a place to live in the Boston area than it was in either Minnesota or the Bay Area. Most people apparently use a realtor who charges the first month’s rent in fees (!!!). I’m way too Scotch to throw my money at somebody whom I consider to be the ultimate middleman. Anyway, the current plan is to stay at a bed and breakfast when I get back from Raleigh while I figure out where I want to live. It’ll also be good to get a taste of the ~40 minute commute from the Cambridge area to Westford before I sign a 12 mo. lease locking me into either the boonies or a ridiculously long commute. 😉

Design Principles

December 29, 2003

Best design principle I’ve heard in a while: “What can we fit on the form?”