Usability testing of mobile software

[Image: Jubilee clip]

Thanks to Antoine RJ Wright for choosing this as the post of the week in the 260th Carnival of the Mobilists. Head there for some of the best in mobile writing on the web.


The conference season is upon us

We are going to South by Southwest and UX Sofia this year, and we have started to prepare our sessions:

  • We'll be doing (again) the old DIY Mobile Usability Testing in Austin. We promise it will be the last time, although... didn't we promise that before?
  • We'll be turning the session into a workshop for UX Sofia (very exciting and very challenging)
  • And there will be a new presentation in Sofia as well, about design and mobile fragmentation

I will be posting some of the materials that come out of the preparation process. Here comes the first one: some thoughts on applying usability testing to mobile software.

Usability testing of mobile software

Applying usability testing to the study of mobile applications and websites brings considerable challenges. Which phone should we use for testing? Can we use an emulator? How do we prototype for mobile? Can we just recycle the tasks we use for testing desktop software? Do we test in the lab or in the field? How do we record the mobile phone screen, user’s input and facial expressions?

In spite of all these tribulations, test we must. When we are designing desktop software, we can afford to skip usability testing sometimes. We have done it before, we have a good idea of what works and what doesn’t, and we have a set of well established design patterns that have been proven to work across all platforms to assist our design effort.

Mobile could not be more different. Mobile is new, mobile changes fast, mobile is very fragmented, mobile involves a complex and still badly understood context of use, and mobile does not benefit from well established design patterns that work across all platforms and form factors.

Testing with users and the resulting design iteration is the only way to develop useful, usable and beautiful mobile software.

What is usability testing

A process that employs people as testing participants who are representative of the target audience to evaluate the degree to which a product meets specific usability criteria

This is the definition offered by J. Rubin and D. Chisnell. We all know pretty much how it goes: you get a few people who fit the profile of your target audience, you ask them to do some tasks with your software (or with a simulation of that software, i.e. a prototype), and then you watch as they attempt to complete the tasks.

We record usability testing sessions. Recordings become a very useful memory aid that complements our notes when required. However, the most important reason why we record tests is related to video as a communication tool. Even the teams most reluctant to believe that anybody could have difficulties using their beloved software will cave when watching real people having real trouble. Video constitutes unequivocal evidence of the existence of usability issues.

Useful as this is, I believe the real power of video comes from its ability to generate empathy. For design and development teams, the end user is an abstract entity. By watching usability testing videos, that abstract entity gains a face: it becomes real people, trying to do real things with software, and having real problems when doing so.

The team making the software will remember these people for months. They become a point of reference when making design and development decisions. Team members start saying things like: “Do you see that button, remember that girl that couldn’t find the button, do you think she would find it now? Is it big enough for her? Would it look clickable to her? Is it in the place where she would expect it to be?”

By this “concretisation”, users are brought into the centre of the software-making process. This is something we normally attempt to achieve using personas, although I must confess that where my personas have failed miserably, I have succeeded using videos from usability testing sessions.

What do we record?

When recording usability testing sessions, we aim to capture actions and reactions:

  • The user’s actions, mainly in the form of mouse clicks
  • The reactions of the software or prototype, which happen on the screen
  • The user’s reactions. We humans react with our face, and that’s why we record facial expressions

Desktop usability testing vs. mobile usability testing

At first sight, doing usability testing of mobile software is pretty much the same as doing usability testing of desktop software. Only that it really isn’t. As always when you throw mobile into the equation, you are adding a few extra challenges.

Simplifying the whole thing a little bit, those challenges can be reduced to 3 big questions you must answer while planning usability testing with mobile devices:

  1. Which phone will you use?
  2. Which context will you choose?
  3. And which connection?

It’s important to understand that there are no right or wrong answers to these 3 questions. Your response will depend on the nature of the software you are testing, and on what you are trying to achieve with your test.

1. Which phone will you use?

In his Alertbox column of 20 July 2009, Jakob Nielsen shares the results of a usability testing experiment run with 48 people who were asked to complete some tasks using several websites on their mobile phones. Nielsen put the average success rate at 59%. When that average figure was broken down by the type of phone participants used during the test, the results were as follows:

  • Participants using feature phones (something like a Nokia 6300, with a smallish colour screen and a numeric keypad where you cannot install native applications) were able to complete 38% of the tasks.
  • Participants using smartphones (something like an oldish Blackberry Bold, with a bigger colour screen and a qwerty keyboard where you can install native applications) were able to complete 55% of the tasks.
  • Participants using what Nielsen calls “touch-screen” phones (something like an iPhone, with an even bigger screen than a smartphone incorporating touch technology) were able to complete 75% of the tasks.

The moral to this story is that handset usability affects test results. A wonderfully designed website will feel difficult and cumbersome when used with a phone plagued by usability issues. Not that feature phones are badly designed (some are, some aren’t), but they are probably not optimised for web browsing or application usage. Similarly, not all touch-screen phones are built equal, and some of them will perform better than others.
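If you want the same kind of breakdown for your own sessions, the arithmetic is simple. The following is a minimal Python sketch, assuming you log one row per attempted task; the `RESULTS` rows and the function names are invented for illustration and are not Nielsen's data or method.

```python
from collections import defaultdict

# Hypothetical session log: (participant, handset type, task completed?)
# These rows are invented for illustration; they are not Nielsen's data.
RESULTS = [
    ("p1", "feature phone", True),
    ("p1", "feature phone", False),
    ("p2", "smartphone", True),
    ("p2", "smartphone", False),
    ("p3", "touch-screen", True),
    ("p3", "touch-screen", True),
    ("p4", "touch-screen", True),
    ("p4", "touch-screen", False),
]


def success_rates(results):
    """Return {handset type: fraction of attempted tasks completed}."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for _participant, handset, completed in results:
        attempts[handset] += 1
        if completed:
            successes[handset] += 1
    return {handset: successes[handset] / attempts[handset] for handset in attempts}


def overall_rate(results):
    """Return the fraction of all attempted tasks that were completed."""
    if not results:
        return 0.0
    return sum(1 for _, _, completed in results if completed) / len(results)


if __name__ == "__main__":
    for handset, rate in success_rates(RESULTS).items():
        print(f"{handset}: {rate:.0%}")
    print(f"overall: {overall_rate(RESULTS):.0%}")
```

Reporting the per-handset figures next to the overall one makes it easier to see when a poor result comes from the phone and not from the software under test.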

In any case, we must find a way to minimise the effect of handset (and browser) usability on test results, and here is how:

  • Whenever possible, test with participants’ own phones. They might be terrible as handsets go, but participants are probably accustomed to their flaws and have developed tricks and workarounds.
  • If this is not possible, and sometimes it won’t be, schedule training time in your test plan to explain to participants how to use the phone, and include some warm-up tasks participants can attempt to become familiar with the handset.

2. Which context will you choose?

The question of where to run the tests, in a usability laboratory or in the field, is important in the mobile world. The busy, noisy, distracting mobile context of use couldn’t be more different from the calm, quiet and focused laboratory environment.

Comparative studies about the importance of field testing in mobile software have reached contradictory conclusions. A. Kaikkonen et al. (pdf) tell us:

There was no difference in the number of problems that occurred in the two test settings. Our hypothesis that more problems would be found in the field was not supported.

C.M. Nielsen et al., however, found that

evaluations conducted in field settings can reveal problems not otherwise identified in laboratory evaluations

What both papers agree on is the fact that testing in the field is a messy affair. A. Kaikkonen et al. observe it takes

double the time in comparison to the laboratory

C.M. Nielsen et al. describe it as complex and time-consuming.

The truth is that most industry projects lack the time, budget, personnel and expertise to run tests in the field. For most of us, field testing is not even an option, and we must find consolation in the fact that testing in the lab is better than no testing at all.

If field testing must be done (and for certain types of software, it must: think of data collection applications, geo-location related software or mobile payments), it should be done late in the design cycle as a validation mechanism. It should be carefully planned and rehearsed with pilot runs, and the team should be ready to handle unexpected problems on test day.

3. And which connection?

With extended 3GSM availability in Europe (Ofcom’s 2011 Communications Market Report - pdf - indicates that 95% of the United Kingdom population now lives within 3G coverage), and some operators already testing and rolling out 4th generation networks, it is easy to forget that fast wireless data connections are not accessible to significant portions of the rural population. Only 54% of the people living in Northern Ireland enjoy 3G coverage, according to the same Ofcom report. The conclusion is that we still must test our mobile software at 2G connection speeds as part of our quality assurance process.

When it comes to usability testing, we must remember that the bulk of our software usage will take place over the mobile phone network. In most cases, using wi-fi during usability tests will give participants an unrealistic picture of speed, performance and software responsiveness. Remember: test over the available mobile phone network (unless of course your software is like the BBC iPlayer and only works over wi-fi).
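To get a feel for how much wi-fi can flatter your software, here is a rough Python sketch that estimates transfer time for a page of a given weight. The speeds in `ASSUMED_KBPS` are nominal figures I have assumed for illustration, not measurements, and the calculation ignores latency, so real-world times will be worse.

```python
# Nominal downlink speeds in kilobits per second. These are rough,
# assumed figures for illustration only; real throughput varies widely.
ASSUMED_KBPS = {
    "2G (EDGE)": 200,
    "3G": 2000,
    "wi-fi": 20000,
}


def transfer_seconds(page_kilobytes, kbps):
    """Seconds needed to move page_kilobytes over a kbps link, ignoring latency."""
    if kbps <= 0:
        raise ValueError("kbps must be positive")
    return page_kilobytes * 8 / kbps  # 8 bits per byte


if __name__ == "__main__":
    page_kilobytes = 500  # hypothetical page weight
    for network, kbps in ASSUMED_KBPS.items():
        seconds = transfer_seconds(page_kilobytes, kbps)
        print(f"{network}: {seconds:.1f} s")
```

Under these assumptions, the same page that feels instant over wi-fi takes a hundred times longer over 2G, which is the experience many of your users will actually have.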

And don’t forget the small detail of covering the data costs your participants might incur as a result of the test.

The final challenge: recording your tests

We have found 4 main approaches to recording usability tests involving mobile devices:

  1. Wearable equipment
  2. Screen capture applications
  3. Document cameras
  4. Mounted devices

1. Wearable equipment

Wearable equipment involves

  • cameras recording the mobile phone screen and participant's face
  • microphones and headsets to provide instructions or record comments
  • and, unfortunately, belts and backpacks to carry batteries, since the equipment has to be powered somehow.

Wearable equipment allows you to test in the field, but:

  • it’s not easy to set up
  • it’s intrusive, uncomfortable and a bit heavy

For more details on wearable equipment check this paper from the Helsinki Institute for Information Technology.

2. Screen capture applications

Screen capture applications involve installing software on a computer and on the phone. A connection can then be established between them so that it’s possible to follow on the computer screen what’s happening on the phone screen.

Screen capture applications provide high quality screen recordings, but:

  • No application will work with all mobile platforms.
    • Mobiola is available for some versions of Symbian S60 and Blackberry 4.2 or later
    • Display Recorder works on most iOS devices (not on iPhone 3G though), but you must jailbreak them first
    • Ovo Studios offers screen recording software for iOS devices without jailbreaking them
    As far as I know, no screen recording applications exist for Android yet.
  • Participants will not appreciate you installing “stuff” on their phones. You might be constrained to run tests on lab phones, instead of using your participants’ own devices.
  • As observed in this paper, screen capture applications will not show you the user interaction with the physical device (i.e. the fingers). This means you might miss some errors, like false negatives (undetected touches in touch screen devices).

3. Document cameras

Document cameras stand on a desk and are normally employed to record documents (hence their name).

This approach seems to be popular and has been documented by several people, including Google and Scott Weiss in his book Handheld Usability. However:

  • Document cameras are not cheap. An Elmo TT-02RX camera at B&H, a professional audiovisual equipment store in the USA, is priced at about USD 549. Shipping to the UK would bring the total price to USD 842.65 (prices from www.bhphotovideo.com accessed in January 2010). For a more affordable alternative, check Ipevo's cameras.
  • Participants must keep within the camera range, normally marked with a square of tape on the desk where the camera stands. This adds cognitive load to the tasks, and diverts participants’ attention from the object of the test.
  • The phone must lie on the desk or be held at a flat angle, far from our natural posture when using mobile phones, which we hold one- or two-handed at a 45-degree angle.

4. Mounted devices

Mounted cameras on mobile phones come in 2 flavours: you can buy them ready made, but you can also build them yourself.

Mounted devices allow natural interaction with the phone, but:

  • They are not cheap to buy. For example, in his blog 90 percent of everything Harry Brignull mentions a quote for £750 from a company in the UK.
  • They are messy to build. For example, in the Little Springs Design blog, Barbara Ballard explains how they must swap out lenses and use a quad-processor that “companies are no longer interested in making”.
  • If bulky, they can prevent single-handed use of the phone.
  • If heavy, they can become tiring for participants.

The perfect recording set up

It seemed to us none of the approaches above ticked all the boxes. In our opinion, these are the characteristics of the perfect recording set up, and how the above approaches comply with them:

[Image: Recording methods matrix]

Of the 4 approaches, mounted devices, if well implemented, seemed the most promising. Unfortunately, there was nothing we could do about the price of the ready made solutions, so we got to work on a DIY mounted device that was easy to put together and repeatable (i.e. can be easily rebuilt if damaged or lost).