How to Select a UX Metric


Earlier, we mapped out 70+ UX metrics. They are a mix of study-level and task-based metrics.

But with so many metrics, what’s the best way to select the right one?

Before we dig into strategies for selecting the right UX metrics, it’s helpful to recall what the purpose of UX measurement (and UX metrics) is and is not.

  1. Metrics are not error-free. Even temperature, weight, and distance are subject to the flaws and imperfections in the measurement device and how it obtains its measure.
  2. Metrics are not crystal balls. Don’t expect a perfect correlation between current metrics and future outcomes.
  3. Metrics don’t (necessarily) tell you what to do. Low task-completion rates will alert you to a problem, but they won’t necessarily tell you the cause and cure.

Instead, you should look to UX metrics to:

  1. Quantify the impact a design has on an experience. Are design changes actually making an impact? How do you know? What measurable evidence is there? There’s a strong need to show how design efforts improved the experience.
  2. See whether designs improved. If a measure is sensitive to design changes, then you can determine whether the experience is getting better or worse.
  3. Compare your experience objectively to competitors. Sometimes “better” is relative. While your design may have improved, if it’s still the worst among your competitors, then it’s hardly time to stop improving. See the industry reports on our website.
  4. Compare your experience to industry standards. While you can’t always test your competition, many metrics have known benchmarks that will tell you what’s average, below average, or best in class. For example, based on a sample of ~500 products, a System Usability Scale (SUS) score of 68 is average, and getting in the top 10% takes a score of 80 (see the scoring sketch after this list).
  5. Get some idea about what might happen. Testing design prototypes before going live is often a quick way to see whether things will improve. For example, do changes in Terms and Conditions or the introduction of a new plan result in more cancellations or signups? Prototypes aren’t a substitute for the live product experience, but assuming your testing is realistic, they provide a good idea of what’s to come.
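
To make the SUS benchmark concrete, here is a minimal sketch of the standard SUS scoring arithmetic (ten 1-5 items, alternating positive and negative wording) compared against the averages cited above. The sample responses are hypothetical.

```python
def sus_score(responses):
    """Convert ten 1-5 SUS item responses to a 0-100 score."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    # Odd-numbered items (1, 3, 5, 7, 9) are positively worded: response - 1.
    # Even-numbered items (2, 4, 6, 8, 10) are negatively worded: 5 - response.
    raw = sum((r - 1) if i % 2 == 0 else (5 - r) for i, r in enumerate(responses))
    return raw * 2.5  # scale the 0-40 raw sum to the familiar 0-100 range

score = sus_score([4, 2, 5, 1, 4, 2, 4, 1, 5, 2])  # hypothetical responses
print(f"SUS = {score} (average = 68, top 10% starts near 80)")
```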

It’s not about choosing between a measure and an observation; it’s about choosing the right measures for your observations.

With that in mind, what’s the right metric? Here are five steps to help you decide.

1. Define the Need:
What are you trying to measure?

When it comes to measuring, our recommendation is to be specific and supplementary.

Be Specific

Are you trying to measure an attitude (what people think or feel) or an action (what people do)?

Attitudes are notoriously fuzzy. Narrow those down as much as you can. Need to measure a sentiment? Whose sentiment, and toward what?

Want to measure quality? Quality of what? Stakeholders less familiar with UX will often use terms like user-friendliness. That should be decomposed into more specific metrics, such as perceptions of ease and usefulness or task-level metrics (task-completion times and successful task-completion rates).

Be Supplementary

Because user experience is affected by emotions (attitudes) and behaviors (actions), it is important to measure both. This usually includes measuring at least one attitude (e.g., perception of ease) and at least one action (e.g., completion rate). It may also mean using a mix of post-task measures (e.g., task time) and post-study measures (e.g., the UX-Lite® or SUS). When it comes to measuring, don’t be stingy—be supplementary.
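
To make the attitude-plus-action pairing concrete, here is a minimal sketch that summarizes one attitude (a 7-point ease rating) and two actions (completion and task time) for a single task. The data and field names are hypothetical; because task times are typically positively skewed, a geometric mean is a common summary.

```python
from statistics import mean, geometric_mean

# One record per participant attempt: ease rating (1-7),
# task success (True/False), and task time in seconds.
attempts = [
    {"ease": 6, "success": True,  "seconds": 42.0},
    {"ease": 5, "success": True,  "seconds": 58.5},
    {"ease": 2, "success": False, "seconds": 120.0},
    {"ease": 6, "success": True,  "seconds": 37.2},
]

completion_rate = mean(a["success"] for a in attempts)  # action
mean_ease = mean(a["ease"] for a in attempts)           # attitude
# Summarize times for successful attempts with the geometric mean.
typical_time = geometric_mean(a["seconds"] for a in attempts if a["success"])

print(f"Completion: {completion_rate:.0%}, ease: {mean_ease:.1f}/7, "
      f"typical time: {typical_time:.0f}s")
```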

2. Find a Match:
Is there a metric that maps well to what you intend to measure?

Try to adopt or adapt an existing metric instead of creating a new one. If you’re looking to measure how useful people find a product experience, then the UX-Lite’s usefulness item and the TAM’s six-item usefulness score have both been validated. If you’re looking to measure intent/likelihood to recommend, it makes sense to use the standard NPS Likelihood to Recommend item. If you can’t find a matching metric, you’ll have to create and validate your own, which is not a trivial task.
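
One benefit of adopting a standard item is that the scoring is already defined for you. For instance, here is a minimal sketch of the standard NPS computation (promoters rate 9-10, detractors 0-6 on the 0-10 Likelihood-to-Recommend item) on hypothetical ratings.

```python
def nps(ratings):
    """NPS = % promoters (9-10) minus % detractors (0-6), on a -100 to 100 scale."""
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / len(ratings)

print(nps([10, 9, 8, 7, 6, 10, 3, 9]))  # 4 promoters, 2 detractors of 8 -> 25.0
```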

3. Prefer Popular Metrics:
Are other UX professionals and stakeholders familiar with the metric?

Familiarity helps with stakeholder buy-in. If multiple metrics meet your needs, when all else is the same, it’s best to use well-known metrics to enhance ease of communication and interpretation. For example, most UX professionals are familiar with the SUS and the measurement of task times and successful task completions.

4. Prefer Easier Metrics:
How hard is it to use the metric?

Once you’ve identified metrics that match what you intend to measure, look to see how easy or difficult data collection is. Does it require special software, a license, or hardware?

For example, seeing the percentage of people who notice a design element (that is, fixate on it) requires eye-tracking hardware and software plus in-person data collection. Not easy, but for the right research question, it’s indispensable.

5. Prefer Benchmarked Metrics:
Can you interpret the metric?

Look for metrics with known benchmarks or a normed database. Why? Because benchmarks answer the questions stakeholders inevitably ask:

    1. Compared to what?
    2. What’s good? What’s bad?

When you have benchmarks to interpret the values of metrics as poor, average, or good, you can conduct single-product studies and show stakeholders where the product is on the benchmark. When comparing multiple products, you can use standard statistics to show whether there are statistically significant differences in metrics; with benchmarked metrics, you can also show whether the statistically significant differences have practically significant consequences.
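
As a sketch of what this interpretation can look like in practice, the snippet below compares an observed completion rate to a benchmark using an adjusted-Wald confidence interval, a common choice for the small samples typical of usability tests. The 78% benchmark value is hypothetical.

```python
import math

def adjusted_wald_ci(successes, n, z=1.96):
    """95% adjusted-Wald CI for a proportion (add z^2/2 successes and z^2 trials)."""
    p_adj = (successes + z**2 / 2) / (n + z**2)
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / (n + z**2))
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

low, high = adjusted_wald_ci(successes=9, n=12)
benchmark = 0.78  # hypothetical industry benchmark
verdict = "inside" if low <= benchmark <= high else "outside"
print(f"Completion 9/12: 95% CI {low:.0%}-{high:.0%}; "
      f"benchmark {benchmark:.0%} is {verdict} the interval")
```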

Figure 1 shows the infographic we previously developed to categorize a large number of UX metrics.

[Infographic: table overview of 70+ UX metrics]

Figure 1: Overview of 70+ UX metrics. Note: This taxonomy is a living document. Like anything else, the popularity of metrics can rise and fall, new methods may make formerly difficult metrics easier to collect, and metrics that do not have benchmarks this year might have some next year. We plan to update this infographic over time.

The design of the infographic supports the five steps to metric selection:

  1. Define the need. The infographic can’t contain every possible UX metric, but it lists many of them and, at a high level, provides a comprehensive map of research needs. To make sure you’re not missing an important measurement need, browse the higher-level categories.
  2. Find a match. Work down through the higher-level categories that match your needs to the specific metrics. If there’s only one that applies to your research, your search is done. If there are several candidates, continue to the next three steps.
  3. Prefer popular metrics. You can identify the metrics we’ve classified as highly popular by looking for the green triangles. Yellow indicates a medium level of popularity, and red a low level. We assigned these ratings based on our literature reviews and professional experience.
  4. Prefer easier metrics. For the easiest metrics to collect, look for the green circles (yellow indicates medium ease; red indicates a metric that is difficult to collect). Our classification criteria for this dimension ranged from simple questionnaires with a small number of items (up to 20), which are green, to metrics that require specialized equipment and training (e.g., eye tracking), which are red.
  5. Prefer benchmarked metrics. The metrics with the best benchmarks are designated with green squares. By “best,” we mean benchmarks that have been validated, published, and made available without a license. A metric whose benchmarks require a license for access gets a yellow box. If sources for independently interpreting a metric are limited or of questionable quality, the box is red. A gray box indicates no known benchmarks.

Part of the challenge of selecting metrics is balancing the importance in your research context of the final three steps—popularity, ease, and benchmarks.

For example, consider the pros and cons of using the SUPR-Q® to measure website quality. It’s well-known and easy to administer, so green and green for those criteria. Although using its eight items in a study is free with attribution, its percentile scoring requires access to a proprietary database (the license fees pay for its continuing data collection and updating), so the benchmark criterion is yellow. An alternative to consider is the WAMMI, which also has proprietary benchmarks but is longer than the SUPR-Q (20 items in its short form) and isn’t quite as well-known. Selecting the SUPR-Q is an easy choice for us because MeasuringU created and maintains its benchmarks.

As an example of a more specialized metric, suppose you needed a questionnaire that assesses the quality of synthetic speech. The Mean Opinion Scale (MOS) is well-known and easy to administer but lacks well-documented benchmarks. The Mean Opinion Scale Expanded (especially the MOS-X2) has well-documented benchmarks and is easy to administer but is not as well-known as the original MOS. Given its benchmarks, we prefer the MOS-X2.

Moving on from study-level metrics (which are all questionnaires), which is the best choice for a measure of effectiveness in task-based studies? The alternatives are completion, findability (a specific type of completion rate), and errors. Of course, you could always collect more than one measurement of effectiveness, but considering popularity, ease, and benchmarks, our first choice is the task-completion rate.

What if you need to measure how long it takes someone to find a target link on a website? You’ll need to choose one of the eye-tracking metrics. For visual attention, the candidates are dwell time, fixation count, and time to first fixation. Of these, the only one that matches the research need is time to first fixation.
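
As a rough illustration, here is a minimal sketch of how time to first fixation might be derived from exported fixation data: the onset of the first fixation whose coordinates fall inside the target’s area of interest (AOI). The record format and coordinates are hypothetical; real exports vary by eye tracker and analysis software.

```python
def time_to_first_fixation(fixations, aoi):
    """Return onset (ms) of the first fixation inside the AOI, or None.

    fixations: list of (onset_ms, x, y) tuples, ordered by time.
    aoi: (left, top, right, bottom) in the same pixel coordinates.
    """
    left, top, right, bottom = aoi
    for onset_ms, x, y in fixations:
        if left <= x <= right and top <= y <= bottom:
            return onset_ms
    return None  # participant never fixated the target

fixations = [(120, 400, 90), (410, 610, 300), (780, 640, 315)]  # hypothetical
target_link_aoi = (600, 290, 700, 330)  # bounding box of the target link
print(time_to_first_fixation(fixations, target_link_aoi))  # 410
```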

As shown in the infographic, all the behavioral and physiological metrics float in a sea of red—metrics not well known in UX research, all difficult to collect, and with no well-established benchmarks for their interpretation. We only use these when absolutely necessary to meet a client’s research objectives, and when we can, we supplement them with high-quality task-based and attitudinal metrics.

Table 1 illustrates the path to the metric decisions described in these examples.

| What to Measure | Candidate Metrics | Selected Metric | Selection Criteria |
| --- | --- | --- | --- |
| Website quality | SUPR-Q, WAMMI | SUPR-Q | Access to benchmarks, easier (fewer items), slightly better known |
| Speech quality | MOS, MOS-X2 | MOS-X2 | Availability of benchmarks, easier (fewer items) |
| Effectiveness | Completion, Findability, Errors | Completion | Best known, better benchmarks than other candidates, easier to collect/analyze than errors |
| Time to find target | Dwell time, Fixation count, Time to first fixation | Time to first fixation | Only eye-tracking metric that matches the research need |

Table 1: Examples of paths to metric selection decisions.

Metrics aren’t perfect, but despite their issues, they are critical in quantitative UX practice. In this article, we showed how to use an infographic to guide UX practitioners through five steps for picking the right UX metric: define the need, find a match, prefer popular metrics, prefer easier metrics, and prefer benchmarked metrics. The last three steps apply when multiple metrics match a defined research need. They don’t mechanically determine which metric to use; instead, they guide practitioners toward an appropriate metric for their specific research context.
