
You’re 25% complete. Still a ways to go.
You got a score of 90 out of 100 on a math test. Not bad.
You got 1475 on the SAT—the 95th percentile. Awesome!
Only 40% of users completed the task. Not great.
The average score on a seven-point scale was 5.2. Hmm. Is that good?
One of the more challenging parts about using metrics is making them meaningful. This is especially the case for scores unfamiliar to your stakeholders.
When presented with new metrics, common questions are:
- Is higher better?
- What’s the highest possible score? The lowest?
- What’s good, bad, or average?
- Can you have negative scores?
Absolute percentages are used commonly across many domains in business and science. They have a familiarity and intuitiveness because seeing a percentage provides answers to many of the above questions. Higher is better; the lowest score is 0% and the highest is 100%; you can’t have negative scores.
But many useful metrics don’t lend themselves well to percentages. For example, rating scales are one of the most popular (and effective) ways to gauge someone’s sentiment about an interface or experience.
Multipoint scales, though, are rarely presented as 0 to 100-point scales (which we refer to as 100-point scales for the rest of this article), with most standardized UX questionnaires having five or seven points. For example, the SEQ® has one seven-point scale, the SUS uses ten five-point scales, and the SUPR-Q® uses seven five-point scales plus one eleven-point likelihood-to-recommend scale. To make raw rating scale scores more interpretable, they can be transformed into 100-point scales or compared to a reference database to get percentile scores.
Absolute percentages, percentiles, and linear transformations all convert raw data to scores that range from 0 to 100. Sounds good, but while they have the same scale range, their interpretations are slightly different, which can lead to confusion. Here are the differences between absolute percentages, percentiles, and rating scale transformations to 100-point scales.
As we discussed in a previous article, it’s common in UX research to encounter three types of percentage: absolute, relative, and net.
Absolute Percentages
Of the different types, absolute percentages are the most common and the easiest to understand. An absolute percentage is the number of critical events divided by the number of opportunities for the events to occur, multiplied by 100. For example, if five participants in a usability study attempt a task and four are successful, the ratio is 4/5, which equals .8, which, multiplied by 100, is 80%.
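As a minimal sketch of this calculation (the function name and the input checks are our own illustration, not part of any standard tool), the completion-rate example could be computed like this:

```python
def absolute_percentage(events: int, opportunities: int) -> float:
    """Return events / opportunities expressed as a percentage (0 to 100)."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= events <= opportunities:
        raise ValueError("events must be between 0 and opportunities")
    return 100 * events / opportunities


# Four of five participants completed the task.
print(absolute_percentage(4, 5))  # 80.0
```

Because the number of events can never exceed the number of opportunities, the result is always between 0 and 100.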
When a critical event is a good thing (e.g., measuring success rates rather than failure rates), a larger percentage is better than a smaller percentage. Interpreting absolute percentages as poor or good, however, requires knowledge of the context in which they were collected. For example, an 80% successful completion rate is better than a 50% successful completion rate, but it isn’t good enough if the successful completion rate for your competitors is 95%.
Only absolute percentages are relevant to this article because, unlike the other types of percentage, absolute percentages range only from 0 to 100%. To illustrate their differences, we briefly describe relative and net percentages below. (For more details about these, see the previous article.)
Relative Percentages
Relative percentages are based on the ratio between two numbers, one of which is a reference point (e.g., if the initial score on a seven-point task ease item was 5.0 and, after a redesign, the score rose to 5.5, the relative percentage of improvement is (5.5 − 5.0)/5.0 = .5/5.0 = .1 = 10%). Relative percentages can be positive or negative, and because there are no limits on the difference in the sizes of the reference and new values, the possible magnitudes of relative percentages are essentially unlimited.
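A short sketch of the same arithmetic, assuming a hypothetical helper named `relative_percentage` that takes the reference value first:

```python
def relative_percentage(reference: float, new: float) -> float:
    """Return the change from reference to new as a percentage of reference."""
    if reference == 0:
        raise ValueError("reference must be nonzero")
    return 100 * (new - reference) / reference


# Task ease score rose from 5.0 to 5.5 after the redesign.
print(relative_percentage(5.0, 5.5))  # 10.0
# A drop yields a negative value, and large changes can exceed 100.
print(relative_percentage(5.0, 4.0))  # -20.0
```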
Net Percentages
Net percentages are the difference between two absolute percentages (e.g., the percentage of respondents selecting Strongly Agree on a rating scale minus the percentage selecting Strongly Disagree). Net percentages can range from −100% to +100%. (Yes, the NPS can be expressed as a percentage, just like any other net percentage.)
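To illustrate, here is a small sketch that computes a net percentage from response counts. The function name and the counts in the example are made up for illustration only.

```python
def net_percentage(positive: int, negative: int, total: int) -> float:
    """Return the percentage of positive responses minus the percentage of negative ones."""
    if total <= 0:
        raise ValueError("total must be positive")
    if positive < 0 or negative < 0 or positive + negative > total:
        raise ValueError("counts must be non-negative and sum to at most total")
    return 100 * (positive - negative) / total


# Hypothetical data: of 100 respondents, 40 chose Strongly Agree
# and 15 chose Strongly Disagree.
print(net_percentage(40, 15, 100))  # 25.0
```

The extremes of −100 and +100 occur only when every respondent falls in the negative or the positive group, respectively.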
Percentiles
Percentiles indicate a score’s relative standing compared to a set of reference data. For example, a 50th percentile score on the SUPR-Q means the score is average (half of the scores in the reference data are better and half are worse). The 80th percentile means the score is higher than 80% of the scores in the reference database. So, percentiles can provide context for interpreting any type of score, including rating scales and percentages. For example, percentiles are the basis of the Sauro-Lewis curved grading scale for the SUS (Table 1).
SUS Score Range | Grade | Percentile Range |
---|---|---|
84.1–100 | A+ | 96–100 |
80.8–84.0 | A | 90–95 |
78.9–80.7 | A− | 85–89 |
77.2–78.8 | B+ | 80–84 |
74.1–77.1 | B | 70–79 |
72.6–74.0 | B− | 65–69 |
71.1–72.5 | C+ | 60–64 |
65.0–71.0 | C | 41–59 |
62.7–64.9 | C− | 35–40 |
51.7–62.6 | D | 15–34 |
0.0–51.6 | F | 0–14 |
Table 1: Curved grading scale for the SUS.
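The lookup in Table 1 can be sketched in code. The lower bounds, grades, and percentile ranges below are copied from the table; the function name is hypothetical, and treating each lower bound as a simple threshold (so a score between two listed ranges falls into the lower grade) is our assumption.

```python
# (lower bound of SUS range, grade, (low percentile, high percentile)) from Table 1
SUS_GRADES = [
    (84.1, "A+", (96, 100)),
    (80.8, "A", (90, 95)),
    (78.9, "A-", (85, 89)),
    (77.2, "B+", (80, 84)),
    (74.1, "B", (70, 79)),
    (72.6, "B-", (65, 69)),
    (71.1, "C+", (60, 64)),
    (65.0, "C", (41, 59)),
    (62.7, "C-", (35, 40)),
    (51.7, "D", (15, 34)),
    (0.0, "F", (0, 14)),
]


def sus_grade(score: float):
    """Return (grade, percentile range) for a SUS score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("SUS scores range from 0 to 100")
    for lower_bound, grade, percentile_range in SUS_GRADES:
        if score >= lower_bound:
            return grade, percentile_range


print(sus_grade(68))  # ('C', (41, 59))
```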
It isn’t always clear how values are converted to percentiles because the reference dataset may not be available (e.g., the SUS is based on 446 industrial usability studies, many of which are anonymized) or may be proprietary (e.g., SUPR-Q).
Even though percentiles are essentially a type of percentage (the location of a score determined by its position against a set of reference data), referring to the position as a “percentile” or “%ile” is customary to distinguish it from the other types.
Before using percentiles to interpret scores, make sure the scores you want to interpret are consistent with the reference data. For example, we have three sets of reference data for assigning percentiles to UX-Lite® scores: business software, consumer software, and consumer websites. Even with three sets of reference data, some types of specialized software are not consistent with any of them (e.g., integrated development environments).
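When a suitable reference dataset is available, one simple way to assign a percentile is to count the share of reference scores that fall below the score of interest. The sketch below assumes that definition (others exist, such as counting ties as half or using a fitted distribution), and the reference values are invented purely for illustration.

```python
def percentile_rank(score: float, reference: list) -> float:
    """Return the percentage of reference scores strictly below the given score."""
    if not reference:
        raise ValueError("reference data must not be empty")
    below = sum(1 for value in reference if value < score)
    return 100 * below / len(reference)


# Invented reference scores, not real benchmark data.
reference_scores = [50, 55, 60, 62, 65, 68, 70, 72, 75, 80]
print(percentile_rank(71, reference_scores))  # 70.0
```

The result is only as meaningful as the match between the reference data and the product being evaluated.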
100-Point Scales
100-point scores can be either raw scores from nearly continuous scales (e.g., sliders or click scales) or linear transformations of scores from questionnaires that use multipoint rating scales (e.g., five- or seven-point scales). The SUS, UMUX, and UX-Lite are all examples of multipoint rating scales transformed to 100-point scores.
Why choose 0 to 100? In John Brooke’s 2013 retrospective article describing his development of the SUS, he wrote, “Project managers, product managers, and engineers were more likely to understand a scale that went from 0 to 100 than one that went from 10 to 50, and the important thing was to be able to grab their attention in the short space of time they were likely to spend thinking about usability, without having to go into a detailed explanation.” This is consistent with our experience as UX practitioners.
For example, the UX-Lite (Figure 1) is presented to participants using two items.
Figure 1: The UX-Lite.
Once you collect data from participants, convert the raw scores. Interpolate each item from a five-point scale to a 100-point scale for easier interpretation of the Ease and Usefulness scores, then average those two scores to get the UX-Lite. (These calculations are done automatically in our UX-Lite calculator.)
Note that linear interpolations like this affect the magnitudes of mean values and their standard deviations but do not affect other important statistical properties (e.g., correlations with other metrics, beta weights in regression).
A shortcut to calculate the interpolated score for any five-point item is to subtract 1 from the rating and then multiply by 25. For example, if someone gives an ease rating of 4 and a usefulness rating of 3, then:
- Ease = (4 − 1)25 = 3(25) = 75
- Usefulness = (3 − 1)25 = 2(25) = 50
- UX-Lite = (75 + 50)/2 = 125/2 = 62.5
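The shortcut above can be sketched as a pair of small functions. The names are ours, and this is an illustration of the arithmetic rather than the code behind any calculator.

```python
def five_point_to_100(rating: float) -> float:
    """Interpolate a five-point rating (1 to 5) to a 0 to 100 score."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return (rating - 1) * 25


def ux_lite(ease: float, usefulness: float) -> float:
    """Average the interpolated Ease and Usefulness scores."""
    return (five_point_to_100(ease) + five_point_to_100(usefulness)) / 2


print(five_point_to_100(4))  # 75
print(five_point_to_100(3))  # 50
print(ux_lite(4, 3))         # 62.5
```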
To avoid unnecessary confusion, we don’t refer to these values as percentages. They could, however, be conceptualized as the percentage of the distance of a score from the starting point of a rating scale relative to the length of the scale. For example, suppose the mean of a five-point rating scale is 3.8 as shown in Figure 2.
Figure 2: Conceptualization of the location of a score on a five-point scale as a percentage.
The distance from the starting point to the score is 2.8 (3.8 − 1). The total distance of the line is 4.0 (5 − 1). The ratio of these values is 2.8/4.0, which expressed as a proportion is .7, and expressed as a percentage is 70%. Note that if we use the shortcut formula above for five-point scales we get (3.8 − 1)25 = 70.
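The same distance-over-length reasoning extends to scales with other endpoints. The following sketch uses a hypothetical function name and assumes the scale's lowest and highest response options are known.

```python
def to_100_point(score: float, scale_min: float, scale_max: float) -> float:
    """Linearly rescale a score from [scale_min, scale_max] to [0, 100]."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= score <= scale_max:
        raise ValueError("score must lie within the scale")
    distance_from_start = score - scale_min
    scale_length = scale_max - scale_min
    return 100 * distance_from_start / scale_length


# Mean of 3.8 on a five-point scale (the example above).
print(round(to_100_point(3.8, 1, 5), 1))  # 70.0
# Mean of 5.2 on a seven-point scale.
print(round(to_100_point(5.2, 1, 7), 1))  # 70.0
```

Note that two means from scales of different lengths can land on the same 100-point value, which makes them easier to compare but still does not say whether the score is good.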
We provide this demonstration to show how one might justify presenting these values as percentages, but we do not recommend doing so. It’s confusing enough to deal with the three types of percentage that all use the % sign.
When it comes to interpretation, 100-point scores are similar to absolute percentages. Higher scores are usually better than lower scores, but without more context, you can’t tell which scores are poor or good.
Below is a table comparing the different scales discussed in this article. All have their role in UX measurement, but all are interpreted a bit differently, so be sure you know if any score you’re trying to interpret is an absolute percentage, a percentile, or a rating scale score transformed to a 100-point scale.
Measurement Property | Absolute Percentage | Percentiles | 100-Point Scales |
---|---|---|---|
Symbol | % | %ile | No symbol |
Min score 0 | Yes | Yes | Yes |
Max score 100 | Yes | Yes | Yes |
Can be negative | No | No | No |
Provides context-free info about standing | No | Yes | No |
Table 2: Summary of the properties of different 100-point scales.