The mobile app industry is very lucrative, having generated $45 billion in the U.S. and $253 billion worldwide in 2023. A good user experience is critical to the success of websites and mobile apps.
We’ve shown how perceived ease of use and perceived usefulness impact people’s intention to use and recommend software products. We’ve also shown how those intentions predict actual behaviors, demonstrating a chain of influence from UX metrics to important business outcomes.
Mobile apps are a distinct type of product that warrants its own measurement, but work on standardized mobile app questionnaires has been limited.
In this article, we review how mobile UX has historically been measured and then review three mobile-specific questionnaires: the MPUQ, mod-AUG scales, and SUPR-Qm®.
Lusky and Boehm (2017) reviewed the methods used to evaluate mobile UX. They categorized methods as generic (applicable to a wide range of user experiences, not just mobile), mobile-adapted (originally generic but adapted to mobile evaluation), and mobile-specific (developed for evaluation of mobile UX).
Applying this taxonomy to standardized UX questionnaires, we find numerous examples of researchers using generic UX questionnaires in research on the UX of mobile apps. For example:
- van der Heijden and Sangstad Sørensen (2003) used a standardized consumer acceptance questionnaire, the Hedonic Utilitarian (HED/UT) scale, to measure attitudes toward mobile information services.
- Dhir and Al-kahtani (2013) used the AttrakDiff questionnaire to evaluate the UX of mobile augmented reality prototypes.
- O’Malley et al. (2014) used the Software Usability Measurement Inventory (SUMI) to study a mobile app for adolescent obesity management.
- Kortum and Sorber (2015) used the System Usability Scale (SUS) to assess mobile applications for phones and tablets.
We know of three standardized questionnaires that were developed specifically to assess the UX of mobile apps. Two used the methods of classical test theory (CTT)—the Mobile Phone Usability Questionnaire (MPUQ) and an unnamed questionnaire for the assessment of mobile app usability published by Hoehle and Venkatesh (2015) that we will refer to as the modified Apple UX Guideline (mod-AUG) scales. The third one is the SUPR-Qm, which was developed at MeasuringU using item response theory (IRT, specifically, Rasch scaling) rather than CTT.
The MPUQ
The MPUQ is a multidimensional instrument with six subscales (ease of learning and use, assistance with operation and problem solving, emotional aspect and multimedia capabilities, commands and minimal memory load, efficiency and control, and typical tasks for mobile phones) measured with 72 items. The subscales were identified using factor and item analysis (construct validity). The reliabilities of the subscales were acceptable with coefficient alpha ranging from .82 to .93 (.96 overall).
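For readers unfamiliar with coefficient alpha, the following is a minimal sketch of how it is typically computed from a respondents-by-items matrix of ratings. The function name `coefficient_alpha` and the ratings are hypothetical and are not MPUQ data.

```python
from statistics import variance


def coefficient_alpha(responses):
    """Coefficient (Cronbach's) alpha for one scale.

    responses: list of respondents, each a list of item ratings
    (every respondent must have rated the same items).
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in responses]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(r) for r in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: five respondents, three items on a 1-5 scale.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = coefficient_alpha(ratings)
print(round(alpha, 2))
```

Alpha rises when items covary strongly relative to their individual variances, which is why it is reported separately for each subscale and for the overall instrument.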
For the full MPUQ, see the original article. Some examples of the items for each subscale are:
- Ease of Learning and Use (LEU): “Is it easy to learn to operate this product?”; “Is discovering new features sufficiently easy?”
- Helpfulness and Problem-Solving Capabilities (HPSC): “Is the HELP information given by this product useful?”; “Does the HELP function define aspects of the product adequately?”
- Affective Aspect and Multimedia Properties (AAMP): “Is using this product frustrating?”; “Does carrying this product make you feel stylish?”
- Commands and Minimal Memory Load (CMML): “Is the organization of the menus sufficiently logical?”; “Are the HOME and MENU buttons sufficiently easy to locate for all operations?”
- Control and Efficiency (CE): “Are the response time and information display fast enough?”; “Is the product reliable, dependable, and trustworthy?”
- Typical Task for Mobile Phone (TTMP): “Is it easy to correct mistakes such as typos?”; “Is it easy to change the ringer signal?”
The Mod-AUG Scales
As part of an information systems investigation into the usability of mobile apps and the influence of that construct on user attitudes and intention to use, Hoehle and Venkatesh developed a questionnaire with an initial set of 120 items based on the 2012 version of Apple’s UX guidelines (the AUG scales). The final questionnaire retained 78 of those items, hypothesized to measure 19 low-level constructs that in turn measured six high-level constructs (application design, application utility, user interface graphics, user interface input, user interface output, and user interface structure). For research purposes, participants also responded to 24 additional items related to the high-level constructs and 11 items related to two outcome constructs (likelihood to use, loyalty), for a total of 113 items.
The reliability of the scales (coefficient alpha) ranged from .75 to .85. Fit statistics were acceptable for the construct validity (measurement model) of the questionnaire assessed with confirmatory factor analysis. A structural equation model that included the questionnaire and outcome constructs also had acceptable fit statistics (CFI = .96, RMSEA = .04), accounting across initial and cross-validation datasets for 41% to 47% of the variance in the intention to keep using and 16% to 19% of the variance in loyalty.
For the full questionnaire, see the original paper. Examples of the items from the six high-level constructs are:
- Application Design (DES): “Overall, I think the mobile application is designed well.”; “I am very satisfied with the overall design of the mobile application.”
- Application Utility (PURP): “To me, the mobile application is very functional.”; “In general, the mobile application is of value to me.”
- User Interface Graphics (INTG): “Overall, I think the graphics displayed on the mobile application are designed effectively.”; “Overall, the mobile application has very good user interface graphics.”
- User Interface Input (INP): “In general, the mobile application allows me to input data easily.”; “Generally speaking, it is easy to type in data into the mobile application.”
- User Interface Output (CONT): “In general, the content of the mobile application is presented effectively.”; “I am very satisfied with the way that the mobile application presents content.”
- User Interface Structure (STRU): “Overall, I think the mobile application structures information effectively.”; “Generally speaking, the mobile application is structured nicely.”
The SUPR-Qm
The steps in the development of the SUPR-Qm were initial item creation, item refinement, and identification of the final item set (documented in detail in a 2017 paper published in the Journal of User Experience).
Initial Item Creation
There were 23 positive-tone items in the initial set, covering the utility, usability, intended usage, and reasons for deleting apps, plus four free-response questions. The ratings of those items from a sample of 104 Amazon Mechanical Turk participants indicated that two were not sufficiently applicable. Analysis of responses to the free-response questions informed the generation of additional items, bringing the number of items up to 34.
Item Refinement
The second study evaluated the properties of 34 items using data from 341 Mechanical Turk respondents. Respondents were assigned to one of three groups, rating (1) the app they used the most from a list of 15, (2) an app from the list that they didn’t use much but that was still on their phone, or (3) the app they had used most recently.
Principal component analysis of the items indicated that they were sufficiently unidimensional for Rasch scaling. Seven items were removed for having excessively high infit or outfit values using the criterion of MNSQ values greater than 3. A similar analysis was used to remove ten respondents from the sample.
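To make the infit and outfit criteria concrete, here is a minimal sketch of how the two mean-square (MNSQ) statistics are computed for one item. It assumes the dichotomous Rasch model for simplicity, whereas five-point agreement items would call for a polytomous model in practice. The function names, person measures, and response patterns are hypothetical, and the cutoff passed to `misfits` is simply the criterion quoted above.

```python
import math


def rasch_probability(theta, b):
    """Dichotomous Rasch model: chance that a person at theta endorses an item at b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(observed, abilities, difficulty):
    """Return (infit, outfit) mean squares for one item.

    observed:   0/1 responses, one per respondent
    abilities:  person measures in logits
    difficulty: item measure in logits
    """
    sq_resid_sum = 0.0  # sum of squared residuals
    var_sum = 0.0       # sum of model variances
    z_sq_sum = 0.0      # sum of squared standardized residuals
    for x, theta in zip(observed, abilities):
        p = rasch_probability(theta, difficulty)
        var = p * (1 - p)
        resid = x - p
        sq_resid_sum += resid ** 2
        var_sum += var
        z_sq_sum += resid ** 2 / var
    infit = sq_resid_sum / var_sum     # information-weighted mean square
    outfit = z_sq_sum / len(observed)  # unweighted mean square
    return infit, outfit


def misfits(infit, outfit, threshold):
    """Flag an item whose infit or outfit exceeds the chosen cutoff."""
    return infit > threshold or outfit > threshold


# Hypothetical person measures and two response patterns for an item at 0 logits.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
expected_pattern = [0, 0, 1, 1, 1]  # higher measures agree, lower ones do not
reversed_pattern = [1, 1, 0, 0, 0]  # the opposite of what the model predicts

fit_ok = item_fit(expected_pattern, abilities, 0.0)
fit_bad = item_fit(reversed_pattern, abilities, 0.0)
print(misfits(*fit_ok, threshold=3), misfits(*fit_bad, threshold=3))
```

The same statistics can be computed per respondent by summing across items instead of across people, which is the kind of analysis used to screen out misfitting respondents.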
Final Item Set
Additional studies identified items that would be harder for respondents to agree with (important to retain to measure higher levels of UX) and removed excessively redundant items (important to exclude to improve measurement efficiency). Evaluation of the remaining 16 items (from 284 Mechanical Turk respondents divided into the same groups as in the second study) produced the Wright map in Figure 1. Table 1 shows the wording for each item referenced in Figure 1.
Figure 1: Wright map of the final item set of the original SUPR-Qm.
Label | Item Wording |
---|---|
CantLiveWithout | I can’t live without this app on my phone. |
BestApp | The app is the best app I’ve ever used. |
CantImagine | I can’t imagine a better app than this one. |
NeverDelete | I would never delete the app. |
EveryoneHave | Everyone should have the app. |
Discover | I like discovering new features on the app. |
AllEverWant | The app has all the features and functions you could ever want. |
Delightful | The app is delightful. |
Integrates | The app integrates well with the other features of my mobile phone. |
UseFreq | I like to use the app frequently. |
DefUse | I will definitely use this app many times in the future. |
Attractive | I find the app to be attractive. |
FindInfo | The design of this app makes it easy for me to find the information I’m looking for. |
MeetsNeeds | The app’s features meet my needs. |
EasyNav | It is easy to navigate within the app. |
Easy | The app is easy to use. |
Table 1: Labels and text for the 16 items of the SUPR-Qm in order from most difficult to easiest for respondents to agree with.
In practice, these items are usually arranged in two grids of eight items each, with items randomly assigned to grids and then presented in random order within each grid as standard five-point agreement items (1: Strongly disagree, 5: Strongly agree). Some labels in Table 1 differ slightly from the alternate labels shown in Figure 1, but the item wording was consistent across all studies.
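The grid arrangement described above can be sketched in a few lines. This is an illustrative approach rather than the survey platform's actual logic: shuffling the 16 labels once and splitting the result in half yields both a random assignment to grids and a random order within each grid. The `build_grids` helper and its `seed` parameter are hypothetical.

```python
import random

# Item labels from Table 1.
ITEM_LABELS = [
    "CantLiveWithout", "BestApp", "CantImagine", "NeverDelete",
    "EveryoneHave", "Discover", "AllEverWant", "Delightful",
    "Integrates", "UseFreq", "DefUse", "Attractive",
    "FindInfo", "MeetsNeeds", "EasyNav", "Easy",
]


def build_grids(labels, n_grids=2, seed=None):
    """Randomly split labels into equally sized grids in random order."""
    if len(labels) % n_grids != 0:
        raise ValueError("labels must divide evenly across grids")
    rng = random.Random(seed)  # seed only to make a run repeatable
    shuffled = list(labels)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_grids
    return [shuffled[i * size:(i + 1) * size] for i in range(n_grids)]


grids = build_grids(ITEM_LABELS, seed=1)
for number, grid in enumerate(grids, start=1):
    print(f"Grid {number}: {grid}")
```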
But that’s not the end of the SUPR-Qm story. Since its release in 2017, MeasuringU has collected data from thousands of participants regarding their mobile app experiences across dozens of industries. We’ll present the resulting SUPR-Qm modifications in upcoming articles.
The mobile app industry is large, and there is significant competition for users. Despite their limitations in diagnosing specific usability problems, standardized questionnaires are important tools in the overall assessment of the UX of mobile apps. Only a few standardized UX questionnaires have been developed specifically for mobile apps—the MPUQ, the mod-AUG scales, and SUPR-Qm.
Questionnaires developed with CTT and IRT play different roles in UX evaluation. CTT tends to generate sets of items that are optimized around the average level of multiple constructs. IRT, by contrast, optimizes around a questionnaire that reliably measures a fuller range of a single construct, from low to high, not just around the average. Whether CTT or IRT is the better approach for developing a standardized UX questionnaire depends on the measurement goals.
Use the SUPR-Qm to get a quick measure of the overall UX of mobile apps. When the measurement goal is a unidimensional measure that is sensitive to a broad range of knowledge or experiences (as is the SUPR-Qm), the more appropriate questionnaire development method is IRT (including Rasch analysis).
If there is a need to measure additional constructs, use one of the longer multifactor questionnaires. When the goal is the measurement of multiple constructs (as in MPUQ and mod-AUG scales), the more appropriate questionnaire development method is CTT.
There are practical issues with the current multifactor questionnaires. The multifactor standardized questionnaires are good from a purely statistical perspective, but with 72 items for the MPUQ and 78 items for the mod-AUG scales, they are not practical for rapid assessment of the UX of mobile apps. There are also no published norms for interpreting the resulting scores (overall and subscale). This is, in part, a consequence of developing the questionnaires with CTT methods, which are good at identifying multiple factors but require at least two items per subscale (a few more is usually better) and a lot of research to develop and maintain interpretive norms for questionnaire scores. Furthermore, the more specific the item content, the more likely it is to become less relevant over time (e.g., from the MPUQ, “Are the HOME and MENU buttons sufficiently easy to locate for all operations?”).
The SUPR-Qm is a quick measure of overall mobile app UX, but it could be more efficient. Examination of the Wright map in Figure 1 shows opportunities to streamline the SUPR-Qm by removing redundant items. Redundant items are those that are located around the same place on the y-axis (having similar logit positions). For example, items close to the exact center of the scale (e.g., AllEverWant and Delightful) have essentially the same measurement properties, so it doesn’t matter which one is selected for inclusion in a streamlined version of the SUPR-Qm. A key research question is how many items could be excluded and still have a streamlined version that produces scores comparable to the original version.
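One way to picture the streamlining idea is a greedy pass over the items sorted by logit position, keeping an item only when it sits far enough from the last item kept. This is a sketch under stated assumptions, not the procedure used in the research: the logit values below are placeholders that were not read from Figure 1, and the `min_gap` cutoff is an arbitrary illustration.

```python
def streamline(item_logits, min_gap):
    """Keep items whose logit positions are at least min_gap apart.

    item_logits: dict mapping item label to logit position
    min_gap:     smallest spacing (in logits) treated as non-redundant
    """
    kept = []
    last_kept = None
    # Walk from the easiest item to the hardest.
    for label, logit in sorted(item_logits.items(), key=lambda kv: kv[1]):
        if last_kept is None or logit - last_kept >= min_gap:
            kept.append(label)
            last_kept = logit
    return kept


# Placeholder positions for illustration only (NOT values from Figure 1).
HYPOTHETICAL_LOGITS = {
    "Easy": -1.5,
    "EasyNav": -1.4,
    "Delightful": -0.05,
    "AllEverWant": 0.05,
    "CantLiveWithout": 1.6,
}

kept_items = streamline(HYPOTHETICAL_LOGITS, min_gap=0.25)
print(kept_items)
```

Within each redundant cluster this pass keeps the easier item, which is an arbitrary choice. As noted above, items at nearly the same position have essentially the same measurement properties, so item wording or stakeholder preference could decide instead.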
Future research: In future articles, we will discuss research we’ve conducted to replicate the original SUPR-Qm findings, streamline the SUPR-Qm, verify the stability of full and streamlined versions of the SUPR-Qm, and develop norms for the interpretation of SUPR-Qm scores.
For more details about this research, see the paper we published in the Journal of User Experience (Lewis & Sauro, 2025).