Wednesday, February 5, 2025

How to Weight Means – MeasuringU


In a previous article, we discussed the pros and cons of using weights to compensate for differences between a sample and a reference population.

Because weighting carries risks, the consensus is that it is a method of last resort, appropriate only when (1) it’s critically important for the proportions of sample groups to match a reference population and (2) problems in the sampling plan have caused the sample to miss the proper proportions.

Weighting to correct disproportionate sampling is not advised unless (1) there is an appropriate reference population, (2) there are actual differences in the group proportions of the reference population and the sample, (3) the study measurements are affected by variables in the reference population, and (4) group sample sizes are large enough to produce stable estimates.

When you’ve determined you should weight your data, one of the first applications of weighting is the analysis of means.

In this article, we cover how to weight means in the basic case of matching against a single reference variable.

Means can be weighted at the group level or the case (individual participant) level. Group-level weighting is the simpler process. The advantage of case-level weighting is that it enables more complex analyses such as weighted regression, but if no such analyses are planned, group-level weighting will usually suffice.

Group-Level Weighting

Prior experience with a product is one of the most consistently strong predictors of differences in UX metrics. People with more experience with a product tend to rate it higher than those with low to no experience. This can be seen, for example, with the correlation between SUS scores and duration of product experience.

Consequently, weighting by experience levels is a common and legitimate use of weighting in UX research. Consider a hypothetical dataset of one hundred SUS cases, representing a typical survey or unmoderated study for an accounting software package.

As summarized in Table 1, the sample consists of 25 participants with low experience (three or fewer years of experience with the software) and 75 participants with high experience (more than three years of experience). But the distribution in the sample differs from the reference population, with percentages determined from a very large sample of customer data (75% low experience, 25% high experience).

Experience  Mean SUS   n  Sample Proportion  Reference Proportion
Low             70.8  25               0.25                  0.75
High            76.9  75               0.75                  0.25

Table 1: Example group means for low and high levels of experience showing sample and reference proportions.

To weight the overall mean to the reference population, simply multiply each group average by the reference proportion, then add them together. When you use this method, the weights should sum to 1 (.75 + .25 = 1). The overall sample size for the weighted mean is the sum of the group sample sizes, 25 + 75 = 100.

weightedMean = .75(70.8) + .25(76.9) = 53.1 + 19.225 = 72.325

In contrast, the unweighted mean at the group level is 73.85 (a simple average of 70.8 and 76.9), about a point and a half higher than the weighted mean because the higher SUS score came from the smaller group in the reference population.
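As a quick check of the arithmetic above, here is a minimal Python sketch of group-level weighting. The function and variable names are ours, chosen for illustration; the numbers come from Table 1.

```python
def weighted_group_mean(group_means, weights):
    """Multiply each group mean by its weight and add the products.

    The weights are expected to be proportions that sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return sum(m * w for m, w in zip(group_means, weights))


sus_means = [70.8, 76.9]        # Low, High (Table 1)
reference_props = [0.75, 0.25]  # reference population proportions (Table 1)

weighted = weighted_group_mean(sus_means, reference_props)
simple_average = sum(sus_means) / len(sus_means)  # ignores group proportions

print(f"weighted mean:  {weighted:.3f}")
print(f"simple average: {simple_average:.2f}")
```

The guard on the sum of the weights is a design choice for this sketch: it catches the common mistake of passing percentages (75, 25) instead of proportions.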

Case-Level Weighting

To weight at the case level, divide the reference proportion by the sample proportion.

For this example, the weights for low-experience cases would be .75 / .25 = 3 and for high-experience cases would be .25 / .75 = 1/3. This has the effect of increasing the influence of low-experience cases and decreasing the influence of high-experience cases on the weighted mean while keeping the sum of the 100 case weights equal to the original sample size (Low = 25(3) = 75; High = 75(1/3) = 25; 75 + 25 = 100).

We know from Table 1 that the Low mean is 70.8 with a sample size of 25, and the High mean is 76.9 with a sample size of 75. If we computed an unweighted average across those 100 cases, then the mean would be (70.8(25) + 76.9(75)) / 100 = 75.375.

With case weights applied, the mean would be (70.8(25)(3) + 76.9(75)(1/3))/100 = (5310 + 1922.5) / 100 = 72.325—the same as the result obtained by weighting the group means with the reference proportions.
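The same calculation can be sketched in Python with one weight per participant. Because Table 1 reports only group means, this sketch assumes each participant's score equals the group mean; that stand-in is ours and is not how real case-level data would look. All names are illustrative.

```python
def case_weight(reference_prop, sample_prop):
    """Case weight = reference proportion / sample proportion."""
    return reference_prop / sample_prop


groups = {
    "Low": {"mean": 70.8, "n": 25, "reference": 0.75},
    "High": {"mean": 76.9, "n": 75, "reference": 0.25},
}
total_n = sum(g["n"] for g in groups.values())

# One (score, weight) pair per participant.
# ASSUMPTION: the group mean stands in for each individual SUS score.
cases = []
for g in groups.values():
    w = case_weight(g["reference"], g["n"] / total_n)
    cases.extend([(g["mean"], w)] * g["n"])

sum_of_weights = sum(w for _, w in cases)
unweighted_mean = sum(score for score, _ in cases) / len(cases)
case_weighted_mean = sum(score * w for score, w in cases) / sum_of_weights

print(f"sum of weights:     {sum_of_weights:.1f}")
print(f"unweighted mean:    {unweighted_mean:.3f}")
print(f"case-weighted mean: {case_weighted_mean:.3f}")
```

With real data, each participant's own SUS score would replace the stand-in, and the weights would be assigned by group membership in the same way.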

We can use the formula above to compute the case-weighted mean because the same weight was applied to each case in a given group. Here’s the detail:

Low-Group Cases

3(Low1) + 3(Low2) + … + 3(Low25) = 3(Low1 + Low2 + … + Low25)

= 3(25)((Low1 + Low2 + … + Low25) / 25) = 3(25)(LowGroupMean)

High-Group Cases

(1/3)(High1) + (1/3)(High2) + … + (1/3)(High75) = (1/3)(High1 + High2 + … + High75)

= (1/3)(75)((High1 + High2 + … + High75) / 75) = (1/3)(75)(HighGroupMean)

Case-Weighted Mean

CaseWeightedMean = (3(25)(LowGroupMean) + (1/3)(75)(HighGroupMean)) / SumOfWeights

= (3(25)(70.8) + (1/3)(75)(76.9)) / 100 = (5310 + 1922.5) / 100 = 72.325
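The agreement between the two methods is not specific to this example. As a sketch in our own notation (not symbols used elsewhere in this article), let group g have n_g cases, mean \bar{x}_g, and reference proportion p_g, let N be the total sample size, and let every case in group g carry the weight w_g = p_g / (n_g / N):

```latex
\begin{align*}
\sum_{g} \sum_{i \in g} w_g x_i
  &= \sum_{g} w_g n_g \bar{x}_g
   = \sum_{g} \frac{p_g}{n_g / N}\, n_g\, \bar{x}_g
   = N \sum_{g} p_g \bar{x}_g \\
\sum_{g} n_g w_g
  &= N \sum_{g} p_g = N \\
\bar{x}_{\text{case-weighted}}
  &= \frac{N \sum_{g} p_g \bar{x}_g}{N}
   = \sum_{g} p_g \bar{x}_g
   = \bar{x}_{\text{group-weighted}}
\end{align*}
```

The second line uses the fact that the reference proportions sum to 1, which is also why the case weights sum to the sample size.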

Note that the best estimates of the means for the low- and high-experience groups are those shown in Table 1. The effect of weighting is meaningful only for adjusting the overall mean to better match the reference population and works well only when there is a good reference population from which to obtain weights.

Suppose instead of just two levels of experience, there were the six shown in Table 2 (group means and sample proportions computed from the sample data available for download in the Appendix, reference population proportions made up for this example).

Experience SUS n Sample Reference Case Weight
  1–6 66.5  6 0.06 0.38 6.3333
 7–12 69.6  5 0.05 0.17 3.4000
13–24 72.0  5 0.05 0.11 2.2000
25–36 73.6  9 0.09 0.08 0.8889
37–48 75.4 13 0.13 0.09 0.6923
49–60 77.2 62 0.62 0.17 0.2742

Table 2: Data for six experience groups (experience levels are for months of subscription to accounting software).

Group-Level Weighting

For this example, the most egregious mismatches are at the lower and upper experience levels where, according to the accounting software company’s records, 38% of their subscribers have used the software for 1–6 months, and 17% of subscribers have used the software for 4–5 years, but their representation in the sample was, respectively, 6% and 62%.

Using the reference proportions, the weighted average is:

66.5(.38) + 69.6(.17) + 72.0(.11) + 73.6(.08) + 75.4(.09) + 77.2(.17) = 70.8

Even though almost 2/3 of the sample was in the most experienced group with a mean SUS of 77.2, the group-weighted mean of 70.8 was much closer to the mean of the second-to-lowest group (69.6) than to 77.2.
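To see how much the weighting moves the estimate, the following Python sketch computes the mean from Table 2 twice: once with the reference proportions and once with the sample proportions (which reproduces the ordinary unweighted mean of the 100 cases, up to rounding of the group means). Variable names are ours for illustration.

```python
# (months of experience, mean SUS, n, reference proportion) from Table 2
table2 = [
    ("1-6", 66.5, 6, 0.38),
    ("7-12", 69.6, 5, 0.17),
    ("13-24", 72.0, 5, 0.11),
    ("25-36", 73.6, 9, 0.08),
    ("37-48", 75.4, 13, 0.09),
    ("49-60", 77.2, 62, 0.17),
]
total_n = sum(n for _, _, n, _ in table2)

# Weight each group mean by its share of the reference population.
reference_weighted = sum(mean * ref for _, mean, _, ref in table2)

# Weight each group mean by its share of the sample instead.
sample_weighted = sum(mean * n / total_n for _, mean, n, _ in table2)

print(f"reference-weighted mean: {reference_weighted:.2f}")
print(f"sample-weighted mean:    {sample_weighted:.2f}")
```

The gap between the two results is the size of the adjustment that weighting makes to the overall mean.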

Case-Level Weighting

First, let’s check to see if the sum of the 100 weights equals 100.

6(6.3333) + 5(3.4000) + 5(2.2000) + 9(0.8889) + 13(0.6923) + 62(0.2742) = 100

The sum of the weights is 100, so the case-weighted mean is:

(6(6.3333)(66.5) + 5(3.4000)(69.6) + 5(2.2000)(72.0) + 9(0.8889)(73.6) + 13(0.6923)(75.4) + 62(0.2742)(77.2)) / 100

= (2527 + 1183.2 + 792 + 588.8074 + 678.5925 + 1312.4309) / 100 = 70.8

The case-level process has more steps than the group-level process, but the result is the same.
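The case weights in Table 2 are shown rounded to four decimals, so hand calculations with them can be off in the last digits. One way to check the sum exactly is to use rational arithmetic, as in this Python sketch built on the standard `fractions` module; the variable names are ours for illustration.

```python
from fractions import Fraction

# (n, reference percentage) for the six experience groups in Table 2
table2 = [(6, 38), (5, 17), (5, 11), (9, 8), (13, 9), (62, 17)]
sus_means = [Fraction(s) for s in ("66.5", "69.6", "72.0", "73.6", "75.4", "77.2")]
total_n = sum(n for n, _ in table2)

# Exact case weight = reference proportion / sample proportion.
exact_weights = [Fraction(pct, 100) / Fraction(n, total_n) for n, pct in table2]
# The same weights rounded to four decimals, as displayed in Table 2.
rounded_weights = [round(float(w), 4) for w in exact_weights]

exact_sum = sum(n * w for (n, _), w in zip(table2, exact_weights))
rounded_sum = sum(n * w for (n, _), w in zip(table2, rounded_weights))
exact_mean = sum(
    n * w * m for (n, _), w, m in zip(table2, exact_weights, sus_means)
) / exact_sum

print("exact sum of weights:  ", exact_sum)
print("rounded sum of weights:", rounded_sum)
print("case-weighted mean:    ", float(exact_mean))
```

With exact weights the sum is exactly the sample size; with the rounded weights it differs only slightly, which is why the check in the text works with the displayed values.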

In this article, we focused on the most common situation that UX researchers encounter—when there is a need to use weights to adjust a sample mean to better approximate a reference population (matching against a single variable).

Consistent with our examples, some analysts recommend using weights that sum to the sample size, while other analysts recommend limiting the range of weights to no less than 0.5 and no more than 2.0. The rationale for adjusting very large or very small weights is that it is risky to over- or under-weight cases, especially when the sample size for a group is small, which affects the precision of the group mean. Limiting the weights in this way would, however, reduce their sum so that it no longer matches the sample size, and the case-weighted mean would no longer match the group-weighted mean, so we prefer to work with unadjusted weights.
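To illustrate the effect of limiting weights, the Python sketch below applies the 0.5 to 2.0 range to the Table 2 case weights. Two things here are our assumptions rather than a recommended procedure: out-of-range weights are simply clipped to the nearest limit, and the weighted mean is divided by the sum of the limited weights. Function and variable names are illustrative.

```python
def trim(weight, low=0.5, high=2.0):
    """Clip a case weight to the range [low, high] (assumed trimming rule)."""
    return max(low, min(high, weight))


# (mean SUS, n, reference proportion) from Table 2
table2 = [
    (66.5, 6, 0.38),
    (69.6, 5, 0.17),
    (72.0, 5, 0.11),
    (73.6, 9, 0.08),
    (75.4, 13, 0.09),
    (77.2, 62, 0.17),
]
total_n = sum(n for _, n, _ in table2)

raw_weights = [ref / (n / total_n) for _, n, ref in table2]
trimmed_weights = [trim(w) for w in raw_weights]


def weighted_mean(weights):
    """Return (sum of case weights, case-weighted mean) for Table 2."""
    total_weight = sum(n * w for (_, n, _), w in zip(table2, weights))
    weighted_sum = sum(m * n * w for (m, n, _), w in zip(table2, weights))
    return total_weight, weighted_sum / total_weight


raw_total, raw_mean = weighted_mean(raw_weights)
trimmed_total, trimmed_mean = weighted_mean(trimmed_weights)

print(f"unadjusted: sum of weights = {raw_total:.1f}, mean = {raw_mean:.2f}")
print(f"trimmed:    sum of weights = {trimmed_total:.1f}, mean = {trimmed_mean:.2f}")
```

In this sketch, three of the six weights exceed 2.0 and one falls below 0.5, so trimming changes both the sum of the weights and the resulting mean.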

If this were a real example, we would definitely recommend investigating why the discrepancy was so large between the sample and the reference population, and we would advise caution using these results for any important business decisions until the sampling problem gets fixed.

Finally—what should you do if you are trying to match a sample to a reference population against more than one variable? There are methods for doing this, but they are more complicated and require specialized software. We’ll cover that in a future article.

Appendix

This appendix provides a link to download the hundred cases used in the examples in this article so interested readers can replicate our analyses using their preferred statistical package. The data is available for download at:

Weighting Exercise Sample Data

In the sample data, Group6 designates the six levels of product experience in months; Group2 has just two groups where the Low group is made up of the 25 cases from the first four groups in Group6, and the High group is made up of the 75 cases from the remaining two groups in Group6.
