As we saw in the previous post, the sample median is biased when sampling from skewed distributions, and the bias increases with decreasing sample size. According to Miller (1988), because of this bias, group comparisons can be affected if the two groups differ in skewness or sample size, or both. As a result, real differences can be masked or exaggerated, and nonexistent differences suggested. In Miller’s own words:
“An important practical consequence of the bias in median reaction time is that sample medians must not be used to compare reaction times across experimental conditions when there are unequal numbers of trials in the conditions.”
Here we will see that this advice is wrong for two reasons: the bias depends on differences in skewness, not on differences in sample size alone, and it can be corrected using the bootstrap.
We assess the problem using a simulation in which we draw samples of same or different sizes from populations that differ in skewness or not. We use the same 12 distributions used by Miller (1988), as described previously.
Group 2 has size 200 and is sampled from the distribution with the least skewness.
Group 1 has size 10 to 200, in increments of 10, and is sampled from the 12 distributions.
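The simulation above can be sketched as follows. This is a minimal Python illustration, not the post’s actual code: the ex-Gaussian parameters (`MU`, `SIGMA`, `TAU`) are made up for the example and are not Miller’s (1988) values, and only one distribution and one sample size are shown.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical ex-Gaussian parameters for illustration --
# NOT Miller's (1988) exact values.
MU, SIGMA, TAU = 300.0, 20.0, 100.0

def exgauss(size):
    """Draw ex-Gaussian reaction times: normal deviate + exponential deviate."""
    return rng.normal(MU, SIGMA, size) + rng.exponential(TAU, size)

# Approximate the population median with one very large sample.
pop_median = np.median(exgauss(2_000_000))

# Estimate the bias of the sample median for a small n.
n_iter, n = 10_000, 10
samples = exgauss((n_iter, n))          # one row per simulated experiment
bias = np.median(samples, axis=1).mean() - pop_median
print(f"estimated bias of the median for n={n}: {bias:.2f} ms")
```

In the full simulation this estimate is repeated for each of the 12 distributions and each group 1 sample size, and the bias of the group difference is computed.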
After 10,000 iterations, here are the results for the mean:
All the bias values are near zero, as expected. The shaded area shows the upper part of the mean’s bias 50% HDI (highest density interval) when group 1 and group 2 are both sampled from the distribution with the least skewness (6). This interval shows the location of the bulk of the 10,000 simulations. The same area is shown in the next figures. Again, this is a reminder that bias is defined in the long run: a single experiment (one of our simulation iterations) can be far off the population value, especially with small sample sizes.
Here are the results for the median:
As in the previous figure, the shaded area shows the upper part of the mean’s bias 50% HDI, when group 1 and group 2 have the same skewness. Bias increases with skewness and sample size difference and is particularly large for n = 10. However, if the two groups have the same skewness (skewness 6), there is almost no bias even when group 2 has 200 observations and group 1 has only 10. So it seems Miller’s (1988) warning about differences in sample sizes was inappropriate, because the main factor causing bias is skewness.
Next, let’s find out if we can correct the bias. Bias correction is performed in two ways:

- using the bootstrap
- using subsamples, following Miller’s suggestion.
Miller (1988) suggested:
“Although it is computationally quite tedious, there is a way to use medians to reduce the effects of outliers without introducing a bias dependent on sample size. One uses the regular median from Condition F and compares it with a special “average median” (Am) from Condition M. To compute Am, one would take from Condition M all the possible subsamples of Size f where f is the number of trials in Condition F. For each subsample one computes the subsample median. Then, Am is the average, across all possible subsamples, of the subsample medians. This procedure does not introduce bias, because all medians are computed on the basis of the same sample (subsample) size.”
Using all possible subsamples would take far too long. For instance, if one group has 5 observations and the other group has 20 observations, there are 15,504 (`choose(20,5)`) subsamples to consider. Slightly larger sample sizes would force us to consider millions of subsamples. So instead we compute K random subsamples. I arbitrarily set K to 1,000. Although this is not exactly what Miller (1988) suggested, this K-loop shortcut should reduce bias to some extent if the bias is due to sample size differences. Here are the results:
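The K-loop shortcut can be sketched like this in Python. This is an illustration under assumed inputs, not the post’s code: the reaction times in `m` are made up, and `average_median` draws K random subsamples instead of enumerating all `choose(n, f)` of them.

```python
import numpy as np

rng = np.random.default_rng(1)

def average_median(x, f, K=1000):
    """Approximate Miller's 'average median' Am using K random
    subsamples of size f, instead of all choose(n, f) subsamples."""
    x = np.asarray(x)
    meds = [np.median(rng.choice(x, size=f, replace=False)) for _ in range(K)]
    return float(np.mean(meds))

# Hypothetical example: condition M has 20 trials, condition F has 5,
# so the exhaustive version would need choose(20, 5) = 15,504 subsamples.
m = rng.exponential(100.0, 20) + 300.0  # made-up skewed reaction times
am = average_median(m, f=5)
print(f"average median Am: {am:.1f} ms")
```

`Am` is then compared with the regular median of the smaller condition, as in Miller’s quote above.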
The K loop approach has only a small effect on bias. The reason is simple: the main cause of the bias is not a difference in sample size, it is a difference in skewness. This skewness difference can be handled by the bootstrap. Here is what we get using 200 bootstrap resamples for each simulation iteration:
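The standard bootstrap bias correction used here subtracts the estimated bias (mean of the bootstrap medians minus the sample median) from the sample median. A minimal sketch, with a made-up sample and 200 bootstrap resamples as in the simulation:

```python
import numpy as np

rng = np.random.default_rng(7)

def bc_median(x, n_boot=200):
    """Bootstrap bias-corrected median:
    corrected = median(x) - (mean(bootstrap medians) - median(x))
              = 2 * median(x) - mean(bootstrap medians)."""
    x = np.asarray(x)
    boot_meds = np.array([np.median(rng.choice(x, size=x.size, replace=True))
                          for _ in range(n_boot)])
    return 2.0 * np.median(x) - boot_meds.mean()

x = rng.exponential(100.0, 10) + 300.0  # made-up skewed sample of 10 RTs
print(f"median: {np.median(x):.1f} ms, bias-corrected: {bc_median(x):.1f} ms")
```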
The bootstrap bias correction works very well. The median’s bias is still a bit larger than the mean’s bias for n = 10 in some conditions. For instance, for the most skewed distribution and n = 10, the mean’s bias is 0.51, whereas the median’s bias after bias correction is 0.73. Neither is exactly zero, and the absolute values are similar. At most, for n = 10, the median’s maximum bias across distributions is 1.85 ms, whereas the mean’s is 0.7 ms.
Conclusion
Miller’s (1988) advice was inappropriate because, when comparing two groups, bias originates from a conjunction of differences in skewness and in sample size, not from differences in sample size alone. That said, in practice it might well be that skewed distributions from different conditions or groups tend to differ in skewness, in which case unequal sample sizes will exacerbate the bias. To be cautious, when sample size is relatively small, it would be useful to report median effects with and without bootstrap bias correction.
Finally, data & code are available on GitHub.
[GO TO POST 3/4]