As we saw in the previous post, the sample median is biased when sampling from skewed distributions, and the bias increases as sample size decreases. According to Miller (1988), because of this bias, group comparisons can be affected if the two groups differ in skewness or sample size, or both. As a result, real differences can be attenuated or inflated, and non-existent differences can be suggested. In Miller's own words:
“An important practical consequence of the bias in median reaction time is that sample medians must not be used to compare reaction times across experimental conditions when there are unequal numbers of trials in the conditions.”
Let’s evaluate this advice.
We assess the problem using a simulation in which we draw samples of the same or different sizes from populations with the same skewness, using the same 12 distributions used by Miller (1988), as described previously.
Group 2 has size 200. Group 1 has size 10 to 200, in increments of 10.
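As a minimal sketch of this simulation structure (using an illustrative ex-Gaussian population with parameters I chose for demonstration, not necessarily Miller's exact values, and fewer iterations than the actual 10,000):

```python
import numpy as np

rng = np.random.default_rng(42)

def exgauss(n, mu=300.0, sigma=20.0, tau=100.0):
    # Ex-Gaussian: normal plus exponential component.
    # Illustrative parameters, not Miller's exact values.
    return rng.normal(mu, sigma, n) + rng.exponential(tau, n)

n2 = 200                       # Group 2: fixed size
n1_sizes = range(10, 201, 10)  # Group 1: 10 to 200 in steps of 10
n_iter = 1000                  # 10,000 in the actual simulation

bias = {}
for n1 in n1_sizes:
    diffs = np.empty(n_iter)
    for i in range(n_iter):
        diffs[i] = np.median(exgauss(n1)) - np.median(exgauss(n2))
    # Both groups come from the same population, so the true
    # difference is 0 and the mean of diffs estimates the bias.
    bias[n1] = diffs.mean()
```

The same loop with `np.mean` in place of `np.median` gives the corresponding results for the mean.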
After 10,000 iterations, here are the results for the mean:
All the bias values are near zero, as expected.
Here are the results for the median:
Bias increases with skewness and with the difference in sample sizes, and is particularly large for n = 10. Roughly 90-100 trials in Group 1 are required to bring the bias down to values comparable to those of the mean.
Next, let’s find out if we can correct the bias. Bias correction is performed in two ways:
- using the bootstrap
- using subsamples, following Miller’s suggestion.
Miller (1988) suggested:
“Although it is computationally quite tedious, there is a way to use medians to reduce the effects of outliers without introducing a bias dependent on sample size. One uses the regular median from Condition F and compares it with a special “average median” (Am) from Condition M. To compute Am, one would take from Condition M all the possible subsamples of Size f where f is the number of trials in Condition F. For each subsample one computes the subsample median. Then, Am is the average, across all possible subsamples, of the subsample medians. This procedure does not introduce bias, because all medians are computed on the basis of the same sample (subsample) size.”
Using all possible subsamples would take far too long. For instance, if one group has 5 observations and the other group has 20 observations, there are choose(20, 5) = 15,504 subsamples to consider. Slightly larger sample sizes would force us to consider millions of subsamples. So instead we compute K random subsamples. I arbitrarily set K to 1,000. Although this is not exactly what Miller (1988) suggested, this K-loop shortcut should reduce the bias to the extent that it is due to sample size differences. Here are the results:
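A sketch of the K-loop shortcut (function name and structure are mine; the larger group is subsampled down to the size of the smaller one):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)

# The exhaustive approach would need comb(20, 5) = 15504 subsamples.
print(comb(20, 5))  # 15504

def average_median(large_group, small_n, k=1000):
    # Miller's "average median": the mean of the medians of k random
    # subsamples of size small_n, drawn without replacement from the
    # larger group (K random subsamples instead of all of them).
    meds = np.empty(k)
    for i in range(k):
        sub = rng.choice(large_group, size=small_n, replace=False)
        meds[i] = np.median(sub)
    return meds.mean()
```

The bias-corrected group difference is then the regular median of the smaller group minus `average_median(large_group, len(small_group))`.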
The K loop approach works very well! Bias can also be handled by the bootstrap. Here is what we get using 200 bootstrap resamples for each simulation iteration:
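A hedged sketch of the bootstrap bias correction (my function name; it subtracts the bootstrap estimate of the bias from the sample median):

```python
import numpy as np

rng = np.random.default_rng(2)

def bc_median(x, nboot=200):
    # Bootstrap bias-corrected median:
    # estimated bias = mean(bootstrap medians) - sample median,
    # so the corrected estimate is 2 * median - mean(bootstrap medians).
    x = np.asarray(x)
    m = np.median(x)
    boot = np.array([np.median(rng.choice(x, size=x.size, replace=True))
                     for _ in range(nboot)])
    return 2.0 * m - boot.mean()
```

The corrected group difference is then `bc_median(group1) - bc_median(group2)`, computed once per simulation iteration.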
The bootstrap bias correction works very well too! So in the long run, the bias in the estimation of differences between medians can be eliminated using either the subsampling or the percentile bootstrap approach. Because the sampling distributions are skewed, we also consider the median bias: the bias observed in a typical experiment. In that case, the difference between group means tends to underestimate the population difference:
For the median, the median bias is much lower than the standard (mean) bias, and near zero from n = 20.
Thus, for a typical experiment, the difference between group medians actually suffers less from bias than the difference between group means.
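The distinction between the standard bias and the median bias can be sketched as follows, again assuming an illustrative ex-Gaussian population (my parameters, not Miller's exact ones); since both groups share one population, the true difference is 0:

```python
import numpy as np

rng = np.random.default_rng(3)

def exgauss(n, mu=300.0, sigma=20.0, tau=100.0):
    # Illustrative ex-Gaussian population.
    return rng.normal(mu, sigma, n) + rng.exponential(tau, n)

n_iter = 2000
diffs = np.array([np.median(exgauss(10)) - np.median(exgauss(200))
                  for _ in range(n_iter)])

mean_bias = diffs.mean()        # standard bias: mean of the sampling distribution
median_bias = np.median(diffs)  # "typical experiment" bias: its median
```

With a right-skewed sampling distribution, the mean of `diffs` is pulled toward the tail, whereas its median reflects what a single typical experiment would show.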
Miller’s (1988) advice was inappropriate because, when comparing two groups, the bias in a typical experiment is actually negligible. To be cautious, when sample sizes are relatively small, it could be useful to report median effects with and without bootstrap bias correction. It would be even better to run simulations to determine the sample sizes required to achieve acceptable measurement precision, whatever the estimator used.
Finally, the data and code are available on GitHub.
[GO TO POST 3/4]