The previous post considered alpha when sampling from normal and non-normal distributions. Here the simulations are extended to look at power in the one-sample case. Statistical power is the long-run probability of obtaining a significant test result when a true effect exists, that is, the probability of a true positive.
Power depends on both sample size and effect size. This is illustrated in the figure below, which reports simulations with 5,000 iterations, using a t-test on means applied to samples from a normal distribution.
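The kind of simulation behind that figure is straightforward to sketch. Below is a minimal Python version (the original simulations were presumably run in R with different code); `sim_power` is a hypothetical helper name, and the iteration count is an adjustable parameter rather than the post's exact settings.

```python
import numpy as np
from scipy import stats

def sim_power(n, effect=0.5, n_iter=5000, alpha=0.05, seed=None):
    """Estimate power of a one-sample t-test on the mean: draw n_iter
    samples of size n from Normal(effect, 1) and count how often
    H0: mu = 0 is rejected at the given alpha level."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_iter):
        sample = rng.normal(loc=effect, scale=1.0, size=n)
        _, p = stats.ttest_1samp(sample, popmean=0.0)
        rejections += p < alpha
    return rejections / n_iter

# Power grows with sample size at a fixed effect size of 0.5.
for n in (10, 20, 30, 50):
    print(n, sim_power(n, n_iter=1000, seed=42))
```

Varying `effect` instead of `n` traces out the other dimension of the figure: larger effects need fewer observations to reach any given power target.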
Now, let’s look at power in less idealistic conditions, for instance when sampling from a lognormal distribution, which is strongly positively skewed. This power simulation used 10,000 iterations and an effect size of 0.5, i.e. we sample from distributions that are shifted by 0.5 from zero.
Under normality (dashed lines), the mean performs best, as expected: smaller sample sizes are needed to reach 80% power with the mean than with the 20% trimmed mean or the median. However, when sampling from a lognormal distribution the ordering of the three estimators is completely reversed: the mean now performs worst, the 20% trimmed mean performs much better, and the median performs best of all. So when sampling from a skewed distribution, the choice of statistical test can have large effects on power. In particular, a t-test on the mean can have very low power, whereas a t-test on a trimmed mean, or a test on the median, can provide much higher power.
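A sketch of this lognormal comparison is below, under stated assumptions: the trimmed-mean test follows the standard Tukey-McLaughlin form (trimmed mean standardised by the winsorized standard error), and a sign test stands in for the median test; the post's actual simulations likely use Wilcox's R implementations, which may differ in detail. All function names here are hypothetical, and each test targets its own population null value for the unshifted lognormal.

```python
import numpy as np
from scipy import stats

def tm_test_pval(x, null_value, prop=0.2):
    """One-sample trimmed-mean test (Tukey-McLaughlin style)."""
    x = np.asarray(x)
    n = len(x)
    g = int(np.floor(prop * n))
    tm = stats.trim_mean(x, prop)
    # Winsorized standard deviation drives the standard error.
    sw = np.asarray(stats.mstats.winsorize(x, limits=(prop, prop))).std(ddof=1)
    se = sw / ((1 - 2 * prop) * np.sqrt(n))
    df = n - 2 * g - 1
    return 2 * stats.t.sf(abs((tm - null_value) / se), df)

def mean_test_pval(x, null_value):
    return stats.ttest_1samp(x, null_value).pvalue

def median_test_pval(x, null_value):
    # Sign test: a simple stand-in for a proper median test.
    x = np.asarray(x)
    k = int(np.sum(x > null_value))
    n = int(np.sum(x != null_value))
    return stats.binomtest(k, n, 0.5).pvalue

def power_lognormal(test, null_value, n=30, shift=0.5, n_iter=2000, seed=0):
    """Power against a 0.5 shift of a standard lognormal distribution."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_iter):
        x = rng.lognormal(size=n) + shift
        hits += test(x, null_value) < 0.05
    return hits / n_iter

# Population values of each estimator for the standard lognormal
# (mean and median are known; the trimmed mean is estimated).
big = np.random.default_rng(1).lognormal(size=10**6)
null_mean, null_tm, null_med = np.exp(0.5), stats.trim_mean(big, 0.2), 1.0

print('mean  :', power_lognormal(mean_test_pval, null_mean, n_iter=1000))
print('tmean :', power_lognormal(tm_test_pval, null_tm, n_iter=1000))
print('median:', power_lognormal(median_test_pval, null_med, n_iter=1000))
```

Even with these simplified tests, the qualitative pattern in the figure reproduces: at n = 30 the mean-based t-test lags well behind the trimmed-mean and median tests.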
As we did in the previous post, let’s look at power in different situations in which we vary the asymmetry and the tails of the distributions. The effect size is 0.5.
A t-test on means performs very well under normality (g=0), as we saw in the previous figure. However, as asymmetry increases, power is strongly affected. With large asymmetry (g>1) the t-test is biased: power initially decreases as sample size increases from very small values, before going up again in some situations.
A t-test using a 20% trimmed mean is dramatically less affected by asymmetry than a t-test on the mean.
The median also performs much better than the mean, but it behaves differently from both the mean and the 20% trimmed mean: its power increases with increasing asymmetry!
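The asymmetry manipulations above are consistent with Tukey's g-and-h family, in which g controls skewness and h controls tail thickness; assuming that is the family used here, samples can be generated by transforming standard normal draws, as in this sketch (the function name is hypothetical):

```python
import numpy as np
from scipy import stats

def g_and_h(g, h, size, seed=None):
    """Draw from Tukey's g-and-h distribution: g controls asymmetry,
    h controls tail thickness; g = h = 0 recovers the standard normal."""
    z = np.random.default_rng(seed).standard_normal(size)
    gz = z if g == 0 else (np.exp(g * z) - 1) / g
    return gz * np.exp(h * z**2 / 2)

# Sample skewness rises sharply with g (h fixed at 0).
for g in (0, 0.5, 1.0):
    print(g, stats.skew(g_and_h(g, 0, 100_000, seed=7)))
```

Plugging `g_and_h` samples (plus a 0.5 shift) into a power loop like the one above reproduces the asymmetry curves: the mean degrades quickly with g while the trimmed mean and median hold up.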
What happens when we manipulate the tails instead? Remember that samples from distributions with heavy tails tend to contain outliers, which disproportionately affect the mean and the variance compared to robust estimators. Not surprisingly, t-tests on means are strongly affected by heavy tails.
The 20% trimmed mean boosts power substantially, although it is still affected by heavy tails.
The median performs the best, showing very limited power drop with increasing tail thickness.
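The tail manipulation can be sketched with the symmetric (g = 0) member of the same g-and-h family, where increasing h thickens both tails. One way to see why the mean suffers is to count extreme observations, here approximated as points beyond ±3 (the function name and the ±3 cutoff are illustrative choices, not from the post):

```python
import numpy as np

def h_dist(h, size, seed=None):
    """Symmetric g-and-h draw (g = 0): z * exp(h * z^2 / 2).
    h = 0 is the standard normal; larger h thickens both tails."""
    z = np.random.default_rng(seed).standard_normal(size)
    return z * np.exp(h * z**2 / 2)

# Fraction of observations beyond +/- 3: a rough proxy for how often
# samples contain outliers that drag the mean and inflate the variance.
for h in (0.0, 0.1, 0.2):
    x = h_dist(h, 200_000, seed=11)
    print(h, np.mean(np.abs(x) > 3))
```

The outlier rate climbs steeply with h, which inflates the denominator of the mean-based t statistic; trimming and the median discard or ignore these extreme points, which is why their power curves barely move.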
The simulations presented here are of course limited, but they serve as a reminder that power should be estimated using realistic distributions, for instance when the quantities under study, such as reaction times, follow well-known skewed distributions. The choice of estimator is also critical, and it would be wise to consider robust estimators whenever appropriate.
Wilcox, R.R. (2012) Introduction to robust estimation and hypothesis testing. Academic Press, San Diego, CA.

Wilcox, R.R. & Rousselet, G.A. (2017) A guide to robust statistical methods in neuroscience. figshare. https://doi.org/10.6084/m9.figshare.5114275.v1