In psychology and neuroscience, the typical sample size is too small. I’ve recently seen several neuroscience papers with n = 3–6 animals. For instance, this article uses n = 3 mice per group in a one-way ANOVA. This is a real problem, because small sample size is associated with:
- low statistical power
- inflated false discovery rate
- inflated effect size estimation
- low reproducibility
- …
Here is a list of excellent publications covering these points:
Button, K.S., Ioannidis, J.P., Mokrysz, C., Nosek, B.A., Flint, J., Robinson, E.S. & Munafò, M.R. (2013) Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376.
Colquhoun, D. (2014) An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1, 140216.
Forstmeier, W., Wagenmakers, E.J. & Parker, T.H. (2016) Detecting and avoiding likely false-positive findings – a practical guide. Biological Reviews.
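The inflation of effect sizes under low power is easy to demonstrate yourself. Here is a minimal simulation sketch (my own illustration, not code from any of the papers above): draw many small two-group samples from populations with a true standardized difference of 0.5, keep only the “significant” results, and compare the average estimated effect to the truth.

```python
import numpy as np

rng = np.random.default_rng(42)
true_effect = 0.5      # true mean difference, in SD units
n = 10                 # per-group sample size (small!)
n_sims = 5000
t_crit = 2.101         # two-sided 5% critical t for df = 2*n - 2 = 18

all_d, sig_d = [], []
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_effect, 1.0, n)
    sp2 = (a.var(ddof=1) + b.var(ddof=1)) / 2          # pooled variance
    t = (b.mean() - a.mean()) / np.sqrt(sp2 * 2 / n)   # two-sample t statistic
    d = (b.mean() - a.mean()) / np.sqrt(sp2)           # Cohen's d estimate
    all_d.append(d)
    if abs(t) > t_crit:                                # "significant" at 5%
        sig_d.append(d)

print(f"power ≈ {len(sig_d) / n_sims:.2f}")            # well below the usual 0.8 target
print(f"mean d, all samples ≈ {np.mean(all_d):.2f}")   # close to the true 0.5
print(f"mean d, significant only ≈ {np.mean(sig_d):.2f}")  # inflated well above 0.5
```

Across all samples the estimate is roughly unbiased; conditioning on significance is what inflates it. This is exactly the selection mechanism behind the “power failure” argument.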
Small sample size also prevents us from properly estimating and modelling the populations we sample from. As a consequence, small n stops us from answering a fundamental, yet often ignored empirical question: how do distributions differ?
This important aspect is illustrated in the figure below. Columns show distributions that differ in four different ways. The rows illustrate samples of different sizes. The scatterplots were jittered using ggforce::geom_sina in R. The vertical black bars indicate the mean of each sample. In row 1, examples 1, 3 and 4 have exactly the same mean. In example 2, the means of the two distributions differ by 2 arbitrary units. The remaining rows illustrate random subsamples of data from row 1. Above each plot, the t value, the mean difference and its confidence interval are reported. Even with 100 observations we might struggle to approximate the shape of the parent population. Without additional information, it can be difficult to determine if an observation is an outlier, particularly for skewed distributions. In column 4, samples with n = 20 and n = 5 are very misleading.
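How badly small samples hide the shape of a skewed parent can be quantified with a quick sketch (my own illustration, not the code behind the figure): sample repeatedly from a strongly right-skewed lognormal population and look at the average sample skewness at different n.

```python
import numpy as np

rng = np.random.default_rng(1)

def skew(x):
    """Standardized third moment (plain sample skewness, no bias correction)."""
    x = np.asarray(x)
    return float(np.mean(((x - x.mean()) / x.std()) ** 3))

# Lognormal(0, 1) parent population: strongly right-skewed (true skewness ≈ 6.2).
population = rng.lognormal(0.0, 1.0, 1_000_000)
print(f"population skewness ≈ {skew(population):.1f}")

results = {}
for n in (5, 20, 100):
    samples = rng.lognormal(0.0, 1.0, (10_000, n))
    results[n] = np.mean([skew(s) for s in samples])
    print(f"n = {n:3d}: mean estimated skewness ≈ {results[n]:.2f}")
```

With n = 5 the sample skewness is mathematically bounded near 1.3 regardless of how skewed the parent is, so such samples are guaranteed to look roughly symmetric; even n = 100 substantially underestimates the true value.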
Small sample size could be less of a problem in a Bayesian framework, in which information from prior experiments can be incorporated into the analyses. In the blind and significance-obsessed frequentist world, small n is a recipe for disaster.
Small sample sizes are also problematic in Bayesian statistics, since small samples contain little information. Except, of course, when one observation/participant is a “population” in itself, with many observations per participant.
This is cool! Thanks.
Pingback: Replication, low power and sample sizes: an update – David Schmidt