Previously, we saw that with small sample sizes, correlation estimation is very uncertain, which implies that small n correlations cannot be trusted: the observed value in any experiment could be very far from the population value, and the sign could be wrong too. In addition to the uncertainty associated with small sample sizes, the selective report of results based on p values < 0.05 (or some other threshold), can lead to massively inflated correlation estimates in the literature (Yarkoni, 2009 ☜ if you haven’t done so, you really should read this excellent paper).
Now, here is the sampling distribution conditional on p < 0.05. The estimates are massively inflated and the problem gets worse with smaller sample sizes, because the smaller the sample size, the larger the correlations must be by chance for them to be significant.
So no, don’t get too excited when you see a statistically significant correlation in a paper…
Let’s do the same exercise when the population correlation is relatively large. With rho = 0.4, the sampling distribution looks like this:
If we report only those correlations associated with p < 0.05, the distribution looks like this:
Again, with small sample sizes, the estimates are inflated, albeit in the correct direction. There is nevertheless a small number of large negative correlations (see small purple bump around -0.6 -0.8). Indeed, in 0.77% of simulations, even though the population value was 0.4, a large and p < 0.05 negative correlation was obtained.