Previously, we saw that with small sample sizes, correlation estimation is very uncertain, which implies that small n correlations cannot be trusted: the observed value in any experiment could be very far from the population value, and the sign could be wrong too. In addition to the uncertainty associated with small sample sizes, the selective report of results based on p values < 0.05 (or some other threshold), can lead to massively inflated correlation estimates in the literature (Yarkoni, 2009 ☜ if you haven’t done so, you really should read this excellent paper).

Let’s illustrate the problem (code is on GitHub). First, we consider a population rho = 0. Here is the sampling distribution as a function of sample size, as we saw in an earlier post.

Figure 1: Sampling distribution for rho=0.

Now, here is the sampling distribution conditional on p < 0.05. The estimates are massively inflated and the problem gets worse with smaller sample sizes, because the smaller the sample size, the larger the correlations must be by chance for them to be significant.

Figure 2: Sampling distribution for rho=0, given p<0.05

So no, don’t get too excited when you see a statistically significant correlation in a paper…

Let’s do the same exercise when the population correlation is relatively large. With rho = 0.4, the sampling distribution looks like this:

Figure 3: Sampling distribution for rho=0.4.

If we report only those correlations associated with p < 0.05, the distribution looks like this:

Figure 4: Sampling distribution for rho=0.4, given p<0.05

Again, with small sample sizes, the estimates are inflated, albeit in the correct direction. There is nevertheless a small number of large negative correlations (see small purple bump around -0.6 -0.8). Indeed, in 0.77% of simulations, even though the population value was 0.4, a large and p < 0.05 negative correlation was obtained.

### Like this:

Like Loading...

Pingback: Power estimation for correlation analyses | basic statistics

Pingback: Case Study: Why bad [sport] science has real consequences