This post looks at the coverage of confidence intervals for the difference between two independent correlation coefficients. Previously, we saw how the standard Fisher’s *r-to-z* transform can lead to inflated false positive rates when sampling from non-normal bivariate distributions and the population correlation differs from zero. In this post, we look at a complementary perspective: using simulations, we’re going to determine how often confidence intervals include the population difference. As we saw in our previous post, just because we compute, say, 95% confidence intervals does not mean that over the long run, 95% of such confidence intervals will include the population value we’re trying to estimate. In some situations, the coverage is much lower than expected, which means we might fool ourselves more often than we thought (although in practice, in most discussions I’ve ever read, authors behave as if their 95% confidence intervals were very narrow 100% confidence intervals, but that’s another story).

We look at confidence interval coverage for the difference between Pearson’s correlations using Zou’s method (Zou, 2007) and a modified percentile bootstrap method (Wilcox, 2009). We do the same for the comparison of Spearman’s correlations using the standard percentile bootstrap. We use simulations with 4,000 iterations. Sampling is from bivariate *g & h* distributions (see illustrations here).
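For two independent groups, Zou’s interval for r1 - r2 is assembled from the two single-group Fisher *r-to-z* CIs (l1, u1) and (l2, u2): lower = r1 - r2 - sqrt((r1 - l1)² + (u2 - r2)²) and upper = r1 - r2 + sqrt((u1 - r1)² + (r2 - l2)²). Below is a minimal Python sketch of that construction, assuming a 95% level; the function names `fisher_ci` and `zou_diff_ci` are hypothetical and not taken from any package.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """CI for one Pearson correlation via Fisher's r-to-z transform."""
    z = math.atanh(r)                        # r-to-z
    se = 1.0 / math.sqrt(n - 3)              # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)  # back to the r scale


def zou_diff_ci(r1, n1, r2, n2, level=0.95):
    """Zou (2007) CI for r1 - r2, correlations from two independent groups."""
    l1, u1 = fisher_ci(r1, n1, level)
    l2, u2 = fisher_ci(r2, n2, level)
    d = r1 - r2
    lower = d - math.sqrt((r1 - l1) ** 2 + (u2 - r2) ** 2)
    upper = d + math.sqrt((u1 - r1) ** 2 + (r2 - l2) ** 2)
    return lower, upper


print(zou_diff_ci(0.3, 100, 0.4, 100))
```

Because each Fisher CI is asymmetric around r when r is not zero, the interval for the difference is generally asymmetric around r1 - r2 as well.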

We consider 4 cases:

- g = h = 0, difference = 0.1, vary rho
- g = 1, h = 0, difference = 0.1, vary rho
- rho = 0.3, difference = 0.2, vary g, h = 0
- rho = 0.3, difference = 0.2, vary g, h = 0.2
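One common way to draw from a bivariate *g & h* distribution, assumed here rather than taken from the post’s notebook, is to sample bivariate normal pairs with correlation rho and then apply Tukey’s g-and-h transform to each marginal: g controls asymmetry, h controls tail thickness. In this sketch rho is the correlation of the underlying normal variables; for g > 0 or h > 0 the Pearson correlation of the transformed variables generally differs from it. Function names are hypothetical.

```python
import numpy as np


def gh_transform(z, g, h):
    """Tukey g-and-h transform applied element-wise to standard normal values."""
    z = np.asarray(z, dtype=float)
    # g controls skewness: (exp(g*z) - 1) / g, which reduces to z when g = 0
    core = z if g == 0 else np.expm1(g * z) / g
    # h controls tail thickness: multiply by exp(h * z^2 / 2)
    return core * np.exp(h * z ** 2 / 2)


def sample_bivariate_gh(n, rho, g, h, rng):
    """n pairs: correlated normals (correlation rho), then g-and-h on each margin."""
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)  # shape (n, 2)
    return gh_transform(z, g, h)


rng = np.random.default_rng(42)
xy = sample_bivariate_gh(100, rho=0.3, g=1.0, h=0.0, rng=rng)
x, y = xy[:, 0], xy[:, 1]
```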

# g = h = 0, difference = 0.1, vary rho

This is the standard bivariate normal distribution. In group 1, rho1 varies from 0 to 0.8 in steps of 0.1; in group 2, rho2 = rho1 + 0.1.

For bivariate normal distributions, coverage is at the nominal level for all methods, sample sizes and population correlations. (Here I only considered sample sizes of 50 or more, because with smaller samples the power to detect a difference of 0.1 is far too low for the comparison to be worthwhile.)
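Coverage is simply the proportion of simulated CIs that contain the population difference. Here is a vectorised sketch for Zou’s method under bivariate normality, assuming equal group sizes and rho2 = rho1 + 0.1 as in this section. It includes a compact copy of the Fisher-z and Zou computations so that it runs on its own; all names are hypothetical, and it is not the code used for the figures.

```python
import numpy as np
from statistics import NormalDist

CRIT = NormalDist().inv_cdf(0.975)  # about 1.96, for 95% CIs


def pearson_rows(x, y):
    """Pearson correlation between matching rows of two (nsim, n) arrays."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))


def zou_ci(r1, r2, n):
    """Zou (2007) CI for r1 - r2 (equal group sizes n), built from Fisher-z CIs."""
    se = 1 / np.sqrt(n - 3)
    l1, u1 = np.tanh(np.arctanh(r1) - CRIT * se), np.tanh(np.arctanh(r1) + CRIT * se)
    l2, u2 = np.tanh(np.arctanh(r2) - CRIT * se), np.tanh(np.arctanh(r2) + CRIT * se)
    d = r1 - r2
    return (d - np.sqrt((r1 - l1) ** 2 + (u2 - r2) ** 2),
            d + np.sqrt((u1 - r1) ** 2 + (r2 - l2) ** 2))


def zou_coverage(rho1, rho2, n, nsim=4000, seed=0):
    """Proportion of simulated Zou CIs containing rho1 - rho2 (bivariate normal data)."""
    rng = np.random.default_rng(seed)
    r = []
    for rho in (rho1, rho2):
        cov = [[1.0, rho], [rho, 1.0]]
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=(nsim, n))  # (nsim, n, 2)
        r.append(pearson_rows(xy[..., 0], xy[..., 1]))
    lower, upper = zou_ci(r[0], r[1], n)
    true_diff = rho1 - rho2
    return np.mean((lower <= true_diff) & (true_diff <= upper))


for rho1 in (0.0, 0.4, 0.8):
    print(rho1, zou_coverage(rho1, rho1 + 0.1, n=50))
```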

The width of the CIs (upper bound minus lower bound) decreases with rho and with sample size. That’s expected from the sampling distributions of correlation coefficients.
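One way to see why: on the z scale, a Fisher CI has the same width, 2 × 1.96 / sqrt(n - 3), whatever the value of r. Back-transforming with tanh compresses that interval more and more as r approaches 1, because the slope of tanh, 1 - tanh², shrinks. The small sketch below computes the width for a single correlation (CIs for differences combine two such intervals); the function name is hypothetical.

```python
import math
from statistics import NormalDist

CRIT = NormalDist().inv_cdf(0.975)  # about 1.96


def fisher_ci_width(r, n):
    """Width of the 95% Fisher r-to-z CI for one correlation r based on n pairs."""
    z, se = math.atanh(r), 1 / math.sqrt(n - 3)
    return math.tanh(z + CRIT * se) - math.tanh(z - CRIT * se)


for n in (50, 100, 200):
    widths = [fisher_ci_width(k / 10, n) for k in range(9)]  # r = 0, 0.1, ..., 0.8
    print(n, [round(w, 3) for w in widths])
```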

When CIs do not include the population value, are they located to the left or to the right of it? In the figure below, negative values indicate a preponderance of left shifts and positive values a preponderance of right shifts: a value of 1 means 100% right shifts, -1 means 100% left shifts. For Pearson, CIs not including the population value tend to be located evenly to the left and right of it. For Spearman, there is a preponderance of left-shifted CIs for rho1 = 0.8. Because the difference group 1 minus group 2 is negative, this left shift implies a tendency to over-estimate the size of the difference.
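The shift index described above can be computed from the misses alone. This is one plausible definition, assumed rather than taken from the post’s code: count CIs lying entirely to the right and entirely to the left of the population value, then take (right - left) / (right + left). The name `shift_index` and the convention of returning 0 when there are no misses are hypothetical.

```python
import numpy as np


def shift_index(lower, upper, true_value):
    """Balance of misses: +1 = all misses to the right, -1 = all to the left."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    n_right = int(np.sum(lower > true_value))  # whole CI above the population value
    n_left = int(np.sum(upper < true_value))   # whole CI below the population value
    n_miss = n_right + n_left
    if n_miss == 0:
        return 0.0  # assumption: no misses counts as balanced
    return (n_right - n_left) / n_miss
```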

# g = 1, h = 0, difference = 0.1, vary rho

What happens when we sample from a skewed distribution?

The coverage is lower than the expected 95% for Zou’s method and the discrepancy worsens with increasing rho1 and with increasing sample size. The percentile bootstrap does a much better job. Spearman’s combined with the percentile bootstrap is spot on.
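For reference, a percentile bootstrap CI for the difference between two independent Spearman correlations resamples (x, y) pairs within each group, independently, recomputes the difference each time, and takes the 2.5th and 97.5th percentiles. A minimal sketch follows; the number of bootstrap samples and all function names are assumptions. Midranks are used because bootstrap samples contain ties.

```python
import numpy as np


def midranks(a):
    """Ranks with ties replaced by their average rank (bootstrap samples contain ties)."""
    a = np.asarray(a)
    order = np.argsort(a, kind="mergesort")
    inv = np.empty_like(order)
    inv[order] = np.arange(len(a))
    sorted_a = a[order]
    starts = np.r_[True, sorted_a[1:] != sorted_a[:-1]]  # first element of each tie group
    group = starts.cumsum()[inv]                        # 1-based tie-group id per element
    bounds = np.r_[np.nonzero(starts)[0], len(a)]      # group start indices, plus the end
    return 0.5 * (bounds[group] + bounds[group - 1] + 1)


def spearman(x, y):
    """Spearman's rho as Pearson's correlation of the midranks."""
    return np.corrcoef(midranks(x), midranks(y))[0, 1]


def boot_spearman_diff_ci(x1, y1, x2, y2, nboot=2000, level=0.95, seed=None):
    """Percentile bootstrap CI for spearman(x1, y1) - spearman(x2, y2)."""
    rng = np.random.default_rng(seed)
    x1, y1, x2, y2 = map(np.asarray, (x1, y1, x2, y2))
    n1, n2 = len(x1), len(x2)
    diffs = np.empty(nboot)
    for b in range(nboot):
        i1 = rng.integers(0, n1, size=n1)  # resample (x, y) pairs in group 1
        i2 = rng.integers(0, n2, size=n2)  # ... and independently in group 2
        diffs[b] = spearman(x1[i1], y1[i1]) - spearman(x2[i2], y2[i2])
    alpha = 1 - level
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lower, upper
```

The modified percentile bootstrap for Pearson correlations (Wilcox, 2009) follows the same resampling scheme but, as I understand it, uses adjusted quantiles that depend on sample size; those adjustments are not reproduced here.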

For CIs that did not include the population value, the pattern of shifts varies as a function of rho. For Pearson, CIs are more likely to be located to the right of the population value for rho1 = 0 (under-estimation of the size of the difference, or wrong sign), whereas for rho1 = 0.8, CIs are more likely to be located to the left. Spearman + bootstrap produces much more balanced results.

To investigate the asymmetry, we look at CIs for g = 1, a sample size of n = 200 and the two extremes of the range of correlations, rho1 = 0 and rho1 = 0.8. For rho1 = 0, the figure below shows the preponderance of right-shifted CIs for the two Pearson methods. The vertical line marks the population difference of -0.1.

For rho1 = 0.8, the pattern changes to a preponderance of left shifts for all methods, which means that the CIs tended to over-estimate the size of the population difference. CIs for differences between Spearman’s correlations were much narrower than those for Pearson’s correlations, though, thus showing less bias and less uncertainty.

# rho = 0.3, difference = 0.2, vary g, h = 0

For another perspective on the three methods, we now look at a case with:

- group 1: rho1 = 0.3
- group 2: rho2 = 0.5
- we vary g from 0 to 1.
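To get a feel for how much asymmetry this range of g represents: with h = 0, the g transform of a standard normal, (exp(gZ) - 1) / g, is a shifted and rescaled lognormal with sigma = |g|, so its skewness has the closed form (exp(g²) + 2) · sqrt(exp(g²) - 1). The sketch below (function name hypothetical) evaluates it for the marginal distributions only; it is not part of the original analysis.

```python
import math


def g_skewness(g):
    """Skewness of (exp(g*Z) - 1) / g with Z standard normal and h = 0."""
    if g == 0:
        return 0.0  # g = 0 leaves Z unchanged: symmetric
    s2 = math.exp(g ** 2)
    # lognormal skewness with sigma = |g|; shifting and positive rescaling leave it unchanged
    value = (s2 + 2) * math.sqrt(s2 - 1)
    return math.copysign(value, g)  # negative g mirrors the distribution


for g in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"g = {g:.1f}: skewness = {g_skewness(g):.2f}")
```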

For Pearson + Zou, coverage progressively decreases with increasing g, and to a much more limited extent with increasing sample size. Pearson + bootstrap is much more resilient to changes in g. And Spearman + bootstrap just doesn’t care about asymmetry!

The better coverage of Pearson + bootstrap seems to be achieved by producing wider CIs.

Matters only get worse for Pearson + Zou when outliers are likely (h = 0.2; see the notebook on GitHub).

# Conclusion

Based on this new comparison of the three methods, I’d argue again that Spearman + bootstrap should be preferred over the two Pearson methods. But if the goal is to assess linear relationships, then Pearson + bootstrap is preferable to Zou’s method. I’ll report on other methods in another post.

# References

## Comparison of correlation coefficients

Zou, Guang Yong. **Toward Using Confidence Intervals to Compare Correlations.** Psychological Methods 12, no. 4 (2007): 399–413. https://doi.org/10.1037/1082-989X.12.4.399.

Wilcox, Rand R. **Comparing Pearson Correlations: Dealing with Heteroscedasticity and Nonnormality.** Communications in Statistics – Simulation and Computation 38, no. 10 (1 November 2009): 2220–34. https://doi.org/10.1080/03610910903289151.

Baguley, Thom. **Comparing Correlations: Independent and Dependent (Overlapping or Non-Overlapping).** Serious Stats (blog), 5 February 2012. https://seriousstats.wordpress.com/2012/02/05/comparing-correlations/

Diedenhofen, Birk, and Jochen Musch. **Cocor: A Comprehensive Solution for the Statistical Comparison of Correlations.** PLoS ONE 10, no. 4 (2 April 2015). https://doi.org/10.1371/journal.pone.0121945.

## *g & h* distributions

Hoaglin, David C. **Summarizing Shape Numerically: The g-and-h Distributions.** In Exploring Data Tables, Trends, and Shapes, 461–513. John Wiley & Sons, Ltd, 1985. https://doi.org/10.1002/9781118150702.ch11.

Yan, Yuan, and Marc G. Genton. **The Tukey G-and-h Distribution.** Significance 16, no. 3 (2019): 12–13. https://doi.org/10.1111/j.1740-9713.2019.01273.x.