In this post we’re going to compare two robust dependent correlation coefficients using a frequentist approach. The approach boils down to computing a confidence interval for the difference between correlations. There are several solutions to this problem, and we’re going to focus on what is probably the simplest one, using a percentile bootstrap, as described in Wilcox (2016) and implemented in his R functions `twoDcorR()` and `twoDNOV()`. These two functions correspond to two cases:

- Case 1: overlapping correlations
- Case 2: non-overlapping correlations

## Case 1: overlapping correlations

Case 1 corresponds to the common scenario in which we look for correlations, across participants, between one behavioural measurement and activity in several brain areas. For instance, we could look at correlations between percent correct in one task and brain activity in two regions of interest (e.g. parietal and occipital). In this scenario, papers often report a significant brain-behaviour correlation in one brain area and a non-significant correlation in another brain area. Stopping the analyses at that stage leads to a common interaction fallacy: the fact that correlation 1 is statistically significant while correlation 2 is not does not mean that the two correlations differ (Nieuwenhuis et al. 2011). The interaction fallacy is also covered in a post by Jan Vanhove. Thom Baguley also provides R code to compare correlations, as well as a cautionary note about using correlations at all.
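To make the fallacy concrete, here is a small numerical illustration in Python. The sample size and correlation values are hypothetical, and `corr_pvalue` is a helper written for this example: with 25 participants, r = 0.40 is significant at the 0.05 level while r = 0.37 is not, even though the two coefficients are nearly identical.

```python
import numpy as np
from scipy import stats

def corr_pvalue(r, n):
    """Two-sided p value for H0: rho = 0, using t = r*sqrt(n-2)/sqrt(1-r^2)."""
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
    return 2 * stats.t.sf(abs(t), n - 2)

n = 25                   # hypothetical number of participants
r1, r2 = 0.40, 0.37      # hypothetical correlations in two brain areas

p1 = corr_pvalue(r1, n)  # ~0.048: "significant"
p2 = corr_pvalue(r2, n)  # ~0.069: "not significant"
# Concluding that the two brain areas behave differently because only r1
# crosses the 0.05 threshold would be the interaction fallacy:
# the difference r1 - r2 is only 0.03.
```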

To compare the two correlation coefficients, we proceed like this:

- sample participants with replacement
- compute the two correlation coefficients based on the bootstrap samples
- save the difference between correlations
- execute the previous steps at least 500 times
- use the distribution of bootstrap differences to derive a confidence interval: a 95% confidence interval is defined as the 2.5th and 97.5th quantiles of the bootstrap distribution.
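The steps above can be sketched in Python with NumPy and SciPy. The data below are simulated stand-ins for the behavioural and brain measurements, and the variable names are mine, not those of the Matlab script:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated stand-ins: a = behaviour, b and c = activity in two brain areas
n = 50
a = rng.normal(size=n)
b = 0.5 * a + rng.normal(size=n)
c = 0.9 * a + 0.5 * rng.normal(size=n)

nboot = 2000
bootdiff = np.empty(nboot)
for i in range(nboot):
    idx = rng.integers(0, n, size=n)          # sample participants with replacement
    r1 = stats.spearmanr(a[idx], b[idx])[0]   # correlation 1 on the bootstrap sample
    r2 = stats.spearmanr(a[idx], c[idx])[0]   # correlation 2 on the same sample
    bootdiff[i] = r1 - r2                     # save the difference

# 95% CI: 2.5th and 97.5th quantiles of the bootstrap distribution
ci = np.quantile(bootdiff, [0.025, 0.975])

# two-sided bootstrap p value
p = np.mean(bootdiff < 0)
p = 2 * min(p, 1 - p)
```

Because the same participants contribute to both correlations, the same bootstrap indices are used for both pairs; that is what makes this a procedure for dependent correlations.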

A Matlab script implementing the procedure is available on GitHub. To run the code you will need the Robust Correlation Toolbox. First we generate data and illustrate them.

```
% Then we bootstrap the data
Nb = 500;
bootcorr1 = zeros(Nb,1);
bootcorr2 = zeros(Nb,1);
for B = 1:Nb
    bootsample = randi(Np,1,Np);
    bootcorr1(B) = Spearman(a(bootsample),b(bootsample),0);
    bootcorr2(B) = Spearman(a(bootsample),c(bootsample),0);
end
```

Cyril Pernet pointed out on Twitter that the loop is unnecessary: we can compute all bootstrap samples in one go.

```
bootsamples = randi(Np,Np,Nb);
bc1 = Spearman(a(bootsamples),b(bootsamples),0);
bc2 = Spearman(a(bootsamples),c(bootsamples),0);
```

The bootstrap loop does make the procedure more intuitive for new users, especially if they are also learning R or Matlab!

In the example above we used Spearman’s correlation, which is robust to univariate outliers (Pernet, Wilcox & Rousselet, 2012). To apply the technique to Pearson’s correlation, the boundaries of the confidence interval need to be adjusted, as described in Wilcox (2009). However, Pearson’s correlation is not robust so it should be used cautiously (Rousselet & Pernet 2012). Also, as described in Wilcox (2009), Fisher’s z test to compare correlation coefficients is inappropriate.

Confidence intervals are obtained like this:

```
alpha = 0.05; % probability coverage - 0.05 for 95% CI
hi = floor((1-alpha/2)*Nb+.5);
lo = floor((alpha/2)*Nb+.5);
% for each correlation
boot1sort = sort(bootcorr1);
boot2sort = sort(bootcorr2);
boot1ci = [boot1sort(lo) boot1sort(hi)];
boot2ci = [boot2sort(lo) boot2sort(hi)];
% for the difference between correlations
bootdiff = bootcorr1 - bootcorr2;
bootdiffsort = sort(bootdiff);
diffci = [bootdiffsort(lo) bootdiffsort(hi)];
```

We get:

```
corr(a,b) = 0.52 [0.34 0.66]
corr(a,c) = 0.79 [0.68 0.86]
difference = -0.27 [-0.44 -0.14]
```

The bootstrap distribution of the differences between correlation coefficients is illustrated below.

The bootstrap distribution does not overlap with zero, our null hypothesis value. In that case the bootstrap p value is zero, which is calculated like this:

```
pvalue = mean(bootdiffsort < 0);
pvalue = 2*min(pvalue,1-pvalue);
```

The original difference between coefficients is marked by a thick vertical black line. The 95% percentile bootstrap confidence interval is illustrated by the two thin vertical black lines.

## Case 2: non-overlapping correlations

Case 2 corresponds to a before-after scenario. For instance the same participants are tested before and after an intervention, such as a training procedure. On each occasion, we compute a correlation, say between brain activity and behaviour, and we want to know if that correlation changes following the intervention.

Case 2 is handled with a straightforward modification of case 1. Here are example data:

The bootstrap is done like this:

```
Nb = 500;
bootcorr1 = zeros(Nb,1);
bootcorr2 = zeros(Nb,1);
for B = 1:Nb
    bootsample = randi(Np,1,Np);
    bootcorr1(B) = Spearman(a1(bootsample),b1(bootsample),0);
    bootcorr2(B) = Spearman(a2(bootsample),b2(bootsample),0);
end
alpha = 0.05; % probability coverage - 0.05 for 95% CI
hi = floor((1-alpha/2)*Nb+.5);
lo = floor((alpha/2)*Nb+.5);
% for each correlation
boot1sort = sort(bootcorr1);
boot2sort = sort(bootcorr2);
boot1ci = [boot1sort(lo) boot1sort(hi)];
boot2ci = [boot2sort(lo) boot2sort(hi)];
% for the difference between correlations
bootdiff = bootcorr1 - bootcorr2;
bootdiffsort = sort(bootdiff);
diffci = [bootdiffsort(lo) bootdiffsort(hi)];
```

We get:

```
corr(a1,b1) = 0.52 [0.34 0.66]
corr(a2,b2) = 0.56 [0.39 0.68]
difference = -0.04 [-0.24 0.17]
```

The difference is very close to zero and its confidence interval includes zero. So the training procedure is associated with a very weak change in correlation.

Instead of a confidence interval, we could also report a highest density interval, which will be very close to the confidence interval if the bootstrap distribution is symmetric – the Matlab script on GitHub shows how to compute an HDI. We could also simply report the difference and its bootstrap distribution. This provides a good summary of the uncertainty we have about the difference, without committing to a binary description of the results as significant or not.
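As a sketch, here is one common way to compute a highest density interval from bootstrap samples: among all intervals containing 95% of the sorted samples, keep the narrowest. This is a minimal implementation of my own, not the code from the Matlab script, and it assumes a unimodal bootstrap distribution:

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (assumes unimodality)."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.ceil(mass * n))       # number of samples inside the interval
    widths = x[k-1:] - x[:n-k+1]     # width of every candidate interval
    i = int(np.argmin(widths))       # index of the narrowest candidate
    return x[i], x[i + k - 1]
```

For a symmetric bootstrap distribution the HDI and the percentile interval nearly coincide; they diverge when the distribution is skewed.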

## Conclusion

The strategies described here have been validated for Spearman’s correlation and the Winsorised correlation (Wilcox, 2016). The skipped correlation led to confidence intervals that were too conservative, meaning that in simulations the 95% confidence intervals contained the true value more than 95% of the time. This illustrates an important idea: the behaviour of a confidence interval is always assessed in the long run, using simulations, and it results from the conjunction of an estimator and a technique to form the confidence interval. Finally, a very similar bootstrap approach can be used to compare regression coefficients (Wilcox 2012), for instance to compare the slopes of robust linear regressions in an overlapping case (Bieniek et al. 2013).
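To illustrate that last point, the same percentile bootstrap can be applied to slopes. The sketch below uses ordinary least squares via `np.polyfit` for simplicity, whereas the papers cited use robust regression; the data are simulated and the variable names are mine:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: same predictor x, two outcomes measured on the same participants
n = 50
x = rng.normal(size=n)
y1 = 0.5 * x + rng.normal(size=n)   # outcome, condition 1
y2 = 0.8 * x + rng.normal(size=n)   # outcome, condition 2

nboot = 2000
slopediff = np.empty(nboot)
for i in range(nboot):
    idx = rng.integers(0, n, size=n)         # resample participants with replacement
    s1 = np.polyfit(x[idx], y1[idx], 1)[0]   # slope of y1 ~ x on bootstrap sample
    s2 = np.polyfit(x[idx], y2[idx], 1)[0]   # slope of y2 ~ x on the same sample
    slopediff[i] = s1 - s2

# 95% percentile bootstrap CI for the slope difference
ci = np.quantile(slopediff, [0.025, 0.975])
```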

## References

Bieniek, M.M., Frei, L.S. & Rousselet, G.A. (2013) Early ERPs to faces: aging, luminance, and individual differences. Frontiers in psychology, 4, 268.

Nieuwenhuis, S., Forstmann, B.U. & Wagenmakers, E.J. (2011) Erroneous analyses of interactions in neuroscience: a problem of significance. Nat Neurosci, 14, 1105-1107.

Pernet, C.R., Wilcox, R. & Rousselet, G.A. (2012) Robust correlation analyses: false positive and power validation using a new open source Matlab toolbox. Front Psychol, 3, 606.

Rousselet, G.A. & Pernet, C.R. (2012) Improving standards in brain-behavior correlation analyses. Frontiers in Human Neuroscience, 6, 119.

Wilcox, R.R. (2009) Comparing Pearson Correlations: Dealing with Heteroscedasticity and Nonnormality. Communications in Statistics-Simulation and Computation, 38, 2220-2234.

Wilcox, R.R. (2012) Introduction to robust estimation and hypothesis testing. Academic Press, San Diego, CA.

Wilcox, R.R. (2016) Comparing dependent robust correlations. Brit J Math Stat Psy, 69, 215-224.

## Comments

**Balázs Knakker (@BKnakker):** Thank you for this very useful and informative post 🙂 The ’16 Wilcox paper considers skipped Pearson – are there any results on skipped Spearman? Or is that somehow not a good case to consider?


**garstats (post author):** I’m not aware of systematic comparisons between skipped Spearman and skipped Pearson when the goal is to compare correlation coefficients. If you expect non-linear relationships, then skipped Spearman might be a good choice. The question is whether it is better at capturing differences in non-linear association. The only way to find out would be to run a simulation, following the recipes in Wilcox’s recent papers on the topic. You might want to drop him an email before you embark on that, in case he’s already working on it, or has unpublished results on the question.
