How to compare dependent correlations

In this post we’re going to compare two robust dependent correlation coefficients using a frequentist approach. The approach boils down to computing a confidence interval for the difference between correlations. There are several solutions to this problem, and we’re going to focus on what is probably the simplest one, a percentile bootstrap, as described in Wilcox (2016) and implemented in his R functions twoDcorR() and twoDNOV(). These two functions correspond to two cases:

• Case 1: overlapping correlations
• Case 2: non-overlapping correlations

Case 1: overlapping correlations

Case 1 corresponds to the common scenario in which we look for correlations, across participants, between one behavioural measurement and activity in several brain areas. For instance, we could look at correlations between percent correct in one task and brain activity in two regions of interest (e.g. parietal and occipital). In this scenario, papers often report a significant brain–behaviour correlation in one brain area and a non-significant correlation in another brain area. Stopping the analyses at that stage leads to a common interaction fallacy: the fact that correlation 1 is statistically significant while correlation 2 is not does not mean that the two correlations differ from each other (Nieuwenhuis et al. 2011). The interaction fallacy is also covered in a post by Jan Vanhove. Thom Baguley also provides R code to compare correlations, as well as a cautionary note about using correlations at all.

To compare the two correlation coefficients, we proceed like this:

• sample participants with replacement (each bootstrap sample is drawn from the original sample, and each participant’s full set of measurements is resampled together)
• use the same participant indices to create the bootstrap samples for each variable (concretely, for 3 variables, we sample triads of observations, preserving the dependency among observations)
• compute the two correlation coefficients based on the bootstrap samples
• save the difference between correlations
• execute the previous steps at least 500 times
• use the distribution of bootstrap differences to derive a confidence interval: a 95% confidence interval is defined as the 2.5th and 97.5th quantiles of the bootstrap distribution.
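The steps above can be sketched in a few lines of Matlab. This is a minimal, self-contained illustration, not the script from the post: the data (a, b, c) and the sample size Np are made up for the example, and corr from the Statistics Toolbox, with the 'type','Spearman' option, stands in for the Spearman() function of the Robust Correlation Toolbox:

rng(1)                      % for reproducibility
Np = 50;                    % number of participants
a = randn(Np,1);            % e.g. behavioural measurement
b = a + randn(Np,1);        % e.g. activity in brain area 1
c = a + 0.5*randn(Np,1);    % e.g. activity in brain area 2

Nb = 500;                   % number of bootstrap samples
bootdiff = zeros(Nb,1);
for B = 1:Nb
    bs = randi(Np,Np,1);    % same participant indices for all variables
    r1 = corr(a(bs),b(bs),'type','Spearman');
    r2 = corr(a(bs),c(bs),'type','Spearman');
    bootdiff(B) = r1 - r2;  % save the difference between correlations
end
bootdiffsort = sort(bootdiff);
alpha = 0.05;
diffci = [bootdiffsort(floor(alpha/2*Nb+.5)) bootdiffsort(floor((1-alpha/2)*Nb+.5))];

Because the same indices bs are used for a, b and c, the dependency among observations is preserved in every bootstrap sample.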

A Matlab script implementing the procedure is available on GitHub. To run the code you will need the Robust Correlation Toolbox. First we generate data and illustrate them (Figure 1).

% Then we bootstrap the data
Nb = 500; % number of bootstrap samples
bootcorr1 = zeros(Nb,1);
bootcorr2 = zeros(Nb,1);

for B = 1:Nb
    % Np = number of participants; same indices for all variables
    bootsample = randi(Np,1,Np);
    bootcorr1(B) = Spearman(a(bootsample),b(bootsample),0);
    bootcorr2(B) = Spearman(a(bootsample),c(bootsample),0);
end

% Cyril Pernet pointed out on Twitter that the loop is unnecessary.
% We can compute all bootstrap samples in one go:
% bootsamples = randi(Np,Np,Nb);
% bc1 = Spearman(a(bootsamples),b(bootsamples),0);
% bc2 = Spearman(a(bootsamples),c(bootsamples),0);
% The bootstrap loop does make the bootstrap procedure more intuitive
% for new users, especially if they are also learning R or Matlab!

In the example above we used Spearman’s correlation, which is robust to univariate outliers (Pernet, Wilcox & Rousselet, 2012). To apply the technique to Pearson’s correlation, the boundaries of the confidence interval need to be adjusted, as described in Wilcox (2009). However, Pearson’s correlation is not robust, so it should be used cautiously (Rousselet & Pernet, 2012). Also, as described in Wilcox (2009), the classic Fisher’s z test to compare correlation coefficients is inappropriate, because it performs poorly under heteroscedasticity and non-normality.

Confidence intervals are obtained like this:

alpha = 0.05; % probability coverage - 0.05 for 95% CI

hi = floor((1-alpha/2)*Nb+.5);
lo = floor((alpha/2)*Nb+.5);

% for each correlation
boot1sort = sort(bootcorr1);
boot2sort = sort(bootcorr2);
boot1ci = [boot1sort(lo) boot1sort(hi)];
boot2ci = [boot2sort(lo) boot2sort(hi)];

% for the difference between correlations
bootdiff = bootcorr1 - bootcorr2;
bootdiffsort = sort(bootdiff);
diffci = [bootdiffsort(lo) bootdiffsort(hi)];

We get:

corr(a,b) = 0.52 [0.34 0.66]
corr(a,c) = 0.79 [0.68 0.86]
difference = -0.27 [-0.44 -0.14]

The bootstrap distribution of the differences between correlation coefficients is illustrated below (Figure 2).

The bootstrap distribution does not overlap with zero, the value expected under the null hypothesis of no difference. In that case the bootstrap p value is exactly zero. It is calculated like this:

pvalue = mean(bootdiffsort < 0);
pvalue = 2*min(pvalue,1-pvalue);

The original difference between coefficients is marked by a thick vertical black line. The 95% percentile bootstrap confidence interval is illustrated by the two thin vertical black lines.

Case 2: non-overlapping correlations

Case 2 corresponds to a before-after scenario. For instance the same participants are tested before and after an intervention, such as a training procedure. On each occasion, we compute a correlation, say between brain activity and behaviour, and we want to know if that correlation changes following the intervention.

Case 2 is addressed using a straightforward modification of case 1. Here are example data (Figure 3).

The bootstrap is done like this:

Nb = 500;
bootcorr1 = zeros(Nb,1);
bootcorr2 = zeros(Nb,1);

for B = 1:Nb
    bootsample = randi(Np,1,Np);
    bootcorr1(B) = Spearman(a1(bootsample),b1(bootsample),0);
    bootcorr2(B) = Spearman(a2(bootsample),b2(bootsample),0);
end

alpha = 0.05; % probability coverage - 0.05 for 95% CI
hi = floor((1-alpha/2)*Nb+.5);
lo = floor((alpha/2)*Nb+.5);

% for each correlation
boot1sort = sort(bootcorr1);
boot2sort = sort(bootcorr2);
boot1ci = [boot1sort(lo) boot1sort(hi)];
boot2ci = [boot2sort(lo) boot2sort(hi)];

% for the difference between correlations
bootdiff = bootcorr1 - bootcorr2;
bootdiffsort = sort(bootdiff);
diffci = [bootdiffsort(lo) bootdiffsort(hi)];

We get:

corr(a1,b1) = 0.52 [0.34 0.66]
corr(a2,b2) = 0.56 [0.39 0.68]
difference = -0.04 [-0.24 0.17]

The difference is very close to zero and its confidence interval includes zero. So the training procedure is associated with a very weak change in correlation.

Instead of a confidence interval, we could also report a highest density interval (HDI), which will be very close to the confidence interval if the bootstrap distribution is symmetric – the Matlab script on GitHub shows how to compute a HDI. We could also simply report the difference and its bootstrap distribution (Figure 4). This provides a good summary of the uncertainty we have about the difference, without committing to a binary description of the results as significant or not.
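
For reference, a HDI can be computed directly from the sorted bootstrap differences, as the shortest interval containing 95% of the bootstrap samples. A minimal sketch, reusing bootdiffsort from the code above:

% HDI: shortest interval containing (1-alpha) of the bootstrap distribution
alpha = 0.05;
Nb = numel(bootdiffsort);
m = ceil((1-alpha)*Nb); % number of bootstrap samples inside the interval
% width of every candidate interval containing m samples
widths = bootdiffsort(m:Nb) - bootdiffsort(1:Nb-m+1);
[~,imin] = min(widths); % index of the shortest one
hdi = [bootdiffsort(imin) bootdiffsort(imin+m-1)];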

Conclusion

The strategies described here have been validated for Spearman’s correlation and the Winsorised correlation (Wilcox, 2016). The skipped correlation led to confidence intervals that were too conservative, meaning that in simulations, the 95% confidence intervals contained the true value more than 95% of the time. This illustrates an important idea: the behaviour of a confidence interval is always estimated in the long run, using simulations, and it results from the conjunction of an estimator and a technique to form the confidence interval. Finally, a very similar bootstrap approach can be used to compare regression coefficients (Wilcox, 2012), for instance to compare the slopes of robust linear regressions in an overlapping case (Bieniek et al. 2013).

References

Bieniek, M.M., Frei, L.S. & Rousselet, G.A. (2013) Early ERPs to faces: aging, luminance, and individual differences. Frontiers in psychology, 4, 268.

Nieuwenhuis, S., Forstmann, B.U. & Wagenmakers, E.J. (2011) Erroneous analyses of interactions in neuroscience: a problem of significance. Nat Neurosci, 14, 1105-1107.

Pernet, C.R., Wilcox, R. & Rousselet, G.A. (2012) Robust correlation analyses: false positive and power validation using a new open source matlab toolbox. Front Psychol, 3, 606.

Rousselet, G.A. & Pernet, C.R. (2012) Improving standards in brain-behavior correlation analyses. Frontiers in Human Neuroscience, 6, 119.

Wilcox, R.R. (2009) Comparing Pearson Correlations: Dealing with Heteroscedasticity and Nonnormality. Communications in Statistics-Simulation and Computation, 38, 2220-2234.

Wilcox, R.R. (2012) Introduction to robust estimation and hypothesis testing. Academic Press, San Diego, CA.

Wilcox, R.R. (2016) Comparing dependent robust correlations. Brit J Math Stat Psy, 69, 215-224.

11 thoughts on “How to compare dependent correlations”

1. Balázs Knakker (@BKnakker)

Thank you for this very useful and informative post 🙂 The ’16 Wilcox paper considers skipped Pearson – are there any results on skipped Spearman? Or is that somehow not a good case to consider?

1. garstats Post author

I’m not aware of a systematic comparison between skipped Spearman and skipped Pearson when the goal is to compare correlation coefficients. If you expect non-linear relationships, then skipped Spearman might be a good choice. The question is whether it is better at capturing differences in non-linear association. The only way to find out would be to run a simulation, following the recipes in Wilcox’s recent papers on the topic. You might want to drop him an email before you embark on that, in case he’s already working on it, or has unpublished results on the question.

2. Alistair J. Cullum

Useful post, thanks. I might suggest that in the first line of your bullet-pointed procedure you specify that each of the two bootstrap samples is drawn from its respective original sample, not from a pooled sample. That was my guess, but I had to check the code to be certain.

1. garstats Post author

Thanks for your feedback Alistair. I’ll edit the text, that’s an important point indeed.

3. Francois Orliac

Hi Guillaume, thank you for this very useful post. Is there a similar Matlab script to compare two independent skipped Spearman’s correlations computed with the Robust Correlation Toolbox, or would the “non-overlapping case” described here do the trick?

1. garstats Post author

Hi Francois,
here is the code to compare two independent robust correlations (including skipped correlations):

Assume:
g1 is a matrix with n1 rows and 2 columns
g2 is a matrix with n2 rows and 2 columns

Nb = 500;
bootcorr1 = zeros(Nb,1);
bootcorr2 = zeros(Nb,1);

for B = 1:Nb
    bootsamples1 = randi(n1,1,n1); % bootstrap sample of n1 pairs for group 1
    bootsamples2 = randi(n2,1,n2); % bootstrap sample of n2 pairs for group 2
    bootcorr1(B) = Spearman(g1(bootsamples1,1),g1(bootsamples1,2),0);
    bootcorr2(B) = Spearman(g2(bootsamples2,1),g2(bootsamples2,2),0);
end

Because the groups are independent, we sample independently pairs of observations from each of them.
The rest of the code is the same.

1. Francois Orliac

Thank you Guillaume, it helps a lot!

4. David Emanuel Vetter

I am not very familiar with this method, but have the following question concerning the interpretation as a p value in the first case: isn’t the histogram shown a sample/approximation of the alternative hypothesis (H1: values are unequal), and not the null hypothesis (H0: values are equal)? Does that not mean that the computed “p value” is the probability of a Type II error?

1. garstats Post author

Hi David, the histogram shows a bootstrap estimation of the sampling distribution of the differences between correlation coefficients. So the p value has a different meaning from the parametric one, which uses a sampling distribution conditional on the null being true. But the bootstrap p value is not a probability of a type-2 error: there is no long-run probability associated with a p value, because a new experiment can suggest a different p value. In many situations, both types of p values will be highly correlated and both reflect the amount of surprise about the results, given a hypothesis (here a point null hypothesis).
