UPDATE (2018-05-17): the method suggested here is completely bogus. I’ve edited the post to explain why. To make inferences about differences scores, use the difference asymmetry function or make inferences about the quantiles of the differences (Rousselet, Pernet & Wilcox, 2017).

The shift function is a graphical and inferential method that allows users to quantify how two distributions differ. It is a frequentist tool that also comes in several Bayesian flavours, and can be applied to independent and dependent groups. The version for dependent groups uses differences between the quantiles of each group. However, for paired observations, it would be also useful to assess the quantiles of the pairwise differences. This is what ~~the~~ this new shift function ~~does~~ was supposed to do.

Let’s consider the fictive reaction time data below, generated using exGaussian distributions (n = 100 participants).

The kernel density estimates suggest interesting differences: condition 1 is overall more spread out than condition 2; as a result, the two distributions differ in both the left (fast participants) and right (slow participants) tails. However, this plot does not reveal the pairwise nature of the observations. This is better illustrated using a scatterplot.

The scatterplot reveals more clearly the relationship between conditions:

– fast participants, shown in dark blue on the left, tended to be a bit faster in condition 1 than in condition 2;

– slower participants, shown in yellow on the right, tended to be slower in condition 1 than in condition 2;

– this effect seems to be more prominent for participants with responses larger than about 500 ms, with a trend for larger differences with increasing average response latencies.

A shift function can help assess and quantify this pattern. In the shift function below, the x axis shows the deciles in condition 1. The y axis shows the differences between deciles from the two conditions. The difference is reported in the coloured label. The vertical lines show the 95% percentile bootstrap confidence intervals. As we travel from left to right along the x axis, we consider progressively slower participants in condition 1. These slower responses in condition 1 are associated with progressively faster responses in condition 2 (the difference condition 1 – condition 2 increases).

So here the inferences are made on differences between quantiles of the marginal distributions: for each distribution, we compute quantiles, and then subtract the quantiles.

What if we want to make inferences on the pairwise differences instead? This can be done by computing the quantiles of the differences, and plotting them as a function of the quantiles in one group. A small change in the code gives us a new shift function for dependent groups.

The two versions look very similar, which is re-assuring, **but does not demonstrate anything (except confirmation bias and wishful thinking on my part)**. ~~But there might be situations where the two versions differ. Also, the second version makes explicit inferences about the pairwise differences, not about the differences between marginal distributions: so despite the similarities, they afford different conclusions.~~

Let’s look at the critical example that I should have considered before getting all excited and blogging about the “new method”. A simple negative control demonstrates what is wrong with the approach. Here are two dependent distributions, with a clear shift between the marginals.

The pairwise relationships are better illustrated using a scatterplot, which shows a seemingly uniform shift between conditions.

Plotting the pairwise differences as a function of observations in condition 1 confirms the pattern: the differences don’t seem to vary much with the results in condition 1. In other words, differences don’t seem to be particularly larger or smaller for low results in condition 1 relative to high results.

The shift function on marginals does a great job at capturing the differences, showing a pattern characteristic of stochastic dominance (Speckman, Rouder, Morey & Pratte, 2008): one condition (condition 2) dominates the other at every decile. The differences also appear to be a bit larger for higher than lower deciles in condition 1.

The modified shift function, shown next, makes no sense. That’s because the deciles of condition 1 and the deciles of the difference scores necessarily increase from 1 to 9, so plotting one as a function of the other **ALWAYS** gives a positive slope. The same positive slope I thought was capturing a pattern of regression to the mean! So I fooled myself because I was so eager to find a technique to quantify regression to the mean, and I only used examples that confirmed my expectations (confirmation bias)! This totally blinded me to what in retrospect is a very silly mistake.

Finally, let’s go back to the pattern observed in the previous shift function, where it seemed that the difference scores were increasing from low to high quantiles of condition 1. The presence of this pattern can better be tested using a technique that makes inferences about pairwise differences. One such technique is the difference asymmetry function. The idea from Wilcox (2012, Wilcox & Erceg-Hurn, 2012) goes like this: if two distributions are identical, then the difference scores should be symmetrically distributed around zero. To test for asymmetry, we can estimate sums of lower and higher quantiles; for instance, the sum of quantile 0.05 and quantile 0.95, 0.10 + 0.90, 0.15 + 0.85… For symmetric distributions with a median of zero, these sums should be close to zero, leading to a flat function centred at zero. If for instance the negative differences tend to be larger than the positive differences, the function will start with negative sums and will increase progressively towards zero (see example in Rousselet, Pernet & Wilcox). In our example, the difference asymmetry function is negative and flat, which is characteristic of a uniform shift, without much evidence for an asymmetry. Which is good because that’s how the fake data were generated! So using graphical representations such as scatterplots, in conjunction with the shift function and the difference asymmetry function, can provide a very detailed and informative account of how two distributions differ.

# Conclusion

I got very excited by the new approach because after spending several days thinking about test-retest reliability assessment from a graphical perspective, I thought I had found the perfect tool, as explained in the next post. So the ingredients of my mistake are clear: statistical sloppiness and confirmation bias.

The code for the figures in this post and for the ~~new~~ bogus shift function is available on github. I~~’ll~~ will not update the `rogme`

package, which implements the otherwise perfectly valid shift functions and difference asymmetry functions.

# References

Speckman, P.L., Rouder, J.N., Morey, R.D. & Pratte, M.S. (2008) Delta plots and coherent distribution ordering. *Am Stat*, **62**, 262-266.

Rousselet, G.A., Pernet, C.R. & Wilcox, R.R. (2017) Beyond differences in means: robust graphical methods to compare two groups in neuroscience. *The European journal of neuroscience*, **46**, 1738-1748. [preprint] [reproducibility package]

Wilcox, R.R. (2012) Comparing Two Independent Groups Via a Quantile Generalization of the Wilcoxon-Mann-Whitney Test. *Journal of Modern Applied Statistical Methods*, **11**, 296-302.

Wilcox, R.R. & Erceg-Hurn, D.M. (2012) Comparing two dependent groups via quantiles. *J Appl Stat*, **39**, 2655-2664.