Quantiles are robust and useful descriptive statistics. They belong to the family of L-estimators, which is to say that they are based on the linear combination of order statistics. They are several ways to compute quantiles. For instance, in R, the function `quantile`

has 9 options. In Matlab, the `quantile`

& `prctile`

functions offer only 1 option. Here I’d like to introduce briefly yet another option: the **Harrell-Davis quantile estimator** (Harrell & Davis, 1982). It is the weighted average of all the order statistics (Figure 2). And, in combination with the percentile bootstrap, it is a useful tool to derive confidence intervals of quantiles (Wilcox 2012), as we will see quickly in this post. It is also a useful tool to derive confidence intervals of the difference between quantiles of two groups, as we will see in another post. As discussed previously in the percentile bootstrap post, to make accurate confidence intervals, we need to combine an estimator with a particular confidence interval building procedure, and the right combo is not obvious depending on the data at hand.

Before we motor on, a quick google search suggests that there is recent work to try to improve the Harrell-Davis estimator, so this not to say that this estimator is the best in all situations. But according to Rand Wilcox it works well in many situations, and we do use it a lot in the lab…

Let’s look at data from a paper on visual processing speed estimation (Bieniek et al. 2015). We consider ERP onsets from 120 participants aged 18 to 81.

The sorted ages are:

18 18 19 19 19 19 20 20 21 21 21 21 21 21 21 22 22 22 22 22 22 22 22 23 23 23 24 24 24 25 26 28 28 29 29 30 30 31 31 32 32 32 33 34 34 35 35 36 37 38 40 40 41 41 42 42 43 43 44 45 45 45 45 48 49 49 50 51 54 54 55 56 58 59 59 60 60 61 62 62 62 63 63 63 64 64 64 64 65 65 66 66 66 66 66 66 67 67 67 67 68 68 68 68 68 69 70 70 70 71 72 72 72 75 76 77 78 79 81 81

**Figure 1. Age distribution.**

The Matlab code to reproduce all the figures in this post is available on github. There is also a list of R functions from Rand Wilcox’s toolbox.

How do we compute Harrell-Davis quantiles of the age distribution? Figure 2 shows the Harrell-Davis weights for the deciles of the age distribution.

**Figure 2. Decile weights.**

The deciles are obtained by multiplying the sorted ages by the weights in Figure 2, which gives us:

21.1, 23.3, 29.7, 37.0, 45.3, 56.1, 63.3, 66.6, 70.4

For comparison, the age deciles from Matlab’s `prctile`

function are:

21, 23, 30, 36, 45, 57, 64, 66, 70

Now, we can update the scatterplot in Figure 1 with the deciles:

**Figure 3. Scatterplot + age deciles.** The thick vertical black line marks the 50th quantiles.

We can also compute a confidence interval for a Harrell-Davis quantile. There are two ways to do that:

- using a percentile bootstrap of the quantile (pbci approach);
- using a percentile bootstrap estimate of the standard error of the quantile, which is then plugged into a confidence interval formula (pbse approach).

Using the code available with this post, we can try the two approaches on the median:

- pbci approach gives 45.31 [35.89, 54.73]
- pbse approach gives 45.31 [38.49, 54.40]

The two methods return similar upper bounds, but quite different lower bounds. Because they are both based on random resampling with replacement, running the same analysis several times will each time also give slightly different results. Actually, this is one important criterion to select a good bootstrap confidence interval technique: despite random sampling, using the same technique many times should provide overall similar results. Another important criterion is the probability coverage: if we build a 95% confidence interval, we want that confidence interval to contain the population value we’re trying to estimate 95% of the time. That’s right, the probability attached to a confidence interval is a long run coverage: assuming a population with a certain median, if we perform the same experiment over and over, every time drawing a sample of n observations and computing an (1-alpha)% confidence interval using the same technique, (1-alpha)% of these confidence intervals will contain the population median. So, if everything is fine (n is large enough, the number of bootstrap samples is large enough, the combination of bootstrap technique and estimator is appropriate), alpha% of the time (usually 5%), a confidence interval WILL NOT include the population parameter of interest. This implies that given the 1,000s of neuroscience & psychology experiments performed every year, 100s of paper report the wrong confidence intervals – but this possibility is never considered in the articles’ conclusions…

In many situations, the long run probability coverage can be actually much lower or much higher than (1-alpha). So can we check that we’re building accurate confidence intervals, at least in the long run? For that, we’ve got to run simulations. Here is an example. First, we create a fake population, for instance with a skewed distribution, which could reflect our belief of the nature of the population we’re studying:

**Figure 4. Population of 1,000,000 values with a 10 degrees of freedom chi2 distribution.**

Second, we compute benchmark values, e.g. median, mean…

Third, we run simulations in which we perform fake experiments with a given sample size, and then compute confidence intervals of certain quantities. Finally, we check how often the different confidence intervals actually contain the population parameters (probability coverage):

- pbse(hd) = 0.9530
- pbci(hd) = 0.9473
- pbci(median) = 0.9452
- pbci(mean) = 0.9394

They’re all very close to 95%. However, the confidence intervals of hd created using the pbse approach tended to be larger than those created using the pbci approach. The confidence intervals for the mean missed the population mean 1% of the time compared to the expected 95% – that’s because they tended to be shorter than the other 3. The bootstrap estimates of the sampling distribution of hd, the median and the mean, as well as the width of the confidence intervals can be explored using the code on github.

Of course, no one is ever going to run 10,000 times the same experiment! And these results assume a certain population, a certain number of observations per experiment, and a certain number of bootstrap samples. We would need a more systematic exploration of the different combinations of options to be sure the present results are not special cases.

To be clear: **there is absolutely no guarantee that any particular confidence interval contains the population parameter you’re trying to estimate.** So be humble, and don’t make such a big deal about your confidence intervals, especially if you have small sample sizes.

Personally, more and more I use confidence intervals to try to describe the variability in the sample at hand. For that purpose, and to avoid potential inferential problems associated with confidence intervals, I think it is more satisfactory to use highest density intervals HDI. I will post R & Matlab functions to compute the HDI of the bootstrap quantiles on github at some stage. By reporting HDI, there are no associated p values and we minimise the temptation to cross proton streams (i.e. dichotomise a continuous variable to make a binary decision – MacCallum et al. 2002).

Finally, we consider something a bit more interesting than the age of our participants: the distribution of ERP onsets.

Here are the onsets in milliseconds:

**Figure 5. Onsets.**

And the deciles with their confidence intervals, which provide a very nice summary of the distribution:

**Figure 6. Onset deciles with confidence intervals.**

If you’re interested, I’ve also attempted a Bayesian estimation of the onset data using R and JAGS. However, I don’t know yet how to perform quantile estimation – please get in touch if you can help.

## Conclusion

Now you’ve got the tools to describe a distribution in detail. There is no particular reason why we should be obsessed with the mean, especially when robust and more informative statistics are available. Next, I will show you how to compare all the deciles of two distributions using a mighty tool: the **shift function**. This will, of course, rely on the Harrell-Davis estimator and the bootstrap.

## References

Bieniek, M.M., Bennett, P.J., Sekuler, A.B. & Rousselet, G.A. (2015) A robust and representative lower bound on object processing speed in humans. The European journal of neuroscience.

Harrell, F.E. & Davis, C.E. (1982) A new distribution-free quantile estimator. Biometrika, 69, 635-640.

MacCallum RC, Zhang S, Preacher KJ, Rucker DD. 2002. On the practice of dichotomization of quantitative variables. Psychological Methods 7: 19-40

Wilcox, R.R. (2012) Introduction to robust estimation and hypothesis testing. Academic Press.