# How to fix erroneous error bars for percent correct data

Have you ever seen accurate bar graphs of percent correct data, or of other bounded quantities such as average scores from an ordinal scale (for instance a 1-9 Likert scale)? It is entirely possible that you have not, because most of these graphs rely on the wrong tools: typically the mean +/- SD or SEM, or a classic confidence interval of the mean. Why are these techniques wrong? First, they use the mean, which is a non-robust estimator of central tendency; second, they use the variance, a non-robust estimator of dispersion; third, they assume symmetry; fourth, the resulting intervals are not bounded, so they can span impossible values, for instance percent correct beyond 100%. Participants cannot be more than 100% correct. Yet I regularly see articles with error bars extending beyond 100% correct, and authors, reviewers and editors seem to be OK with that.
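
To make the fourth point concrete, here is a minimal sketch using made-up proportions correct for eight hypothetical participants (the data are invented for illustration and do not come from any study). It computes the mean + SD and the upper bound of a classic t-based 95% confidence interval; with these numbers, both land above 1, that is, above 100% correct, even though no participant scored above 100%.

```python
import math
import statistics

# Made-up proportions correct for 8 hypothetical participants (illustration only).
scores = [1.0, 1.0, 0.95, 0.9, 1.0, 0.85, 1.0, 0.6]

mean = statistics.fmean(scores)
sd = statistics.stdev(scores)        # sample standard deviation (n - 1 denominator)
sem = sd / math.sqrt(len(scores))    # standard error of the mean

# Two-sided 95% critical value of Student's t for 7 degrees of freedom,
# hard-coded to keep the sketch free of third-party dependencies.
T_CRIT_DF7 = 2.365

upper_sd = mean + sd                 # top of a "mean +/- SD" error bar
upper_ci = mean + T_CRIT_DF7 * sem   # top of a classic 95% CI of the mean

print(f"mean               = {mean:.3f}")
print(f"mean + SD          = {upper_sd:.3f}")
print(f"upper 95% CI bound = {upper_ci:.3f}")
```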

How do we fix the problem? There are four simple answers, and one more elaborate:

1. Do not use bar graphs, use scatterplots instead. There is absolutely no reason why you should have to report means + error bars and hide your data.

2. Use a percentile bootstrap confidence interval: its bounds cannot take impossible values, and it accommodates asymmetric distributions. If the distribution is skewed or contains outliers, the mean will produce misleading results, so use a robust estimator of central tendency instead, for instance the median or a trimmed mean (Wilcox & Keselman, 2003).

3. Use a binomial proportion confidence interval such as the Jeffreys interval. A quick Google search indicates it is available in several R packages.

4. Compute d’ instead of percent correct: you will get a measure of sensitivity independent of bias, and on a continuous scale amenable to regular confidence interval calculations.

5. Use a generalised mixed model, for instance a logit mixed model (Jaeger, 2008).
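
Answers 2 and 4 can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not a reference implementation: the data are made up, the helper names (`trimmed_mean`, `percentile_bootstrap_ci`, `d_prime`) are hypothetical, the 20% trimming and 2000 bootstrap samples are arbitrary choices, and the d' function assumes a detection-type task in which hit and false alarm rates are available.

```python
import random
import statistics
from statistics import NormalDist


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of the values."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))
    return statistics.fmean(ordered[k:len(ordered) - k])


def percentile_bootstrap_ci(values, estimator=statistics.median,
                            n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for `estimator`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement, apply the estimator, sort the estimates.
    boot = sorted(estimator(rng.choices(values, k=n)) for _ in range(n_boot))
    cut = round(n_boot * alpha / 2)
    # Keep the central (1 - alpha) portion of the bootstrap estimates.
    return boot[cut], boot[n_boot - cut - 1]


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(hit rate) - z(false alarm rate).

    Rates of exactly 0 or 1 have no finite z-score and raise an error here;
    how to handle them is outside the scope of this sketch.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Made-up proportions correct for 8 hypothetical participants (illustration only).
scores = [1.0, 1.0, 0.95, 0.9, 1.0, 0.85, 1.0, 0.6]

median_ci = percentile_bootstrap_ci(scores)
tm_ci = percentile_bootstrap_ci(scores, estimator=trimmed_mean)

print("median, 95% CI:", statistics.median(scores), median_ci)
print("20% trimmed mean, 95% CI:", trimmed_mean(scores), tm_ci)
print("d' for 90% hits, 20% false alarms:", d_prime(0.9, 0.2))
```

Because every bootstrap estimate is a median or trimmed mean of resampled observed values, the interval bounds cannot fall outside the range of the data, so they cannot exceed 100% correct.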

## References

Jaeger, T.F. (2008) Categorical Data Analysis: Away from ANOVAs (transformation or not) and towards Logit Mixed Models. Journal of Memory and Language, 59, 434-446.

Wilcox, R.R. & Keselman, H.J. (2003) Modern Robust Data Analysis Methods: Measures of Central Tendency. Psychological Methods, 8, 254-274.