# how to fix erroneous error bars for percent correct data

Have you ever seen an accurate bar graph of percent correct data? Or of other bounded quantities, such as average scores from an ordinal scale (for instance a 1-9 Likert scale)? It is entirely possible that you have not, because most of these graphs rely on the wrong tools: typically the mean +/- SD or SEM, or a classic confidence interval of the mean. Why are these techniques wrong? First, they use the mean, a non-robust estimator of central tendency; second, they use the variance, a non-robust estimator of dispersion; third, they assume symmetry; fourth, the intervals are not bounded, so they can span impossible values, for instance percent correct beyond 100%. Participants cannot be more than 100% correct. Yet I regularly see articles with error bars extending beyond 100% correct, and authors, reviewers and editors seem to be ok with that.
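To make the fourth point concrete, here is a minimal sketch in Python. The scores are made up for illustration (eight hypothetical participants near ceiling, one of them an outlier); they are not data from any study.

```python
import math
import statistics

# Hypothetical proportions correct for 8 participants, most near ceiling.
scores = [1.00, 0.98, 0.97, 0.99, 1.00, 0.96, 0.70, 1.00]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)            # sample SD (n - 1 denominator)
sem = sd / math.sqrt(len(scores))

# Upper end of a "mean + SD" error bar: it lands above 1.0,
# i.e. above 100% correct, which no participant can achieve.
upper_sd = mean + sd

print(f"mean = {mean:.3f}, SD = {sd:.3f}, SEM = {sem:.3f}")
print(f"mean + SD = {upper_sd:.3f}")
print(f"median = {statistics.median(scores):.3f}")
```

A single low score drags the mean down and inflates the SD, which is how the bar ends up crossing the ceiling; the median is far less affected.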

How do we fix the problem? There are four simple answers, and one more elaborate:

1. Do not use bar graphs, use scatterplots instead. There is absolutely no reason why you should have to report means + error bars and hide your data.

2. Use a percentile bootstrap confidence interval – it will not produce boundaries with impossible values and will accommodate asymmetric distributions. If there is skewness or outliers, the mean will produce misleading results – use a robust estimator of central tendency instead, for instance the median or a trimmed mean (Wilcox & Keselman, 2003).

3. Use a binomial proportion confidence interval such as the Jeffreys interval. A quick Google search indicates that it is available in several R packages.

4. Compute d’ instead of percent correct: you will get a measure of sensitivity independent of bias, and on a continuous scale amenable to regular confidence interval calculations.

5. Use a generalised mixed model, for instance a logit mixed model (Jaeger, 2008).
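Answers 2 to 4 can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not a reference implementation: the function names and the data are hypothetical, the bootstrap uses a simple index rule for the percentiles, and the binomial interval shown is the Wilson score interval rather than the Jeffreys interval, because Wilson has a closed form. The Jeffreys interval needs quantiles of a Beta distribution, for instance from `scipy.stats.beta.ppf`. The d’ function assumes a yes/no task with equal variances.

```python
import random
import statistics
from statistics import NormalDist


def percentile_bootstrap_ci(data, estimator=statistics.median,
                            n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for any estimator (median by default)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample participants with replacement and re-estimate each time.
    boot = sorted(estimator(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = round(alpha / 2 * n_boot)
    hi_idx = n_boot - lo_idx - 1
    return boot[lo_idx], boot[hi_idx]


def wilson_ci(k, n, alpha=0.05):
    """Wilson score interval for k successes out of n trials."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * ((p * (1 - p) / n + z**2 / (4 * n**2)) ** 0.5) / denom
    # Clamp only to guard against floating-point rounding at k = 0 or k = n.
    return max(0.0, centre - half), min(1.0, centre + half)


def d_prime(hit_rate, fa_rate):
    """Yes/no d': z(hits) - z(false alarms); undefined for rates of 0 or 1."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical proportions correct for 8 participants.
scores = [1.00, 0.98, 0.97, 0.99, 1.00, 0.96, 0.70, 1.00]
print(percentile_bootstrap_ci(scores))   # group-level CI of the median
print(wilson_ci(48, 50))                 # one participant: 48 of 50 correct
print(d_prime(0.90, 0.10))               # hit rate 0.90, false alarm rate 0.10
```

The bootstrap limits are always values the estimator actually took on resampled data, so they cannot fall outside the 0 to 1 range. Passing a trimmed mean as `estimator` works the same way. Hit or false alarm rates of exactly 0 or 1 make `d_prime` raise an error, because the corresponding z-score is infinite; how to correct for that is a separate choice not made here.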

## References

Jaeger, T.F. (2008) Categorical Data Analysis: Away from ANOVAs (transformation or not) and towards Logit Mixed Models. J Mem Lang, 59, 434-446.

Wilcox, R.R. & Keselman, H.J. (2003) Modern Robust Data Analysis Methods: Measures of Central Tendency. Psychological Methods, 8, 254-274.


## 6 thoughts on “how to fix erroneous error bars for percent correct data”

1. Christopher Taylor

A Wilson or Jeffreys interval is a reasonable choice and probably a simpler fix than bootstrapping.

Don’t get me wrong, I completely agree, but insisting on these simpler fixes would probably lead to less resistance in the review process.


1. garstats Post author

Thanks for the suggestion Chris. There is a good selection of alternatives here, including Jeffreys interval.
Wilcox 2012 also has a few options for binomial fits, but I’ve never experimented with them, so I don’t feel confident making a recommendation.


2. Doby Rahnev (@DobyRahnev)

At least in the case of percent correct, an even better alternative is to report d’ instead. Unlike percent correct, d’ is a linear measure of capacity where d’=3 is actually 3 times better than d’=1. In any case, all of your points still stand.


1. Christopher Taylor

What a strange comment about d’ – I am not at all clear on what you mean. In what sense do you mean d’ is linear? For a yes/no task d’ is the z-transform of hits minus the z-transform of false alarms. That is a non-linear transform and relies heavily on assumptions about the observer (e.g., equal variances) and the task. Lastly, at high levels of performance d’ has an even more troublesome issue – at 100% it is infinite. I am an advocate for SDT but I don’t know how it is the solution here…


1. garstats Post author

d’ is not without problems, and it rests on strong underlying assumptions. But across participants, d’ is continuous and unbounded, so a confidence interval method that assumes continuous variables should be fine.
