In this post I review the now-retracted paper:
Delorme A, Pierce A, Michel L and Radin D (2016) Prediction of Mortality Based on Facial Characteristics. Front. Hum. Neurosci. 10:173. doi: 10.3389/fnhum.2016.00173
In typical Frontiers’ style, the reason for the retraction is obscure.
In December 2016, I made negative comments about the paper on Twitter. Arnaud Delorme (the first author, whom I’ve known for over 20 years) got in touch, asking me to clarify my points. I said I would write something eventually, so here it goes.
The story is simple: some individuals claim to be able to determine if a person is alive or dead based on a photograph. The authors got hold of 12 such individuals and asked them to perform a dead/alive/don’t know discrimination task. EEG was measured while participants viewed 394 photos of individuals alive or dead (50/50).
Here are some of the methodological problems.
Participants were from California. Some photographs were of US politicians outside California. Participants did not recognise any individuals from the photographs, but unconscious familiarity could still influence behaviour and EEG – who knows?
More importantly, if participants make general claims about their abilities, why not use photographs of individuals from another country altogether? Even better, another culture?
The average group performance of the participants was 53.6%. So as a group, they really can’t do the task. (If you want to argue they can, I challenge you to seek treatment from a surgeon with a 53.6% success record.) Yet, a t-test is reported with p=0.005. Let’s not pay too much attention to the inappropriateness of t-tests for percent correct data. The crucial point is that the participants did not make a claim about their performance as a group: each one of them claimed to be able to tease apart the dead from the living based on photographs. So participants should be assessed individually. Here are the individual performances:
(52.3, 56.7, 53.3, 56.0, 56.6, 51.8, 61.3, 55.3, 50.0, 51.6, 49.5, 49.4)
Five participants have results flagged as significant. One in particular has a performance of 61.3% correct. So how does that compare to participants without super abilities? Well, astonishingly, there is no control group! (Yep, and that paper was peer-reviewed.)
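To make the idea of individual assessment concrete, here is a minimal sketch of an exact binomial test of each participant against chance. It is not the analysis from the paper. It assumes every participant gave a dead/alive answer to all 394 photos (`N_TRIALS`), which is my assumption: "don't know" responses would reduce the real trial counts. The p-values are also uncorrected for the 12 tests, so a correction such as Bonferroni would be stricter.

```python
from math import comb

# Assumption: each participant classified all 394 photos as dead or alive.
# "Don't know" responses would lower the real number of trials.
N_TRIALS = 394

# Individual percent correct values listed in the post.
percent_correct = [52.3, 56.7, 53.3, 56.0, 56.6, 51.8,
                   61.3, 55.3, 50.0, 51.6, 49.5, 49.4]


def binom_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k successes in n trials at chance (0.5)."""
    # With p = 0.5 the distribution is symmetric, so we double the smaller tail.
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2**n
    return min(1.0, 2 * min(upper, lower))


def individual_tests(scores, n=N_TRIALS, alpha=0.05):
    """Return (percent correct, approximate count, p-value, flagged) per participant."""
    results = []
    for pc in scores:
        k = round(pc / 100 * n)  # number of correct responses reconstructed from the percentage
        p = binom_two_sided_p(k, n)
        results.append((pc, k, p, p < alpha))
    return results


for pc, k, p, flagged in individual_tests(percent_correct):
    print(f"{pc:5.1f}%  k={k:3d}  p={p:.4f}  {'*' if flagged else ''}")
```

Even if some participants pass such a test, it only says they differ from a coin flip, not that they differ from people who make no special claims, which is why a control group is needed.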
Given the extraordinary claims made by the participants, I would have expected a large sample of control participants, to clearly demonstrate that the “readers” perform well beyond normal. I would also have expected the readers to be tested on multiple occasions, to demonstrate the reliability of the effect.
There are two other problems with the behavioural performance:
- participants’ responses were biased towards the ‘dead’ response, so a sensitivity analysis, such as d’ or a non-parametric equivalent, should have been used.
- performance varied a lot across the 3 sets of images that composed the 394 photographs. This suggests that the results are not image-independent, which could in part be due to the 3 sets containing different proportions of dead and alive persons.
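As an illustration of the first point, here is a minimal sketch of a signal detection analysis that separates sensitivity (d’) from response bias (criterion c). The response counts are hypothetical, made up for the example and not taken from the paper. I treat ‘dead’ photos as the signal, so a hit is a ‘dead’ response to a dead person and a false alarm is a ‘dead’ response to a living one. The 0.5 adjustment is a common log-linear correction to avoid rates of exactly 0 or 1.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) from the four response counts."""
    # Log-linear correction: add 0.5 to each count so rates never hit 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical participant who says 'dead' too often (197 dead, 197 alive photos).
d_prime, criterion = sdt_measures(hits=130, misses=67,
                                  false_alarms=110, correct_rejections=87)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

With this convention, a negative criterion indicates a bias towards answering ‘dead’, while d’ reflects how well dead and alive photos are told apart regardless of that bias.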
The ERP analyses were performed at the group level using a 2 x 2 design: alive/dead x correct/incorrect classification. One effect is reported with p<0.05: a larger amplitude for incorrect than correct around 100 ms post-stimulus, only for pictures of dead persons. A proper spatial-temporal cluster correction for multiple comparisons was applied. There is no clear interpretation of the effect in the paper, except a quick suggestion that it could be due to low-level image properties or an attention effect. A non-specific attention effect is possible, because sorting ERPs based on behavioural performance can be misleading, as explained here. The effect could also be a false positive – in the absence of replication and comparison to a control group, it’s impossible to tell.
To be frank, I don’t understand why EEG was measured at all. I guess if the self-proclaimed readers could do the task at all, it would be interesting to look at the time-course of the brain activity related to the task. But the literature on face recognition shows very little modulation due to identity, except in priming tasks or using SSVEP protocols – so not likely to show anything with single image presentations. If there was something to exploit, the analysis should be performed at the participant level, perhaps using multivariate logistic regression, with cross-validation, to demonstrate a link between brain activity and image type. Similarly to behaviour, each individual result from the “readers” should be compared to a large set of control results, from participants who cannot perform the behavioural task.
In conclusion, this paper should never have been sent for peer review. That would have saved everyone involved a lot of time. There is nothing in the paper supporting the authors’ conclusion:
“Our results support claims of individuals who report that some as-yet unknown features of the face predict mortality. The results are also compatible with claims about clairvoyance warrants further investigation.”
If the authors are serious about studying clairvoyance, they should embark on a much more ambitious study. To save time and money, I would suggest dropping EEG from the study and focusing on the creation of a large bank of images from various countries and cultures, with repeated measurements of readers and many controls.