Is Lucy’s post-it note a confession? Whether you will see it as a confession or a cry of innocent anguish depends on whether *you* have a heart and a brain. If you read it carefully, you will see that Lucy does not say that she killed those babies. She says that *they said* she killed those babies. Yes, she does say she is evil. She thinks she must be a bad nurse, since she apparently couldn’t save those babies, despite her (possibly too energetic, and certainly not well supervised) attempts. More seriously, she had had an affair with an older married man, a doctor, who later dumped her and betrayed her. She spoke out about doctors’ mistakes and about the catastrophic hygienic conditions in which she and her colleagues had to work. For two years, doctors had tried to have her taken off that ward, because she pissed them off. Her fellow nurses loved her for her forthrightness and lovely character. She is so sorry for the suffering she caused her parents and step-brothers. She is considering suicide. She has PTSD.
This spreadsheet was shown on TV both yesterday (Friday August 18, the day of the verdicts) and at the start of the trial of Lucy Letby. Apparently, Cheshire Constabulary find this absolutely damning evidence against Lucy. And indeed, many journalists seem to agree.
The 25 events are almost all of the events at which LL was present during the periods investigated. They are suspicious because she was under suspicion when the police started their investigations. Not surprisingly, most nurses are not present at many of these events. And of course, many nurses probably work far fewer hours than LL. Many are often on administrative duties.
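To see why a spreadsheet like this proves nothing on its own, here is a minimal simulation sketch in Python. Everything in it is hypothetical: the ward, the number of shifts, the three nurses and their rotas are invented, and events are assumed to strike random shifts entirely independently of who is on duty. Even so, the nurse who works the most shifts is present at the most events, and if events are then *selected* because that nurse was present (the Texas sharpshooter move), she is, by construction, present at every one of them.

```python
import random


def simulate_ward(shifts_worked, n_shifts=1000, n_events=200, seed=1):
    """Simulate a hypothetical ward in which events are pure chance.

    shifts_worked maps each (hypothetical) nurse to the number of shifts
    she works out of n_shifts. Events fall on randomly chosen shifts,
    with no dependence at all on who is on duty.
    """
    rng = random.Random(seed)
    # Each nurse's rota: a random set of shift indices.
    rosters = {name: set(rng.sample(range(n_shifts), k))
               for name, k in shifts_worked.items()}
    # Events: random shifts, chosen without looking at any rota.
    events = rng.sample(range(n_shifts), n_events)
    # How many events each nurse happened to be on duty for.
    presence = {name: sum(e in roster for e in events)
                for name, roster in rosters.items()}
    return rosters, events, presence


rosters, events, presence = simulate_ward({"A": 600, "B": 150, "C": 150})
print(presence)  # nurse A, working four times as many shifts, is present far more often

# Texas sharpshooter step: keep only the events at which nurse A was on duty.
selected = [e for e in events if e in rosters["A"]]
print(len(selected), all(e in rosters["A"] for e in selected))  # she is present at all of them
```

The only "signal" in the output comes from hours worked and from the selection step, which is exactly why one needs to know how the events on such a chart were chosen.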
The doctors on the ward are, of course, missing from the spreadsheet. Doctors were never investigated as suspects; from the start of the police investigation they were apparently believed to be speaking the gospel truth. Under cross-examination at the trial, some of them changed various parts of their stories. Of course, unlike Lucy, they do not lie, since they could never (under oath in court, or earlier, when being interviewed as witnesses by police) be saying untruths in order to deceive.
Back to the spreadsheet. When drawing conclusions from any data, it is important to know how the data were gathered. It is also important to know what data are missing but would be needed to draw even the most preliminary and tentative inferences.
There was an NHS investigation into the raised rates of deaths and collapses at Countess of Chester Hospital (CoCH) in summer 2015 and summer 2016. It was published in 2017 by the Royal College of Paediatrics and Child Health (RCPCH). The investigation blamed the consultants for the appallingly low standard of care and for the terrible hygiene situation. The RCPCH investigators actually wrote that nurse Lucy Letby could not be associated with the events, but that passage was redacted from the published report for privacy reasons. We know that the consultants had already presented their fears to hospital management. One of them (the successful TV doctor and Facebook influencer Dr Ravi Jayaram) was on TV yesterday, proudly telling the world that he had been vindicated. Management was inclined not to believe the consultants and did not act on their fears, but those fears certainly came to the ears of the RCPCH. On publication of the report, four consultants had had enough, and went to the police with their suspicion that LL was a murderer.
Thanks to FOI requests and statistical analysis by independent scientists, we now know that the rate of events (deaths and collapses) is just as much raised when Lucy is not on the ward as it is when she is on the ward. A lot of medical information (as well as the state of the drains at CoCH) points to a seasonal virus epidemic.
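The comparison described here boils down to a rate ratio: events per shift with the nurse on duty, divided by events per shift without her. A minimal Python sketch follows. The counts are invented for illustration and are not the FOI data, and the confidence interval uses the usual large-sample approximation for the log of a ratio of Poisson rates (an assumption; with small counts an exact test would be preferable).

```python
import math


def rate_ratio(events_on, shifts_on, events_off, shifts_off, z=1.96):
    """Compare event rates per shift with and without a given nurse on duty.

    Returns (rate_on, rate_off, ratio, (ci_low, ci_high)), where the interval
    is an approximate 95% CI from the normal approximation for log(ratio).
    """
    rate_on = events_on / shifts_on
    rate_off = events_off / shifts_off
    ratio = rate_on / rate_off
    se_log = math.sqrt(1 / events_on + 1 / events_off)
    ci = (ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log))
    return rate_on, rate_off, ratio, ci


# Hypothetical counts: 12 events in 300 shifts with her, 28 in 700 without.
rate_on, rate_off, ratio, ci = rate_ratio(12, 300, 28, 700)
print(f"on: {rate_on:.3f}/shift, off: {rate_off:.3f}/shift, ratio {ratio:.2f}, "
      f"95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

In these terms, a rate "just as much raised when Lucy is not on the ward" means a ratio close to 1, with an interval that comfortably includes 1.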
The elevated rate went back to normal after the hospital was downgraded (no longer accepting high-risk patients), after the drains were rebuilt, and after the senior consultant retired, all of which happened soon after the police investigation started. Incidentally, the rate of stillbirths and miscarriages shows exactly the same pattern.
Lucy must certainly have been a witch in order to kill babies in the womb and even when she is far from the hospital.
Anyone familiar with miscarriages of justice involving alleged serial-killer nurses will recognise this police and prosecution tactic. Is it evil, or is it just stupid? (cf. Hanlon’s razor.) I think it is quite simply “learnt”. Over the years, police and prosecutors learn what convinces jurors, and that is why the same “mistakes” are made again and again. They work!
Note [20 August 2023]: This post is incomplete. It needs a prequel: the history of medical investigations into two “unexplained clusters” of deaths at the neonatal ward of the Countess of Chester Hospital. It needs many sequels: the statistical evidence; how the cases were selected (the Texas sharpshooter fallacy) and the origin of suspicions that a particular nurse might be a serial killer; the post-it note; the alleged insulin poisonings; the trouble with sewage backflow and the evidence of the plumber; the euthanasias. For the medical material, the site to visit is the magnificent https://rexvlucyletby2023.com/.
Lucy Letby, a young nurse, has been tried at Manchester Crown Court for 7 murders and 15 murder attempts on 17 newborn children in the neonatal ward at Countess of Chester Hospital, Chester, UK, in 2015 and 2016.
She was found:
- guilty on 7 counts of murder (against 7 babies);
- guilty on 7 counts of attempted murder (against 6 babies);
- not guilty on 2 counts of attempted murder (against 2 of the 6 babies she *was* found guilty of attempting to murder).

No verdict was reached on 6 counts of attempted murder against 6 different babies. However, for 2 of those 6 babies she was found guilty on a different count of attempted murder. [Thanks to the commenter who corrected my numbers.]
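As a quick consistency check, the outcomes listed here should account for all 15 attempted-murder counts and, together with the 7 murder counts, all 22 charges. A trivial Python tally (the dictionary keys are just labels chosen for this sketch):

```python
# Verdicts as listed above, by type of count.
verdicts = {
    "murder: guilty": 7,
    "attempted murder: guilty": 7,
    "attempted murder: not guilty": 2,
    "attempted murder: no verdict": 6,
}

murder_counts = sum(n for k, n in verdicts.items() if k.startswith("murder"))
attempt_counts = sum(n for k, n in verdicts.items() if k.startswith("attempted"))
print(murder_counts, attempt_counts, murder_counts + attempt_counts)  # 7 15 22
```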
The prosecution dropped one further murder charge just before the trial started, on the instruction of the judge. Several groups of alleged murders and murder attempts concern the same child, or twin or triplet siblings. All but one of the children were born pre-term, several of them extremely pre-term.
I’m not saying that I know that Lucy Letby is innocent. As a scientist, I am saying that this case is a major miscarriage of justice. Lucy did not have a fair trial. The similarities with the famous case of Lucia de Berk in the Netherlands are deeply disturbing.
The image below summarizes findings concerning the medical evidence. This was not my research. The graphic was given to me by a person who wishes to remain anonymous, in order to disseminate the research now fully documented on https://rexvlucyletby2023.com/, whose author and owner also wishes to remain anonymous. Note that the defence has not called any expert witnesses at all (except for one person: the plumber). Possibly they did not have enough funds for this. Crowd-sourcing might be a smart way of getting the necessary work done for free, to be used at a subsequent appeal. That is a dangerous tactic, and it seems to me that the defence has already taken a foolish step: they admitted that two babies received unauthorised doses of insulin, and their client was thereby obliged to believe that too.
This blog post started in May 2023 as my first attempt to blog about a case which I have been following for a long time. The information I report here was uncovered by others and is discussed on various internet fora. Links and sources are given below; some lead to yet more excellent sources. Everything here was communicated to the defence, but they declined to use it in court. Maybe they felt their hands were bound by pre-trial agreements between the parties as to what evidence would be brought to the attention of the jury, which witnesses would be called, etc.
An extraordinary feature of UK criminal prosecution law is that if exculpatory evidence is in the possession of the defence but not used in court, then it should not be used at a subsequent appeal, whether by the same defence team or a new one. This might explain why, as far as we know, the defence team did not even tell their client that they knew of evidence which exonerated her, even though withholding such evidence from her is itself against the law. UK law on criminal court procedure is case law, and new judges can always decide to depart from past judges’ rulings.
A very important point is that the rules on expert evidence require all expert evidence to be introduced before the trial starts. It is strictly forbidden to introduce new expert evidence once the trial is underway.
UK criminal trials are tightly scripted theatre. The jury is of course incommunicado and very close to its verdict, and I do not aim to influence the jury or their verdict. I aim to stimulate discussion of the case in advance of a likely appeal against a likely guilty verdict. I wish to support that small part of the UK population who are deeply concerned that this trial is going to end in an unjustified guilty verdict. Probably it will, but that will not be the end. So much information has come out in the 9 months of the trial so far that a serious fight on behalf of Lucy Letby is now possible. Public opinion crystallised long ago against Lucy. It can be made fluid again, and maybe it can even be reversed, and this is what must happen if she is to get a fair re-trial.
As a concerned scientist who perceives a miscarriage of justice in the making, I attempted to communicate information not only to the defence but also to the prosecution, to the judge (via the clerk of the court), and to the Director of Public Prosecutions. That was a Kafkaesque experience which I will write about on another occasion. Personally, I tend to think that Lucy is innocent. That was however not my reason for attempting to contact the authorities. As a scientist, it was manifestly clear to me that she was not getting a fair trial. Science was being abused. I tried to communicate with the appropriate authorities. I failed to get any response. Therefore I had to “go public”.
Here is a short list of key medical/scientific issues, originally copied from an early version of the incredible and amazing website https://rexvlucyletby2023.com/, with occasional slight rephrasing and some small, hopefully correct, additions by myself. That site presents full scientific documentation and argumentation for all of the claims made there.
Air embolism cannot be determined by imaging. It can only be determined soon after death, and it requires the extraction of air from the circulatory system and analysis of the composition of that air by gas chromatography.
The coroner found a cause of death in 5 out of 7 of the alleged murder cases. Two of these deaths appeared to be related, in part, to aggressive CPR; two appeared to be due to undiagnosed hypoxic-ischemic encephalopathy and myocarditis; and one infant was determined to have died due to prematurity. One of the infants received no autopsy. It is highly unusual for the cause of death to be altered years after the fact, using a methodology that is not supported by the coroner’s office.
The two claims of insulin poisoning are not supported by the testing conducted, and the infants (who are still alive and well) did not have dangerously low or dangerously high blood glucose levels for any period of time. There are many physiological reasons that could explain their low blood glucose during the whole period. In one of the two cases, assumptions are being made on the basis of one test taken at a single time point, clearly inconsistent with the other medical readings, and contravening the manufacturer’s own instructions for use (see image below). The report detailing the conclusions from that single test violates the code of practice of the forensic science regulator. Moreover, it appears that some numerical error has been made in the necessary calculation, resulting in an outcome which is physiologically impossible (or the person responsible did not know about the so-called “hook effect”). The mismatch between C-peptide and insulin concentration does not prove that the excess insulin found must have been synthetic insulin. There are many other biological explanations for a mismatch. No testing was done to determine the origin of the insulin. Similarly, there are many innocent explanations for the detection of some insulin in a feeding bag.
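For readers unfamiliar with the insulin/C-peptide argument: insulin and C-peptide are released by the pancreas in equal molar amounts, but C-peptide is cleared from the blood more slowly, so its molar concentration normally exceeds that of insulin. A measured molar ratio of insulin to C-peptide above 1 is therefore commonly read as a sign of injected insulin, and that reading is exactly what is disputed above. The Python sketch below only shows the arithmetic. The input values are hypothetical, not those reported at trial; the factor of 6.0 pmol/L per mU/L is an assumption (some laboratories use 6.945); and no arithmetic can correct for assay problems such as the high-dose hook effect, in which very high concentrations in a one-step immunoassay give falsely low readings.

```python
# Assumed conversion factor: 1 mU/L of insulin ~ 6.0 pmol/L (6.945 is also in use).
MU_PER_L_TO_PMOL_PER_L = 6.0


def insulin_pmol_per_l(value, unit):
    """Convert an insulin result to pmol/L."""
    if unit == "pmol/L":
        return value
    if unit == "mU/L":
        return value * MU_PER_L_TO_PMOL_PER_L
    raise ValueError(f"unsupported unit: {unit}")


def insulin_c_peptide_ratio(insulin, insulin_unit, c_peptide_pmol_per_l):
    """Molar ratio of insulin to C-peptide, both expressed in pmol/L."""
    return insulin_pmol_per_l(insulin, insulin_unit) / c_peptide_pmol_per_l


# Hypothetical sample: insulin 50 mU/L, C-peptide 600 pmol/L.
print(insulin_c_peptide_ratio(50, "mU/L", 600))  # 0.5
```

The result depends directly on the conversion factor and on the accuracy of both assays, so a single ratio from a single sample inherits all of their uncertainties.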
The air embolism hypothesis is confusing because it fails to explain why some children apparently perished and others did not, and it has not been supported by the minimal necessary measurements.
In at least one case, Lucy is blamed for causing white matter brain injury. This claim is utterly dishonest. The infant who experienced this brain injury was born at 23 weeks’ gestation, and white matter brain injury is associated with such early births. Further, there is ample evidence linking enterovirus and parechovirus infection to white matter brain injury in neonates, resulting in cerebral palsy.
At the time of the collapses and deaths of the infants, enterovirus and parechovirus infections had been reported in other hospitals. There is a history of outbreaks of these viruses in neonatal wards in hospitals around the world. They especially harm preterm infants, whose immune systems are not yet fully developed. It is reported that many parents of the infants were concerned that the ward had a virus (as was Lucy), and that Dr Gibbs denied this. To date we have seen no evidence that any viral testing was done, or, if it was, what the results were.
Next, a point that falls within my own scientific competence.
Both prosecution and defence were warned long ago about the statistical issues in such cases. Both responded that they are not going to use any statistics. Nor are they using the services of any statistician. It seems the RSS report https://rss.org.uk/news-publication/news-publications/2022/section-group-reports/rss-publishes-report-on-dealing-with-uncertainty-i/ has had the opposite effect to the one intended. Amusingly, the same thing happened in the case of Lucia de Berk. At the appeal the prosecution stopped using statistics, and she was convicted solely on the grounds of “irrefutable medical scientific evidence”. (Here I am quoting words both spoken by the judges and written on the first page of their 100-plus-page account of the reasons and reasoning that had led to their unshakable conviction that Lucia de Berk was guilty: the longest judges’ summing-up in Dutch legal history.) I was one of the five coauthors of the RSS report. We were a “task force”, formally commissioned by the “Statistics and the Law” section of the society. I consider it the most important scientific work of my career. It took us two years to put together. We started the work in 2020; we had seen the Lucy Letby trial on the horizon since 2017, when the police investigation started and the identity of the suspect was already common knowledge.
The UK has nothing like that, because it is a jury of ordinary people who (legally) determine guilt or innocence. This is a clever device which makes fighting a conviction very difficult: no one can know what arguments the jury had in mind, or what, if anything, was the key fact that convinced them of guilt. Ordinary people are convinced by what seems to be a smoking gun; they then see all the other evidence through that filter. This is called “confirmation bias”. In the Lucy Letby case, the smoking gun was probably the post-it note, and the insulin then seemed to clinch the matter. The prosecution cross-examination convinces those who already believe Lucy is guilty that she is, moreover, constantly lying. More on all this in later posts, I hope.
Back to the insulin. Here are the instructions for the insulin testing kit used for the trial, taken from the website http://pathlabs.rlbuht.nhs.uk/ccfram.htm; the actual file is http://pathlabs.rlbuht.nhs.uk/insulin.pdf. Notice the warning printed in red. Yes, it really was printed in red; that was not something I changed later. (None of this is my discovery; the person who uncovered these facts wishes to remain anonymous.)
The toxicological evidence used in the trial violates the code of practice of the UK’s Forensic Science Regulator (see link below). It should have been deemed inadmissible. Instead, the defence has not disputed it, and thereby obliged their own client Lucy to agree that there must have been a killer on the ward. The jury are instructed to believe that two babies were given insulin without authorization, endangering their lives. (The two babies in question are still very much alive, to this day. Probably now at primary school.)
The defence stated to me that they cannot inform Lucy of the alternative analysis of the insulin question. It appears to me that this violates their own code of practice. Do they feel bound by the weird rules of UK criminal prosecution practice? Their client, Lucy Letby, is herself essentially merely a piece of evidence, seized by the police from what they believe is a crime scene. No one may tamper with it for the duration of her own trial, which is lasting 10 months! I think this constitutes an appalling violation of basic human rights. The UK laws on contempt of court are meant to guarantee a fair trial. But in the case of a 10-month trial on 22 charges of murder and attempted murder, they are guaranteeing an unfair trial.
Lucy’s solicitor refused to pass on a friendly personal letter of support to Lucy or to her parents, because she had not instructed him to do so. Should one laugh or cry about that excuse? I have the impression that he is not very bright and that he may have been convinced that she is guilty. If so, I hope he is changing his mind. In the UK, the solicitor does all the legwork and handles communication between the client and the defence team. The barrister does the cross-examinations and the court theatrics, but probably never builds up a personal relationship with the client. Lucy has been in prison all this time, in pre-trial detention, far from Manchester or Hereford. This might explain the extraordinarily weak defence which has been put up so far. But it might be deliberate.
One must take into account that funding for legal support is meagre. The prosecution has been working on the case for six or so years, with unlimited resources. The defence has had a relatively very short time, with very limited resources. The solicitor and the barrister have probably already put in many more hours than they are paid for. There are no funds for expensive scientific witnesses. It is very possible that the defence team understands well that they cannot put up a serious defence during the 9 to 10 months of the trial, but that precisely during this period, with a huge number of revelations being made outside the trial, material for a serious defence at an appeal is being “crowd-sourced”. It seems to me that this mass of high-quality independent scientific work provides plenty of grounds for an appeal, should the jury hand down a guilty verdict.
At a pre-publication meeting of stakeholders held to gain feedback on our report, a senior West Midlands police inspector told me: “we are not using statistics because they only make people confused”. Lucy’s solicitor and barrister knew of our report well in advance and were even given the names of excellent UK experts whom they could consult, but did not bother to contact any of them. No statistics in our courts please, we are British! Yet the UK has the best applied statisticians and epidemiologists in the world.
Article in “Science” about my work on serial killer nurses
https://www.bbc.co.uk/sounds/play/m001k7vt?partner=uk.co.bbc&origin=share-mobile “The UK’s forensic science used to be considered the gold standard, but no longer. The risk of miscarriages of justice is growing. And now a new Westminster Commission is trying to find out what went wrong. Joshua talks to its co-chair, leading forensic scientist Dr Angela Gallop CBE, and to criminal defence barrister Katy Thorne KC.”
Criminal Procedure Rules and Criminal Practice Directions
New expert evidence cannot be admitted once a trial is in progress
“The courts have indicated that they are prepared to refuse leave to the Defence to call expert evidence where they have failed to comply with CrimPR; for example by serving reports late in the proceedings, which raise new issues (Writtle v DPP  EWHC 236). See also: R v Ensor  1 Cr. App. R.18 and Reed, Reed & Garmson EWCA Crim. 2698”. This quote comes from https://www.cps.gov.uk/legal-guidance/expert-evidence. Note that a judge is always allowed to break with precedent. The rule is not actually a permanent rule; it is merely a description of current practice. Current practice evolves when and if a new judge sees fit to break with precedent. Obviously, he would have to come up with good legal reasons why he believes he has to do that. It is his prerogative, his free choice. That is the essence of case law, aka common law.
This Apeldoorn scientist rescues innocent nurses from prison: ‘Unfortunately, people do not want to believe in coincidence’
Scientist Richard Gill from Apeldoorn helped ensure that Lucia de Berk was acquitted. He achieved the same in a similar case in Italy, and now he is going for the hat trick in England. What drives him?
Anne Boer 28-05-22, 08:00
Pure scientific curiosity: that is what drives him, says the internationally renowned mathematician Richard Gill from Apeldoorn. As an expert in statistics, Gill (70) has worked for the Dutch Public Prosecution Service and the International Criminal Court. He has been retired for almost six years and is emeritus professor of statistics at Leiden University.
Using his knowledge of statistics, he was able, from his study at home, to demonstrate the innocence of two nurses who had been convicted of serial murder: the Dutch Lucia de Berk and the Italian Daniela Poggiali. He is now campaigning for the release of the English nurse Ben Geen.
All three were alleged to have killed patients at work. Lucia de Berk was even convicted of seven murders. The evidence was based mainly on statistics: when Lucia was working, more patients supposedly died than during her colleagues’ shifts. It turned out to be sheer nonsense, as Gill delicately puts it. “A matter of gossip and backbiting, of looking for a scapegoat to save the hospital’s reputation, and of assumptions, when no murder had been committed at all.”
Statistical evidence plays a major role in investigations, including investigations into serial killers in the medical world. “But then you do have to interpret the numbers correctly,” says Gill. “If a seemingly large number of people die in a hospital, you first have to look carefully at the cause. Are there perhaps more patients than usual? Are they sicker than in other periods? Has the method of recording changed? Have there been changes in staff? If you immediately look at which nurse was present, you also skip the most important questions: is this murder, or is it medical failure, or even natural death?”
According to Gill, this immediately touches on another painful point. “A hospital is a place where people die, but often the cause of death is not clear. That can lead to clusters of suspicious deaths. You do have to know which deaths you are counting; otherwise the police end up looking for evidence to support claims.”
According to Gill, you must always bear in mind that there can be a good, innocent reason for an event. “Above all, look at how often someone works. Full-time nurses experience more deaths than part-timers. If someone works full-time and is also passionate about their profession, the chance that that person is present when someone dies is even greater than for someone who works a few days a week or sticks strictly to the hours on the roster.”
According to him, you must never rule out a strange coincidence. “They happen, even without murder. A famous example is an American couple who won the jackpot in two different lotteries on the same day. What are the chances of that happening? And yet it really happened. Unfortunately, people do not want to believe in coincidence; we want to have a cause, and we look for a scapegoat. That is also why we believe in devils and gods.”
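The point about coincidences can be made quantitative. If some coincidence has a tiny probability p of happening to any one person (or ward) in a given period, the chance that it happens to *at least one* of n independent people (or wards) is 1 - (1 - p)^n, which approaches certainty when n is large. A minimal Python sketch; the figures are purely illustrative assumptions, not estimates for lotteries or hospitals.

```python
import math


def prob_at_least_once(p, n):
    """P(at least one occurrence) among n independent trials with probability p each.

    Computed as 1 - (1 - p)**n, using log1p/expm1 to stay accurate for tiny p.
    """
    return -math.expm1(n * math.log1p(-p))


# Illustrative: a one-in-a-million coincidence, for one person vs. ten million people.
print(prob_at_least_once(1e-6, 1))           # about one in a million
print(prob_at_least_once(1e-6, 10_000_000))  # about 0.99995: almost certain to happen somewhere
```

Using `log1p` and `expm1` rather than evaluating `1 - (1 - p)**n` directly avoids losing precision when p is very small.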
Richard Gill was born in England. His father was also a scientist. Love brought him to the Netherlands in 1974, at the age of 23. Six years earlier, on holiday, he had fallen head over heels for the daughter of a Dutch friend of his father’s. Both fathers worked for Wavin in Hardenberg. After some wandering, Gill ended up in Apeldoorn in the early 1980s, never to leave again. He lives in an old mansion surrounded by a sea of lush greenery; it was his wife’s parental home. To keep his head above water financially, he worked extra hard to build his career quickly.
The medical world crossed his path early on. After studying mathematics at Cambridge, he obtained his PhD with research on how long cancer patients survive under certain treatments. His method of calculation proved very valuable and is now also applied in other fields. “It landed on my plate by chance. I had no topic, and my supervisor pulled this one out of his drawer. It has had a lot of impact, and the method is still used on a massive scale.”
His wife, a historian, drew his attention early on to the case of Lucia de Berk, who would later be convicted of seven murders in a hospital. “She spoke of a witch hunt and wanted me to look into it, especially when it also became a witch trial, as she called it. She pointed out to me that statistics were being used as evidence, so I ought to have an opinion about it. I didn’t want to. Experienced statisticians were already involved, including people I knew.”
When a book about the case appeared in 2006, Gill changed his mind. “A colleague pointed me to the book. I truly couldn’t believe what I was reading; I was completely shaken. To me it was crystal clear that the verdict was wrong and that the judges had misinterpreted the numbers.”
The rest is history. Gill helped show that the numbers could not support the accusation, and in 2010, after six and a half years of wrongful imprisonment, Lucia was fully acquitted.
When he read about a similar situation in Italy in 2014, he immediately decided to take action again. This time a nurse (Daniela Poggiali) was suspected of sixty murders. Gill called his colleague Julia Mortera of Roma Tre University, and together they offered their help. With success: this nurse too, after an earlier sentence of life imprisonment, has been free since October.
Statistics is the science and technique of collecting, processing, interpreting and presenting data. Statistical methods are used to turn large amounts of data (for example about people’s purchasing behaviour, the housing market or the number of deaths in healthcare) into useful information.
“The statistics in this case were totally amateurish; they were no good. The prosecutors claimed that there were more deaths when Daniela was working, and that when she was arrested the number suddenly fell. We discovered that the death rate was high for all staff members. Daniela was often present before the start of her scheduled shift and often stayed on to help after her shift had ended. As a result, she was present more often when a patient died. That the number of deaths fell after Daniela was arrested is simple to explain. The news about the ‘killer nurse’ had been widely reported in the media. As a result, the hospital attracted fewer patients. Fewer patients also means fewer deaths.”
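Gill’s explanation of the fall in deaths rests on the distinction between counts and rates: if admissions fall, the number of deaths falls even when the risk per patient is unchanged. A minimal Python sketch with invented numbers (not data from the Italian case):

```python
def deaths_per_admission(deaths, admissions):
    """Crude mortality rate: deaths divided by admissions."""
    return deaths / admissions


# Hypothetical periods: before and after an arrest makes headlines.
before = {"admissions": 2000, "deaths": 100}
after = {"admissions": 1400, "deaths": 70}  # fewer patients come in

for label, period in (("before", before), ("after", after)):
    rate = deaths_per_admission(period["deaths"], period["admissions"])
    print(f"{label}: {period['deaths']} deaths, {rate:.1%} of admissions")
# The death count falls by 30%, yet the rate per admission is 5.0% in both periods.
```

A crude rate like this ignores case mix; a real analysis would also ask whether the patients admitted were sicker or healthier in one period than the other.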
Gill is now investigating the serious accusations against the English nurse Ben Geen, at the request of his lawyer. So far it is a very tough nut to crack, mainly because the legal system in England works differently. Once again, Gill is convinced that the defendant committed no murders and that justice must prevail.
From these cases he has drawn important lessons that he wants to pass on to everyone involved in the administration of justice worldwide, from lawyers to judges and from prosecutors to jurors. Together with other experts, he is writing a guide on how statistics can be used in court, particularly in criminal trials of so-called healthcare serial killers. This is being done under the supervision of the authoritative Royal Statistical Society. The book is due to appear later this year.
His message is, in essence, simple: use statistical data only once you have made sure they are correct, and use them properly. “Name all the factors. Don’t jump to conclusions. Ask independent experts for help. Investigate all possibilities.” According to Gill, what is needed is not only the expertise of professionals, but also training for judges and lawyers in the correct interpretation of statistics.
He gives a simple example. “You can assume that a dog has four legs, but not that everything with four legs is a dog. You may assume that someone from Peru speaks Spanish, but not that everyone who speaks Spanish comes from Peru. If Lucia had been looked at in that way, she would never have been convicted.”
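Gill’s Peru example is an instance of Bayes’ theorem: P(Peru | Spanish) = P(Spanish | Peru) · P(Peru) / P(Spanish). The Python sketch below uses round, illustrative population figures (assumptions, not census data) to show how a high P(Spanish | Peru) can coexist with a low P(Peru | Spanish). The same logic separates the probability of the evidence given innocence from the probability of innocence given the evidence.

```python
def bayes_inverse(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Round, illustrative figures (assumptions, not census data).
world = 8_000_000_000
peru = 33_000_000
spanish_speakers = 500_000_000
p_spanish_given_peru = 0.8  # assumed share of Peruvians who speak Spanish

p_peru_given_spanish = bayes_inverse(p_spanish_given_peru,
                                     peru / world,
                                     spanish_speakers / world)
print(f"P(Spanish | Peru) = {p_spanish_given_peru:.0%}")
print(f"P(Peru | Spanish) = {p_peru_given_spanish:.1%}")  # about 5%
```

Reversing the conditional turns 80% into about 5% here; confusing the two directions is the error Gill warns against.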
Gill finds it characteristic that the suspects he has helped are all striking people. They worked hard, had clear opinions, probably rubbed managers up the wrong way as a result, and ended up as scapegoats. “It really struck me how much they have in common. Ben Geen wanted to become an army doctor and was enormously driven in his work. He saw his work as more than a job and did a lot extra whenever he could. He also clashed with managers because the hospital was constantly running up against its limits.”
As an expert in statistics, Gill has also worked for the Public Prosecution Service (the Tamara Wolvers murder case) and the International Criminal Court (the assassination of the president of Lebanon). He has now been retired for almost six years, but he has no time to be bored. He has years of work on his plate: puzzles he is happy to help solve.
There are also many subjects he would love to dive into, such as the notorious Deventer murder case, which has intrigued him enormously for years. “I still consider it possible that the convicted man, Ernest Louwes, is innocent. I find the DNA traces on the blouse of the murdered widow particularly interesting. DNA is also statistical evidence, and statistics tells us how to deal with uncertainty. There are now new molecular-biological methods that can extract much more from a trace.”
Gill is helping MP Pieter Omtzigt analyse data on children taken into care as a result of the childcare benefits scandal. “We are making a timeline to get a picture of cause and effect. So actually I have no time left at all to get more nurses out from behind bars,” he says with a smile.
Should another case of an alleged killer nurse come his way, he will probably find it hard to say ‘no’. He enjoys puzzling and wants to keep life from becoming boring. The text on the back of his sweater perhaps speaks volumes in that respect: ‘Keep calm, and this grandpa will sort it out’. Because yes: Gill, a father of three, is a grandfather, and his five grandchildren love to come and stay with him and his wife in Apeldoorn.
One of the many puzzles that has occupied him for years, and sometimes even keeps him awake at night, is one he feels he must solve: the case of José Booij, who eighteen years ago was confronted with the removal of her six-week-old baby, Julia-Lynn, into care.
“An unimaginable and horrific story. She was ground down by the system and completely broken by it. I have lost contact with José, but I still have a box of her personal belongings, such as children’s drawings, diplomas, diaries and newspaper clippings about her fight for her child, all the way up to the highest courts in the Netherlands and Europe. Perhaps Julia-Lynn is now living under a different name, and perhaps she doesn’t even know her birth name. I want her to know who her mother is. That she never gave up. She has a right to that. I hope to find her one day and to be able to give her her mother’s belongings. And you know, this woman too is a special person, different from others.”
This Apeldoorn scientist saves innocent nurses from prison: ‘Unfortunately, people do not want to believe in coincidence’
Scientist Richard Gill from Apeldoorn helped ensure that Lucia de Berk was acquitted. He achieved the same in a similar case in Italy and now he is going for the hat trick in England. What drives him?
Anne Boer 28-05-22, 08:00
Pure scientific curiosity is what drives him, says the internationally renowned mathematician Richard Gill from Apeldoorn. As an expert in statistics, Gill (70) has worked for the Public Prosecution Service and the International Criminal Court. He has been retired for almost six years and is emeritus professor of statistics at Leiden University.
Working from his office, and using his knowledge of statistics, he was able to prove the innocence of two nurses who had been convicted of serial murder: the Dutch Lucia de Berk and the Italian Daniela Poggiali. Now he is campaigning for the release of the English nurse Ben Geen.
All three are said to have killed patients on the job. Lucia de Berk was even convicted of seven murders. The case against her rested mainly on statistics: supposedly, more patients died during Lucia’s shifts than during her colleagues’ shifts. That turned out to be sheer nonsense, as Gill delicately puts it. “A matter of gossip and backbiting, of looking for a scapegoat to save the hospital’s reputation, and of assumptions, when no murder was committed at all.”
Statistical evidence plays a major role in investigations, including those into suspected serial killers in the medical world. “But then you have to interpret the figures properly,” says Gill. “If apparently many people die in a hospital, you first have to look closely at the cause. Are there perhaps more patients than usual? Are they sicker than in other periods? Has the registration method been adjusted? Are there changes in the staff? If you immediately look at which nurse was present, you skip the most important questions: is it murder, or is it medical failure, or even natural death?”
According to Gill, that immediately touches on another sore point. “A hospital is a place where people die, but often the cause of death is not clear. This can lead to clusters of suspicious deaths. You have to know which deaths you are counting; otherwise the police will go looking for evidence to support a claim.”
According to Gill, you should always keep in mind that there can be a good, innocent reason for an event. “Look at how often someone works. Full-time nurses experience more deaths than part-time nurses. If someone works full-time and is passionate about their craft, they are even more likely to be there when someone dies than someone who works a few days a week or works strictly within the schedule.”
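Gill’s point about working hours can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that deaths on a ward fall at random times with no connection to any nurse, at an average of about one a day; the contract hours are hypothetical too. Under those assumptions, the expected number of deaths during a nurse’s shifts is simply proportional to the hours she works.

```python
# Hypothetical illustration: deaths strike at random times, unrelated to staff.
DEATHS_PER_HOUR = 1 / 24  # assumed ward average: about one death per day


def expected_deaths_on_duty(hours_per_week: float, weeks: int = 52) -> float:
    """Expected number of deaths falling inside a nurse's working hours,
    if deaths are spread uniformly at random over time."""
    return DEATHS_PER_HOUR * hours_per_week * weeks


full_time = expected_deaths_on_duty(36)  # assumed full-time contract
part_time = expected_deaths_on_duty(16)  # assumed part-time contract
keen = expected_deaths_on_duty(42)       # assumed full-timer who often stays late

for label, value in [("full-time", full_time), ("part-time", part_time), ("keen", keen)]:
    print(f"{label:>9}: {value:.1f} expected deaths per year while on duty")
```

A nurse who works two and a quarter times as many hours is, on average, on duty for two and a quarter times as many deaths, without having anything to do with any of them.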
According to him, you should never rule out a strange combination of circumstances. “They happen, even without murder. A famous example is an American couple who won the top prize in two different lotteries on the same day. What are the chances of something like this happening? Yet it really happened. Unfortunately, people do not want to believe in coincidence; we want to have a cause, and we look for a scapegoat. That is why we also believe in devils and gods.”
Richard Gill was born in England. His father was also a scientist. Love brought him to the Netherlands in 1974, at the age of 23. Six years earlier, on holiday, he had fallen head over heels for the daughter of a Dutch friend of his father’s. Both fathers worked for Wavin of Hardenberg. After some wandering, Gill ended up in Apeldoorn in the early 1980s, never to leave. He lives in an old mansion in a sea of lush greenery, his wife’s childhood home. To keep his head above water financially, he worked extra hard to build a career quickly.
The medical world crossed his path early on. After studying mathematics at Cambridge, he obtained his doctorate for research into how long cancer patients survive under a particular treatment. His method of calculation turned out to be a godsend and is now also applied in other areas. “It just happened to land on my plate. I didn’t have a topic, and my PhD supervisor took this one out of his drawer. It has had a lot of impact, and the method is still widely used.”
His wife, a historian, pointed him early on to the case of Lucia de Berk, who would later be convicted of seven murders in a hospital. “She spoke of a witch hunt and wanted me to follow it, especially when it became a witch trial, as she called it. She pointed out to me that statistics were being used as evidence, so I should have something to say about it. I did not want to. Experienced statisticians were already involved, including people I knew.”
When a book about the case was published in 2006, Gill took the plunge. “A colleague referred me to the book. I could hardly believe what I was reading; I was really blown away by it. It was crystal clear to me that the verdict was wrong and that the judges had misinterpreted the figures.”
The rest is history. Gill helped show that the figures failed to substantiate the accusation and Lucia was fully acquitted in 2010 after 6.5 years of wrongful imprisonment.
When he read about a similar situation in Italy in 2014, he immediately decided to jump back into action. This time a nurse (Daniela Poggiali) was suspected of sixty murders. Gill called his colleague Julia Mortera of Roma Tre University, and together they offered their help. With success: this nurse, previously sentenced to life imprisonment, has also been free since October.
Statistics is the science and technology of collecting, processing, interpreting and presenting data. Statistical methods are used to convert large amounts of data (for example about people’s purchasing behaviour, the housing market or the number of deaths in care) into useful information.
“The statistics in this case were completely amateurish; they were not good. Prosecutors claimed there were more deaths when Daniela worked, until she was arrested; then the number suddenly dropped. We found that the death rate was high for all staff. Daniela was often present before the start of her scheduled shift and often continued to help after her shift was over. As a result, she was more often present when a patient died. It is easy to explain why the number of deaths decreased after Daniela was arrested. The news about the ‘killer nurse’ was widely covered in the media. As a result, the hospital attracted fewer patients. Fewer patients also means fewer deaths.”
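The last step of that argument is plain arithmetic. A minimal sketch, assuming (hypothetically) that every patient has the same small daily risk of dying, and that only the number of patients changed after the news broke:

```python
# Sketch: if every patient has the same small daily risk of dying, expected
# deaths simply scale with the number of patients. All figures are hypothetical.

def expected_deaths(patients: float, daily_risk: float, days: int) -> float:
    """Expected deaths over `days` with `patients` beds filled each day."""
    return patients * daily_risk * days


DAILY_RISK = 1 / 60  # assumed: about one death per day on a full 60-bed ward

before_arrest = expected_deaths(60, DAILY_RISK, days=90)
after_news = expected_deaths(40, DAILY_RISK, days=90)  # assumed drop in admissions

print(f"before: {before_arrest:.0f}, after: {after_news:.0f}")
```

Nothing about the ward has become safer in this sketch; it simply has fewer patients.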
Gill is now investigating the serious allegations against the English nurse Ben Geen, at the request of Geen’s lawyer. It is still a very difficult task, especially because the legal system in England is different. Once again, Gill is convinced that the suspect committed no murders and that justice must prevail.
He has learned important lessons from these cases that he wants to pass on to everyone involved in the justice system worldwide, from lawyers to judges and from prosecutors to jurors. Together with other experts, he is writing a guide on how to use statistics in court, especially in criminal proceedings against alleged serial killers in healthcare. The work is being done under the auspices of the authoritative Royal Statistical Society. The guide is due out later this year.
His message is basically simple: do not use statistical data until you have made sure they are correct, and then use them well. “Name all the factors. Don’t jump to conclusions too quickly. Ask independent experts for help. Explore all possibilities.” According to Gill, expertise from professionals is not enough; judges and lawyers must also be trained to interpret statistics correctly.
He gives a simple example. “You can assume that a dog has four legs, but not that everything with four legs is a dog. You can assume that someone from Peru speaks Spanish, but not that everyone who speaks Spanish is from Peru. If Lucia had been looked at that way, she would never have been convicted.”
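Gill’s examples are about the difference between P(A | B) and P(B | A). A small sketch with made-up round counts (not real demographic figures) shows how far apart the two can be:

```python
# Sketch of the asymmetry in Gill's Peru example, using made-up round counts.
from fractions import Fraction

peruvians = 100                   # hypothetical number of people from Peru
peruvians_speaking_spanish = 90   # hypothetical: most, not all, speak Spanish
spanish_speakers = 1500           # hypothetical: Spanish speakers from everywhere

# P(speaks Spanish | from Peru): share of Peruvians who speak Spanish.
p_spanish_given_peru = Fraction(peruvians_speaking_spanish, peruvians)
# P(from Peru | speaks Spanish): share of Spanish speakers who are Peruvian.
p_peru_given_spanish = Fraction(peruvians_speaking_spanish, spanish_speakers)

print(f"P(Spanish | Peru) = {float(p_spanish_given_peru):.2f}")
print(f"P(Peru | Spanish) = {float(p_peru_given_spanish):.2f}")
```

The same 90 people appear in both numerators; what changes is the group you divide by.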
Tellingly, Gill finds that the suspects he helped are all striking people. They worked hard, had strong opinions, probably offended their superiors as a result, and ended up as scapegoats. “It really struck me how much they have in common. Ben Geen wanted to be an army doctor and was very passionate about his work. He saw his work as more than a job and did a lot extra when he could. He also clashed with managers because the hospital was constantly running up against its limits.”
As an expert in statistics, Gill has also worked for the Public Prosecution Service (the Tamara Wolvers murder case) and the International Criminal Court (the Lebanon presidential assassination case). He has now been retired for almost six years, but he has no time to get bored. He still has years of work on his plate: puzzles he likes to help solve.
In addition, there are many subjects he would like to delve into, such as the controversial Deventer murder case, which has intrigued him immensely for years. “I still think it possible that the convicted Ernest Louwes is innocent. I find the DNA traces on the murdered widow’s blouse particularly interesting. DNA is also statistical evidence, and statistics tells us how to deal with uncertainty. There are now new molecular-biological methods to get much more out of a trace.”
Gill is helping MP Pieter Omtzigt analyse data about children taken into care as a result of the benefits scandal. “We’re making a timeline to get a picture of cause and effect. So I really don’t have time to get any more nurses out of prison,” he says with a smile.
If a case of an alleged killer nurse does come his way, he will probably find it hard to say ‘no’. He enjoys puzzling and wants to keep life from getting boring. The text on the back of his sweater may speak volumes in that regard: ‘Keep calm, and this grandpa will solve it’. Because yes: Gill, a father of three, is a grandfather, and his five grandchildren love to stay with him and his wife in Apeldoorn.
One of the many puzzles that has occupied him for years, and sometimes even keeps him awake at night, is one he feels he must solve: the case of José Booij, who eighteen years ago saw her six-week-old baby Julia-Lynn taken into care.
“An unbelievable and horrifying story. She was crushed by the system and completely destroyed by it. I have lost contact with José, but I still have a box of her personal belongings, such as children’s drawings, diplomas, diaries and newspaper clippings about her fight for her child, all the way to the highest courts in the Netherlands and Europe. Julia-Lynn may now be living under a different name and may not even know her birth name. I want her to know who her mother is, and that she never gave up. She is entitled to that. I hope one day to find her and give her her mother’s things. And you know, this woman is also a special person, different from others.”
More than ten years ago I started writing a book on Dutch miscarriages of justice in which I had been involved. I wanted to explore the personality issues in three such cases. In each case, it seemed to me that aspects of the main protagonist’s character led to them becoming something of a scapegoat for a system under great stress. Some trigger events turned a bad situation into an utter disaster. The authorities made mistakes and could not admit them, so errors were compounded, and there was no going back, no way to change course any more.
In recent posts, I have told a lot of the story of José Booij. It’s time to start writing about Lucia de Berk and Kevin Sweeney.
There is already an enormous literature on Lucia de Berk. The case started in 2001 and seemed to be closed by 2006, with Lucia in jail for life (conviction by the lower court at the first trial in 2003, a failed appeal to the higher court in 2004, and a failed cassation, i.e. appeal to the supreme court, in 2006). At that time, however, a strong movement also burst into public view, calling for a judicial review and a retrial. Lucia was fully exonerated in 2010. The role of statistics in the case is well known, though controversial, since at the 2004 appeal she was convicted “on the grounds of incontrovertible medical scientific evidence only”. According to her judges, a “statistical probability calculation” (such as the infamous calculation leading to the spectacular 1 in 342 million) played no part at all in the court’s conclusion.
Yet many things have still not been said in public about the case, except perhaps in literary form. In my future book, I want to say things I have said many times before in ephemeral blog posts and on other web pages that have since been removed or hidden.
Concerning Kevin Sweeney, not much has been written at all. He sat out his sentence for the murder of his wife and keeps a low profile.
The title of this blog might refer to the very, very famous trials of Amanda Knox in the case of the murder of Meredith Kercher. However, I am writing about a case that is much less known outside of Italy (neither victim nor alleged murderer was a rich American girl). This is the case of Daniela Poggiali, a nurse suspected by the media and accused by prosecution experts of having killed around 90 patients in a two-year killing spree terminated by her arrest in April 2014. She has just been exonerated after a total of three years in prison with a life sentence, as well as some months of pre-trial detention. This case revolved around statistics of an increased death rate during the shifts of a colourful nurse. I was a scientific expert for the defence, working with an Italian colleague, Julia Mortera (Univ. Rome Tre), later assisted by her colleague Francesco Dotto.
Piet Groeneboom and I worked together on the statistics of the case of Lucia de Berk, see our paper in Chance [Reference]. In fact, it was remarkable that the statistical community in the Netherlands got so involved in that case. A Fokke and Sukke cartoon entitled “Fokke and Sukke know it intuitively” had the exchange “The probability that almost all professors of statistics are in agreement … is obviously very small indeed”.
Indeed, it wasn’t. That was one of the high points of my career. Another was Lucia’s final acquittal in 2010, at which the judges took the trouble to say out loud, in public, that the nurses had fought heroically for the lives of their patients; lives squandered, they added, by their doctors’ medical errors.
At that point, I felt we had learnt how to fight miscarriages of justice like that, and I rapidly became involved in several more. So far, however, the results had been rather depressing, until a couple of months ago. This story will not have much to do with mathematics. It will have to do with simple descriptive statistics, and I will also mention the phrases “p-value” and “Bayes’ rule” a few times. One of the skills of a professional statistician is the abstraction of messy real-world problems involving chance and data. It’s not for everybody. Many mathematical statisticians prefer to prove theorems, just like any other mathematician. In fact, I often prefer to do that myself, but I like even more being able to alternate between the two modes of activity, and I do like sticking my nose into other people’s business and learning about what goes on in, for instance, law, medicine, or anything else. Each of the two modes of activity is a nice therapy for the frustrations that inevitably come with the other.
The Daniela Poggiali case began, for me, soon after 8 April 2014, when it was first reported in the international news media. A nurse at the Umberto I hospital in the small town of Lugo, not far from Ravenna, had been arrested and was being investigated for serial murder. She had had photos taken of herself laughing close to the body of a deceased patient, and these “selfies” were soon plastered over the front pages of the tabloids. Pretty soon, they reached The Guardian and The New York Times. The newspapers sometimes suggested she had killed 93 patients, sometimes 31, sometimes other large numbers. It was suspected that she had used potassium chloride on some of those patients. It is an ideal murder weapon for a killer nurse: easily available in a hospital, easy to give to a patient who is already hooked up to an IV drip, rapidly lethal (by cardiac arrest; it is used in America for executions), and hard to detect after a short time. After death, it redistributes itself throughout the body, where it becomes indistinguishable from a normal concentration of potassium.
Many features of the case reminded me strongly of the case of Lucia de Berk in the Netherlands. In fact, it seemed very fishy indeed. I found the name of Daniela’s lawyer in the online Italian newspapers, Google found me an email address, and I sent a message offering support on the statistics of the case. I also got an Italian statistician colleague and good friend, Julia Mortera, interested. Daniela’s lawyer was grateful for our offer of help. The case largely hinged on a statistical analysis of the coincidence between deaths on a hospital ward and Daniela’s shifts there. We were emailed pdfs of scanned pages of a faxed report of around 50 pages containing results of statistical analyses of times of shifts of all the nurses working on the ward, and times of admission and discharge (or death) of all patients, during much of the period 2012 – 2014. There were a further 50 pages (also scanned and faxed) of appendices containing print-outs of the raw data submitted by hospital administrators to police investigators. Two huge messy Excel spreadsheets.
The authors of the report were Prof. Franco Tagliaro (Univ. Verona) and Prof. Rocco Micciolo (Univ. Trento). The two are respectively a pathologist/toxicologist and an epidemiologist. The epidemiologist Micciolo is a professor in a social science department, and member of an interfaculty collaboration for the health sciences. We found out that the senior and more influential author Tagliaro had published many papers on toxicology in the forensic science literature, usually based on empirical studies using data sets provided by forensic institutes. Occasionally, his friend Micciolo turned up in the list of authors and had supplied statistical analyses. Micciolo describes himself as a biostatistician. He has written Italian language textbooks on exploratory data-analysis with the statistical package “R” and is frequently the statistician-coauthor of papers written by scientists from his university in many different fields including medicine and psychology. They both had decent H-indices, their publications were in decent journals, their work was mainstream, useful, “normal science”. They were not amateurs. Or were they?
Daniela Poggiali worked on a very large ward with very many very old patients, many suffering terminal illnesses. Ages ranged from 50 up to 105, mostly around ninety. The ward had about 60 beds and was usually quite fully occupied. Patients tended to stay one to two weeks in the hospital, admitted to the hospital for reasons of acute illness. There was on average one death every day; some days none, some days up to four. Most patients were discharged after several weeks in the hospital to go home or to a nursing home. It was an ordinary “medium care” nursing department (i.e., not an Intensive Care unit).
Some very simple statistics showed that the death rate on days when Poggiali worked was much higher than on days when she did not work. A more refined analysis compared the rate of deaths during the hours she worked with the rate of deaths during the hours she was not at work. Again, her presence “caused” a huge excess, statistically highly significant. A yet more refined analysis compared the rate of deaths while she was at work in the sectors where she was working with the rate in the opposite sectors. What does this mean? The ward was large and spread over two long wings of one floor of a large building, “Blocco B”, probably built in the sixties.
Between the two wings were central “supporting facilities” and also the main stairwell. Each wing consisted of many rooms (each room with several beds), with one long corridor running through the whole building; see the floor plan below. Sector A and B rooms were in one wing, first A and then B as you went down the corridor from the central part of the floor. Sector C and D rooms were in the other wing, opposite one another on either side of the corridor. In her shifts, each nurse was usually assigned to one sector, or occasionally to both sectors in one wing. While working in one sector, a nurse could in principle easily slip into a room in the adjacent sector. In any case, the nurses often helped one another, so they could often be found in the “wrong sector”, but not often in the “wrong wing”.
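The first two comparisons described above boil down to a ratio of two rates: deaths per hour while a given nurse is on duty, and deaths per hour while she is not. A sketch with hypothetical counts (not the Lugo data):

```python
# Sketch of the crude comparison: deaths per hour while a given nurse is on
# duty versus deaths per hour while she is off duty. Counts are hypothetical.

def rate_ratio(deaths_on: int, hours_on: float,
               deaths_off: int, hours_off: float) -> float:
    """Death rate during on-duty hours divided by the off-duty death rate."""
    rate_on = deaths_on / hours_on
    rate_off = deaths_off / hours_off
    return rate_on / rate_off


# Hypothetical two-year period of 17,520 hours, 3,000 of them on duty.
ratio = rate_ratio(deaths_on=150, hours_on=3000,
                   deaths_off=580, hours_off=14520)
print(f"rate ratio: {ratio:.2f}")
```

A ratio above 1 only says that the two rates differ; on its own it says nothing about why they differ.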
Tagliaro and Micciolo (in the sequel: TM) went on to look at the death rates while Daniela was at work in different periods. They noticed that it was higher in 2013 than in 2012, higher still in the first quarter of 2014, and then, after Daniela had been fired, much, much lower. They conjectured that she was killing more and more patients as time went by, until the killing stopped dead with her suspension and arrest.
TM certainly knew that, in theory, other factors might be the cause of an increased death rate on Poggiali’s shifts. They were proud of their innovative approach of relating each death that occurred while Daniela was at work to whether it occurred in Daniela’s wing or in the other. They wrote that in this way they had controlled for confounders, taking each death to provide its own “control”. (Similarly, in the case of Lucia de B., statistician Henk Elffers had come up with an innovative approach. In principle, it was not a bad idea, though all it showed was that nurses are different.) TM did not control for any other confounding factors at all. In explaining their findings to the court, they repeatedly stated categorically that the association they had found must be causal, and that Daniela’s presence was the cause. On top of this, their clumsy explanation of p-values may well have misled lawyers, journalists and the public. In such a case, a p-value is the probability of what you see (more precisely, of something at least as extreme as what you see), assuming pure chance. That is not the same as the probability that pure chance was the cause of what you see. Confusing the two is the fallacy of the transposed conditional, also known as “the prosecutor’s fallacy”.
Exercise for the reader: when is this fallacy not a fallacy? Hint: revise your knowledge of Bayes’ rule: posterior odds equal prior odds times the likelihood ratio.
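For readers who want to experiment, here is a sketch of both quantities, with hypothetical numbers that have nothing to do with the Poggiali data. The first function computes a one-sided p-value, assuming that under pure chance the number of deaths follows a Poisson distribution with a known expected count. The second applies Bayes’ rule in odds form. The likelihood ratio of 500 and the prior odds of 1 in 100,000 are assumptions, chosen only to show that a tiny probability of the data under chance need not make guilt probable.

```python
import math
from fractions import Fraction


def poisson_p_value(observed: int, expected: float) -> float:
    """One-sided p-value P(X >= observed) for X ~ Poisson(expected):
    the chance of at least what you see, assuming pure chance."""
    term = math.exp(-expected)      # P(X = 0)
    below = 0.0
    for k in range(observed):
        below += term               # add P(X = k)
        term *= expected / (k + 1)  # turn P(X = k) into P(X = k + 1)
    return max(0.0, 1.0 - below)


def posterior_odds(prior_odds: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio


# Hypothetical: chance alone predicts 10 deaths on her shifts; 20 are observed.
p = poisson_p_value(observed=20, expected=10.0)
print(f"P(at least 20 | chance only) = {p:.4f}")

# Hypothetical: data 500 times likelier if guilty; prior odds 1 in 100,000.
post = posterior_odds(Fraction(1, 100_000), Fraction(500))
prob = post / (1 + post)
print(f"posterior odds of guilt {post}, probability {float(prob):.4f}")
```

The p-value here is well below 1%, yet under these assumed inputs the posterior probability of guilt is about half a percent. The two numbers answer different questions.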
We asked Tagliaro and Micciolo for the original Excel spreadsheets and for the “R” scripts they had used to process the data. They declined to give them to us, saying this would not be proper since they were confidential. We asked Daniela’s lawyer to ask the court to request those computer files on our behalf. The court declined. We were finally sent just the Excel files by the hospital administration, a week before we were called to give evidence. Fortunately, with a combination of OCR and a lot of painstaking handwork, a wealthy friend of Daniela’s lawyer had already managed to help us reconstruct the data files. We performed a lot of analyses with the help of a succession of students, because extracting what we needed from those spreadsheets was an extraordinarily challenging task. We kept finding anomalies that had to be fixed in one way or another. Even when we had “clean” spreadsheets, it was still a mess.
Next, we started looking for confounding factors that might explain the difference between Daniela and her colleagues, which certainly was striking and real. But was it perhaps entirely innocent?
First of all, simple histograms showed that death rates on that ward varied strongly by month, with big peaks in June and again in December. (January is not high: elderly people stay home in January and keep themselves warm and safe.) That is what one should expect: the humid heat and air pollution in summer, or the damp, the cold and the air pollution in winter, exacerbated by winter flu epidemics. Perhaps Daniela worked more at bad times than at good times? No. It was clear, though, that sectors A+B were different from C+D. Death rates were different, but so was the number of beds in each wing. Perhaps Daniela was allocated more often to the “more difficult” sectors? That was not so clear. Tagliaro and Micciolo computed death rates for the whole ward, or for each wing of the ward, but never took account of the number of patients in each wing, nor of the severity of their illnesses.
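Why the allocation question matters can be shown with a sketch in which neither nurse has any effect on anyone. The wing death rates and the rosters are hypothetical; the only difference between the two nurses is how often each is sent to the wing with sicker patients.

```python
from fractions import Fraction

# Assumed expected deaths per shift in each wing (sicker patients in A+B).
DEATHS_PER_SHIFT = {"A+B": Fraction(8, 10), "C+D": Fraction(4, 10)}


def crude_rate(shifts_by_wing: dict[str, int]) -> Fraction:
    """Expected deaths per shift, pooled over wings, for a given roster."""
    deaths = sum((n * DEATHS_PER_SHIFT[wing] for wing, n in shifts_by_wing.items()),
                 Fraction(0))
    return deaths / sum(shifts_by_wing.values())


nurse_x = {"A+B": 80, "C+D": 20}    # hypothetical: often sent to A+B
colleague = {"A+B": 20, "C+D": 80}  # hypothetical: often sent to C+D

print("crude rate per shift:", float(crude_rate(nurse_x)), "vs", float(crude_rate(colleague)))
for wing, rate in DEATHS_PER_SHIFT.items():
    # By construction, both nurses face exactly the same rate within a wing.
    print(f"within {wing}: {float(rate)} for both nurses")
```

Compared wing by wing, the two nurses are identical; only the crude, pooled rate makes one of them look half as dangerous again as the other.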
Most interesting of all was what we found when we looked at the hour of the time of death of patients who died, and the minute of the time of death of patients who died. Patients tended to die at times which were whole hours, “half past” was also quite popular. There was however also a huge peak of deaths between midnight and five minutes past midnight! There were fewer deaths in a couple of hours soon after lunchtime. There were large peaks of deaths around the time of handover between shifts: 7:00 in the morning, 2:00 in the afternoon, 9:00 in the evening. The death rate is higher in the morning than in the afternoon, and higher in the afternoon than at night. When you’re dying (but not in intensive care, when it is very difficult to die at all) you do not die in your sleep at night. You die in the early morning as your vital organs start waking up for the day. Now, also not surprisingly, the number of nurses on a ward is largest in the morning when there is a huge amount of work to do; it’s much less in the afternoon and evening, and it’s even less at night. This means that a full-time nurse typically spends more time in the hospital during morning shifts than during afternoon shifts, and more time during afternoon shifts than during night shifts. The death rate shows the same pattern. Therefore, for every typical full-time nurse, the death rate while they are at work tends to be higher than when they are not at work!
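The last step of this argument can be checked with exact arithmetic. The sketch below assumes hypothetical hourly death rates that are highest in the morning and lowest at night, a 7/7/10-hour split of the day into morning, afternoon and night shifts, and a full-time roster weighted towards mornings. The nurse has no influence on anyone.

```python
from fractions import Fraction

# Assumed shift lengths and hourly death rates (morning busiest, night quietest).
SHIFTS = {
    # name: (hours per shift, deaths per hour)
    "morning": (7, Fraction(6, 100)),
    "afternoon": (7, Fraction(4, 100)),
    "night": (10, Fraction(2, 100)),
}
DAYS = 28  # a four-week roster period


def on_off_rates(shifts_worked: dict[str, int]) -> tuple[Fraction, Fraction]:
    """Death rate per hour while the nurse is on duty, and while she is off."""
    hours_on = deaths_on = Fraction(0)
    hours_all = deaths_all = Fraction(0)
    for name, (length, rate) in SHIFTS.items():
        hours_all += DAYS * length
        deaths_all += DAYS * length * rate
        hours_on += shifts_worked.get(name, 0) * length
        deaths_on += shifts_worked.get(name, 0) * length * rate
    return deaths_on / hours_on, (deaths_all - deaths_on) / (hours_all - hours_on)


# A hypothetical full-time roster weighted towards mornings, as staffing is.
on, off = on_off_rates({"morning": 12, "afternoon": 6, "night": 2})
print(f"on duty: {float(on):.4f}/h, off duty: {float(off):.4f}/h, ratio {float(on / off):.2f}")
```

With these made-up numbers, the death rate while she is on duty comes out roughly 40% higher than while she is off duty, purely because of when she works. A roster covering each shift equally often would give identical on-duty and off-duty rates.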
Nurses aren’t authorized to officially register times of death. Only a doctor is authorized to do that. He or she is supposed to write down the time at which they determined that the patient was no longer alive. It seems that they often round that time to whole or half hours. The peak just after midnight is hard to explain. The date of death has enormous financial and legal consequences. The peak suggests that those deaths may have occurred anywhere in a huge time window. It is hard to believe that doctors come to the wards on the dot of midnight and fill in forms for any patients who have died in the preceding few hours.
What is now clear is that it is mainly around the hand-over between shifts that deaths get “processed”. Quite a few times of death are so hard to know that they are shunted to five minutes past midnight; many others are located in the hand-over period but might well have occurred earlier.
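Heaping of this kind is easy to check once the recorded times are in a file. A minimal sketch, assuming times are stored as ‘HH:MM’ strings (the records below are invented):

```python
from collections import Counter


def minute_heaping(times: list[str]) -> dict[str, float]:
    """Summarise how recorded 'HH:MM' times of death cluster on round minutes.

    If times were recorded to the minute and deaths were spread evenly in
    time, about 2 in 60 would fall exactly on :00 or :30.
    """
    minutes = Counter(int(t.split(":")[1]) for t in times)
    n = len(times)
    return {
        "share_on_hour_or_half": (minutes[0] + minutes[30]) / n,
        "expected_if_uniform": 2 / 60,
        "share_at_00_05": sum(t == "00:05" for t in times) / n,
    }


# Invented records, for illustration only.
records = ["07:00", "14:00", "00:05", "09:30", "13:47", "21:00", "00:05", "06:12"]
result = minute_heaping(records)
print(result)
```

On real data, a share on whole and half hours far above 2 in 60, or a spike at 00:05, would be the signature of rounding and after-the-fact registration rather than of the moment of death.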
Some nurses tend to work longer shifts than others. Some conscientiously clock in as early as they are allowed, before their shift starts, and clock out as late as they can after it ends. Daniela was such a nurse. Her shifts were indeed statistically significantly longer than those of any of her colleagues. She very often stayed on duty several hours after the end of the official ten-minute overlap between shifts. There was often a lot to do, and one can imagine it often involved taking care of the recently deceased: not the nicest part of the job. Daniela was well known to be a rather conscientious and very hard worker, with a fiery temper, known to play pranks on colleagues or to loudly disagree with doctors, for whom she had a healthy disrespect.
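Combining this with the hand-over peaks described earlier gives one more innocent mechanism. The sketch assumes a hypothetical profile of recorded deaths per clock hour, flat except for bumps at the 07:00, 14:00 and 21:00 hand-overs, and compares a rostered 07:00 to 14:00 shift with one that starts an hour early and runs two hours past the hand-over.

```python
# Hypothetical expected recorded deaths per clock hour (index = hour of day):
# a flat background with bumps at the 07:00, 14:00 and 21:00 hand-overs.
PROFILE = [0.02] * 24
for handover in (7, 14, 21):
    PROFILE[handover] = 0.15


def recorded_deaths_present(start: int, end: int) -> float:
    """Expected recorded deaths in clock hours start, ..., end - 1 (same day)."""
    return sum(PROFILE[h] for h in range(start, end))


scheduled = recorded_deaths_present(7, 14)  # rostered 07:00-14:00
extended = recorded_deaths_present(6, 16)   # clocks in 06:00, leaves 16:00

print(f"scheduled: {scheduled:.2f} in 7 h ({scheduled / 7:.3f} per hour)")
print(f"extended:  {extended:.2f} in 10 h ({extended / 10:.3f} per hour)")
```

Staying on after the hand-over adds hours, and in particular adds the very hour in which many deaths get recorded, so in this sketch even the rate per hour present goes up.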
Incidentally, the rate of admissions to Umberto I hospital tumbled down after the news broke of a serial killer – and the news broke the day after the last day the serial killer was at work, together with the publication of the lurid “selfie”. The rate of deaths was slowly increasing over the two years up to then, as was in fact also the rate of admissions and the occupancy of the ward. A hospital getting slowly more stressed? Taking on more work?
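Because the number of patients was itself changing, raw counts of deaths can mislead. A sketch, with invented figures, of the more natural yardstick of deaths per 1,000 occupied bed-days:

```python
def deaths_per_1000_bed_days(deaths: int, occupied_bed_days: int) -> float:
    """Deaths per 1,000 days of patient occupancy."""
    return 1000 * deaths / occupied_bed_days


# (deaths, occupied bed-days) per quarter: invented figures.
periods = {
    "Q1 2013": (80, 4_800),
    "Q1 2014": (88, 5_280),
    "Q2 2014": (52, 3_120),  # hypothetical quarter after the news broke
}
rates = {name: deaths_per_1000_bed_days(d, b) for name, (d, b) in periods.items()}
for name, rate in rates.items():
    print(f"{name}: {periods[name][0]:>3} deaths, {rate:.1f} per 1,000 bed-days")
```

In these made-up figures the raw count of deaths rises and then collapses, while the rate per bed-day does not move at all.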
If one finds a correlation between X and Y, it is a sound principle to suppose that it has a causal explanation. Maybe X causes Y, maybe Y causes X, … and maybe W causes both X and Y, or maybe X and Y both cause Z and there has been a selection on the basis of Z. In the case of Lucia de B., her association between inexplicable incidents and her presence on the ward was caused by her, since the definition of “unexpected and inexplicable incident” included her being there. She was already known to be a weird person, and it was already clear that there were more deaths than usual on her ward. The actual reason for that was a change of hospital policy, moving patients faster from intensive care to medium care so that they could die at home, rather than in the hospital. If she was not present, then the medical experts always could come up with an explanation for why that death, though perhaps a bit surprising at that moment, was expected to occur soon anyway. But if Lucia was there then they were inclined to believe in foul play because after all there were so many incidents in her shifts.
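The Lucia de B. mechanism, in which the label “inexplicable” depended partly on who was on duty, can be sketched in a few lines. Every death is equally likely on every shift here; the only assumption doing any work (hypothetical, like all the numbers) is that a death is more readily called inexplicable when the suspected nurse was present.

```python
from fractions import Fraction


def flagged_per_shift(deaths_per_shift: Fraction, p_flag: Fraction) -> Fraction:
    """Expected deaths per shift that end up labelled 'inexplicable'."""
    return deaths_per_shift * p_flag


DEATHS_PER_SHIFT = Fraction(1, 5)  # assumed: identical on every nurse's shifts

# Assumed labelling behaviour: far readier to call a death inexplicable
# when the suspected nurse was on duty.
hers = flagged_per_shift(DEATHS_PER_SHIFT, Fraction(1, 2))
others = flagged_per_shift(DEATHS_PER_SHIFT, Fraction(1, 10))

print(f"'inexplicable' deaths per shift: hers {float(hers):.2f}, others {float(others):.2f}")
print(f"apparent rate ratio: {hers / others}")
```

The fivefold “excess” is produced entirely by how incidents are classified, which is the selection effect described above.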
Julia and I are certain that the difference between Daniela’s death rates and those of other nurses is to a huge extent explainable by the anomalies in the data which we had discovered and by her long working hours.
Some residual difference could be due to the fact that a conscientious nurse actually notices when patients have died, while a lazy nurse keeps a low profile and leaves it to her colleagues to notice, at hand-over. We have been busy fitting sophisticated regression models to the data but this work will be reported in a specialist journal. It does not tell us more than what I have already said. Daniela is different from the other nurses. All the nurses are different. She is extreme in a number of ways: most hours worked, longest shifts worked. We have no idea how the hospital allocated nurses to sectors and patients to sectors. We probably won’t get to know the answer to that, ever. The medical world does not put out its dirty washing for everyone to see.
We wrote a report and gave evidence in person in Ravenna in early 2015. I did not have time to see the wonderful Byzantine mosaics, though I was treated to some wonderful meals. I think my department paid for my air ticket. Julia and I worked pro deo. In our opinion, we totally shredded the statistical work of Tagliaro and Micciolo. The court, however, did not agree: “The statistical experts for the defence only offered a theoretical discourse while those of the prosecution had scientifically established hard facts”. In retrospect, we should have used stronger language in our report. Tagliaro and Micciolo stated that they had definitively proven that Daniela’s presence caused 90 or so extra deaths. They stated that this number could definitely not be explained as a chance fluctuation. They stated that, of course, the statistics did not prove that she had deliberately murdered those patients. We, on the other hand, had used careful scientific language. One begins to understand how it is that experts like Tagliaro and Micciolo are in such high demand by public prosecutors.
There was also toxicological evidence concerning one of the patients and involving potassium chloride (KCl), but we were not involved in that. There was also the “selfie”, and there was character evidence. There were allegations of thefts of patients’ personal jewellery. It all added up. Daniela was convicted of just one murder. The statistical evidence provided her motive: she just loved killing people, especially people she didn’t like. No doubt a forensic psychologist also explained how well her personality fitted the acts she was alleged to have committed.
Rapidly, the public prosecution started another case based largely on the same or similar evidence but now concerning another patient, with whom Daniela had had a shouting match, five years earlier. In fact, this activity was probably triggered by families of other patients starting civil cases against the hospital. It would also clearly be in the interest of the hospital authorities to get new criminal proceedings against Daniela started. However, Daniela’s lawyers appealed against her first conviction. It was successfully overturned. But then the court of cassation overturned the acquittal. Meantime, the second case led to a conviction, then acquittal on appeal, then cassation. All this time Daniela was in jail. Cassations of cassations meant that Daniela had to be tried again, by yet another appeal court, for the two alleged murders. Julia and I and her young colleague Francesco Dotto got to work again, improving our arguments and our graphics and our formulations of our findings.
At some point, triggered by some discussions with the defence experts on toxicology and pathology, Julia took a glance at Tagliaro’s quite separate report on the toxicological evidence. This led to a breakthrough, as I will now explain.
Tagliaro knew the post-mortem “vitreous humour” potassium concentration of the last patient, a woman who had died on Daniela’s last day. That death had somehow surprised the hospital doctors, or rather, as it later transpired, it didn’t surprise them at all: for three months they had already been looking at the death rates while Daniela was on duty, essentially building up a dossier against her and just waiting for a suitable “last straw”! Moreover, they already had their minds on KCl, since some had gone missing and then turned up in the wrong place. Finally, Daniela had complained to her colleagues about the really irritating behaviour of that last patient, 73-year-old Rosa Calderoni.
“Vitreous humour” is the transparent, colourless, gelatinous mass that fills your eyeballs. While you are alive, it has a relatively low concentration of potassium. After death, cell membranes break down, and the potassium concentration throughout the body equalises. Tagliaro had published papers in which he studied the hourly rate of increase in the concentration, using measurements on the bodies of persons who had died, at a known time, of causes unrelated to potassium chloride poisoning. He even had some fresh corpses on which he could make repeated measurements. His motivation was to use this concentration as a tool to determine the PMI (post-mortem interval) in cases where there is a body and a post-mortem examination but no known time of death. In one paper (without Micciolo’s aid) he did a regression analysis, fitting a straight line through a cloud of points (y = concentration, x = time since death). He had about 60 observations, mostly of men, mostly rather young. In a second paper, now with Micciolo, he fitted a parabola and moreover noted that there was an effect of age and of sex. The authors also observed the huge variation around the fitted curve and concluded that the method was not reliable enough for use in determining the PMI. But this did not deter Tagliaro when writing his toxicological report on Rosa Calderoni! He knew the potassium concentration at the time of the post-mortem, he knew exactly when she died, and he had a number for the natural increase per hour after death from his first, linear, regression model. With this, he calculated the concentration at death. Lo and behold: it was a concentration which would have been fatal. He had proved that she had died of potassium chloride poisoning.
Julia and Francesco used the model of the second paper and found that if you assume a normal concentration at the time of death, and take account of the variability of the measurements and of the uncertainty in the value of the slope, then the concentration observed at the post-mortem was perhaps above average, but not surprisingly large at all.
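The difference between Tagliaro’s back-calculation and the check made by Julia and Francesco can be sketched in a few lines of Python (standard library only). This is a simplified illustration, not their analysis. It assumes a purely linear model (concentration = normal level at death + slope × hours since death + random scatter), whereas they used the quadratic model with age and sex effects from the second paper, and it ignores uncertainty in the intercept. All numbers in the usage lines are hypothetical: normal level 4, slope 0.2 per hour, slope standard error 0.05, residual standard deviation 1, 20 hours, observed value 10.

```python
import math


def naive_concentration_at_death(k_observed, slope_per_hour, hours_since_death):
    """Back-extrapolate along the fitted line, ignoring every source of uncertainty."""
    return k_observed - slope_per_hour * hours_since_death


def standardised_excess(k_observed, k_normal_at_death, slope_per_hour, slope_se,
                        residual_sd, hours_since_death):
    """Standard deviations by which the observed post-mortem value exceeds its
    prediction, assuming a normal concentration at death.

    The prediction uncertainty combines the scatter of individual bodies around
    the line (residual_sd) with the uncertainty in the estimated slope (slope_se),
    which grows with elapsed time. Intercept uncertainty is ignored for brevity.
    """
    predicted = k_normal_at_death + slope_per_hour * hours_since_death
    sd = math.sqrt(residual_sd ** 2 + (hours_since_death * slope_se) ** 2)
    return (k_observed - predicted) / sd


def upper_tail_probability(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical values, chosen only to illustrate the mechanics.
hours = 20.0
z = standardised_excess(k_observed=10.0, k_normal_at_death=4.0, slope_per_hour=0.2,
                        slope_se=0.05, residual_sd=1.0, hours_since_death=hours)
print(naive_concentration_at_death(10.0, 0.2, hours))  # naive "concentration at death"
print(z, upper_tail_probability(z))
```

With these made-up numbers the naive back-calculation assigns a concentration of 6 at death, well above the assumed normal level of 4. The standardised excess, by contrast, is only about 1.4 standard deviations, an upper-tail probability of roughly 8%.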
Daniela Poggiali became a free woman. I wish her a big compensation and a long and happy life. She’s quite a character.
Aside from the “couleur locale” of an Italian case, this case had a remarkable amount in common with the case of Lucia de Berk. It also has many similarities with quite a few other contested serial-killer-nurse cases in various countries. According to a Netflix series, in which a whole episode is devoted to Daniela, these horrific cases occur all the time. They are studied by criminologists and forensic psychologists, who have compiled a list of “red flags” intended to help warn hospital authorities. The scientific term here is “health care serial killer”, or HCSK. One of the HCSK red flags is that you have psychiatric problems. Another is that your colleagues think you are really weird. Especially when your colleagues call you an angel of death, that’s a major red flag. The list goes on. These lists are developed in scientific publications in important mainstream journals, and the results are presented in textbooks used in university criminology teaching programs. Of course, you can only scientifically study convicted HCSKs. Your sources of data are newspaper reports, judges’ summings-up and the prosecution’s final summary of the case. It is clear that these red flags are the things that convince judges and jurors to deliver a guilty verdict. These are the features that will first make you a suspect, which police investigators will look for, and which will convince the court and the public of your guilt. Amusingly, one of the side effects of the case of Lucia de Berk was that it contributed a number of entries to this list, for instance the Stephen King horror novels she had at home, which were even alleged to have been stolen from the library. Her conviction for the theft of several items still stands. So does Daniela’s, which means that Daniela is not eligible for compensation. In neither case was there any real proof of theft.
Embarrassingly, Lucia de Berk’s case had to be removed from the collections of known cases after 2011, and the criminologists and forensic psychologists now also mention that statistical evidence of many deaths during a nurse’s shifts is not actually a very good red flag. They have learnt something, too.
Also interesting is the incidence of these cases: according to these researchers, fewer than 1 in a million nurses per year kill multiple patients. These are researchers who have made the phenomenon of HCSKs their life’s work, which gives them opportunities to write lurid books on serial murder, to appear on TV panels and in TV documentaries explaining the terrible psychology of these modern-day witches, and to take the stand as prosecution witnesses. Now, that “base rate” is actually rather important, even if it is only known very roughly. It means that such crimes are very, very unusual. In the Netherlands, one might expect a handful of cases per century, with perhaps 100 deaths per century on average. There are actually only about 100 murders altogether in the Netherlands per year. On the other hand, more than 1000 deaths every year are due to medical errors. That means that the evidence against a nurse suspected of being an HCSK would have to be very strong indeed before it should convince a rational person that they have a new HCSK on their hands. Lawyers, judges, journalists and the public are unfortunately perhaps not rational persons. They are certainly not good with probability, and not good with Bayes’ rule. (It is not allowed to be used in a UK criminal court, because judges have ruled that jurors cannot possibly understand it.)
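The point about base rates can be made concrete with Bayes’ rule in odds form. The Python sketch below (standard library only) asks how large the likelihood ratio of the evidence must be to raise a prior probability of guilt to a chosen posterior probability. Reading the researchers’ “fewer than 1 in a million nurses per year” as a prior probability of one in a million for a given nurse, and taking 99% as the target, are both assumptions made only for illustration.

```python
def odds(p):
    """Convert a probability into odds in favour."""
    return p / (1 - p)


def required_likelihood_ratio(prior_probability, target_posterior_probability):
    """Likelihood ratio (guilty : innocent) that the evidence must reach to lift the
    prior probability of guilt to the target, using posterior odds = prior odds x LR."""
    return odds(target_posterior_probability) / odds(prior_probability)


# Assumed prior: one in a million; assumed target: 99% probability of guilt.
print(required_likelihood_ratio(1e-6, 0.99))
```

Under those assumptions the evidence would need a likelihood ratio of close to 100 million in favour of guilt.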
I am still working on one UK case, Ben Geen. I believe it is yet another example of a typical innocent HCSK scare in a failing hospital, leading to a typical unsafe conviction based largely on the usual red flags and a bit of bad luck. At least, I see no reason whatsoever to suppose that Ben Geen was guilty of the crimes for which he is serving a life sentence. Meanwhile, a new case is starting up in the UK: Lucy (!) Letby. I sincerely hope not to be involved with that one.
Time for a new generation of nosy statisticians to do some hard work.
Covadonga Palacio, Rossella Gottardo, Vito Cirielli, Giacomo Musile, Yvane Agard, Federica Bortolotti, and Franco Tagliaro. Simultaneous analysis of potassium and ammonium ions in the vitreous humour by capillary electrophoresis and their integrated use to infer the post mortem interval (PMI). Medicine, Science and the Law, 61(1 suppl):96–104, 2021. https://journals.sagepub.com/doi/abs/10.1177/0025802420934239
Nicola Pigaiani, Anna Bertaso, Elio Franco De Palo, Federica Bortolotti, and Franco Tagliaro. Vitreous humor endogenous compounds analysis for post-mortem forensic investigation. Forensic Science International, 310:110235, 2020. https://doi.org/10.1016/j.forsciint.2020.110235
Francesco Dotto, Richard D. Gill, and Julia Mortera. Statistical analyses in the case of an Italian nurse accused of murdering patients. Submitted to Law, Probability and Risk (Oxford University Press), accepted for publication subject to minor revision, 2022. Preprint: https://arxiv.org/abs/2202.08895
Aart F. de Vos. Door statistici veroordeeld? [Condemned by statisticians?] Nederlands Juristenblad, 13, 686–688.
Here follows the result of a Google Translate translation by RD Gill, with some “hindsight comments” by him added in square brackets and marked “RDG”.
Would having posterior thoughts Not be offending the gods? Only the dinosaur Had them before Recall its fate! Revise your odds! (made for a limerick competition at a Bayesian congress).
The following article was the basis for two full-page articles on Saturday, March 13, 2004: one in the science supplement of the NRC (unfortunately with confusing typos in the final calculation) and one in “the Forum” of Trouw (with the predictable front-page announcement that I claimed that the chance that Lucia de B. was wrongly convicted was 80%, which is not the case).
Condemned by statisticians? Aart F. de Vos
Lucia de Berk [Aart calls her “Lucy” in his article. That’s a bit condescending – RDG] has been sentenced to life imprisonment. Statistical arguments played a role in that, although the influence of this in the media was overestimated. Many people died while she was on duty. Pure chance? The consulted statistician, Henk Elffers, repeated his earlier statement during the current appeal that the probability was 1 in 342 million. I quote from the article “Statisticians do not believe in coincidence” from the Haags Courant of January 30th: “The probability that nine fatal incidents took place in the JKZ during the shifts of the accused by pure chance is nil. (…) It wasn’t chance. I don’t know what it was. As a statistician, I can’t say anything about it. Deciding the cause is up to you”. The rest of the article showed that the judge had great difficulty with this answer, and did not manage to resolve those difficulties.
Many witnesses were then heard who talked about circumstances, plausibility, oddities, improbabilities and undeniably strong associations. The court has to combine all of this and arrive at a wise final judgment. A heavy task, certainly given the legal conceptual system that includes very many elements that have to do with probabilities but has to make do without quantification and without probability theory when combining them.
The crucial question is of course: how likely is it that Lucia de Berk committed murders? Most laypeople will think that Elffers answered that question and that it is practically certain.
This is a misunderstanding. Elffers did not answer that question. Elffers is a classical statistician, and classical statisticians do not make statements about what is actually going on, but only about how unlikely things are if nothing special is going on at all. However, there is another branch of statistics: the Bayesian. I belong to that other camp. And I’ve also been doing calculations. With the following bewildering result:
If the information that Elffers used to reach his 1 in 342 million were the only information on which Lucia de Berk was convicted, I think that, based on a fairly superficial analysis, there would be about an 80% chance of the conviction being wrong.
This article is about this great contrast. It is not an indictment of Elffers, who was extremely modest in court when interpreting his outcome, nor a plea to acquit Lucia de Berk, because the court mainly uses different arguments, albeit without unequivocal statements of probability, and nothing is absolutely certain. It is a plea to study Bayesian statistics seriously in the Netherlands, and this applies to both mathematicians and lawyers. [As we later discovered, many medical experts’ conclusions that certain deaths were unnatural were caused by their knowledge that Lucia had been present at an impossibly huge number of deaths – RDG]
There is some similarity to the case of Sally Clark, who was sentenced to life imprisonment in England in 1999 because two of her sons died shortly after birth. A wonderful analysis can be found in the September 2002 issue of “living mathematics”, an internet magazine (http://plus.maths.org/issue21/features/clark/index.html).
An expert (not a statistician, but a doctor) explained that the chance that such a thing would happen “just by chance” in the given circumstances was 1 in 73 million. I quote: “probably the most infamous statistical statement ever made in a British courtroom (…) wrong, irrelevant, biased and totally misleading.” The expert’s statement is completely torn to shreds in that article, which includes a Bayesian analysis and a calculation that the probability that she was wrongly convicted was greater than 2/3. In the case of Sally Clark, the expert’s statement was completely wrong on all counts, causing half the nation to turn on him, and Sally Clark was released, though only after four years. However, the case of Lucia de Berk is infinitely more complicated. Elffers’ statement is, I will argue, not wrong, but it is misleading; and the Netherlands has no juries but professional judges, whose judgments, even though they are not directly based on extensive knowledge of probability theory, are much more carefully reasoned. That does not alter the fact that there is a common element in the Lucy de Berk and Sally Clark cases. [Actually, Elffers’ statement was wrong in its own terms. Had he used the standard and correct way to combine p-values from three separate samples, he would have ended up with a p-value of about 1/1000. Had he verified the data given to him by the hospital, it would have been larger still. Had he taken account of heterogeneity between nurses and of uncertainty in various estimates, both of which classical statisticians also know how to do, it would have been larger still – RDG]
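The bracketed remark refers to combining p-values from separate samples. One standard way to do this is Fisher’s method. Whether that is exactly the method behind the figure of about 1/1000 is not stated here, and the three p-values below are hypothetical, not Elffers’ actual numbers. The Python sketch (standard library only) shows why a product of several p-values cannot itself be read as a p-value: the properly combined p-value is always larger.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method for independent p-values.

    Under the joint null hypothesis, -2 * sum(log p) has a chi-square distribution
    with 2k degrees of freedom. For even degrees of freedom the upper tail has the
    closed form exp(-h) * sum_{i<k} h**i / i!, where h is half the statistic.
    """
    k = len(p_values)
    h = -math.fsum(math.log(p) for p in p_values)  # half of the chi-square statistic
    return math.exp(-h) * math.fsum(h ** i / math.factorial(i) for i in range(k))


p_values = [0.01, 0.02, 0.03]  # hypothetical
naive_product = math.prod(p_values)
print(naive_product, fisher_combined_p(p_values))
```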
My calculations are therefore based on alternative statistics, the Bayesian, named after Thomas Bayes, the first to write about “inverse probabilities”. That was in 1763. His discovery did not become really important [in statistics] until after 1960, mainly through the work of Leonard Savage, who proved that when you make decisions under uncertainty you cannot ignore the question of what chances the possible states of truth have (in our case the states “guilty” and “not guilty”). Thomas Bayes taught us how you can learn about that kind of probability from data. Scholars agree on the form of those calculations, which is pure probability theory. However, there is one problem: you have to think about what probabilities you would have given to the possible states before you had seen your data (the prior). And often these are subjective probabilities. And if you have little data, the impact of those subjective probabilities on your final judgment is large. A reason for many classical statisticians to oppose this approach. Certainly in the Netherlands, where statistics is mainly practised by mathematicians, people who are trained to solve problems without wondering what they have to do with reality. After a fanatical struggle over the foundations of statistics for decades (see my piece “the religious war of statisticians” at http://staff.feweb.vu.nl/avos/default.htm) the parties have come closer together. With one exception: the classical hypothesis test (or significance test). Bayesians have fundamental objections to classical hypothesis tests. And Elffers’ statement takes the form of a classical hypothesis test. This is where the foundations debate focuses.
The Lucy Clog case
Following Elffers, who explained his method of calculation in the Nederlands Juristenblad on the basis of a fictional case “Klompsma”, which I have also worked through (arriving at totally different conclusions), I want to talk about the fictional case of Lucy Clog [“Klomp” is the Dutch word for “clog”; the suffix “-sma” indicates a person from the province of Groningen; this is all rather insulting – RDG]. Lucy Clog is a nurse who has experienced 11 deaths in a period in which on average only one occurs, but against whom no further concrete evidence can be found. In this case too, Elffers would report an extremely small chance of coincidence in court, about 1 in 100 million [I think that de Vos is thinking of the Poisson(1) chance of at least 11 events, which is indeed about 1 in 100 million – RDG]. This is the case where I claim that a guilty verdict, given the information so far together with my assessment of the context, has a chance of about 80% of being wrong.
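The Poisson figure can be checked directly. The Python sketch below (standard library only) assumes that the number of deaths during Lucy Clog’s shifts is Poisson distributed with mean exactly 1. It gives a probability of roughly 1.0 × 10⁻⁸ (about 1 in 100 million) for at least 11 deaths and roughly 1.1 × 10⁻⁷ (about 1 in 9 million) for at least 10. The probability of exactly 11 is about 0.9 × 10⁻⁸, so for such an extreme count the “or more” changes little.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean), computed on the log scale (mean > 0)."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def poisson_upper_tail(k, mean, extra_terms=200):
    """P(X >= k), summed directly over the upper tail to avoid cancellation error."""
    return math.fsum(poisson_pmf(j, mean) for j in range(k, k + extra_terms))


for deaths in (10, 11):
    print(deaths, poisson_pmf(deaths, 1.0), poisson_upper_tail(deaths, 1.0))
```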
This requires some calculations. Some of them are complicated, but the most important aspect is not too difficult, although it appears that many people struggle with it. A simple example may make this key point clear.
You are at a party and a stranger starts telling you a whole story about the chance that Lucia de Berk is guilty, and embarks joyfully on complex arithmetical calculations. What do you think: is this a lawyer or a mathematician? If you say a mathematician because lawyers are usually unable to do mathematics, then you fall into a classical trap. You think: a mathematician is good at calculations, while the chance that a lawyer is good at calculations is 10%, so it must be a mathematician. What you forget is that there are 100 times more lawyers than mathematicians. Even if only 10% of lawyers could do this calculating stuff, there would still be 10 times as many lawyers as mathematicians who could do it. So, under these assumptions, the probability is 10/11 that it is a lawyer. To which I must add that (I think) 75% of mathematicians are male but only 40% of lawyers are male, and I did not take this into account. If the word “she” had been in the problem formulation, that would have made a difference.
The same mistake, forgetting the context (more lawyers than mathematicians), can be made in the case of Lucia de Berk. The chance that you are dealing with a murderous nurse is a priori (before you know what is going on) very much smaller than that you are dealing with an innocent nurse. You have to weigh that against the fact that the chance of 11 deaths is many times greater in the case of “murderous” than in the case of “innocent”.
The Bayesian way of performing the calculations in such cases also seems hard to grasp intuitively. But if we look back at the example of the party, maybe it is not so difficult at all.
The Bayesian calculation is best done not in terms of chances but in terms of “odds”, an English word with no Dutch equivalent. Odds of 3 to 7 mean a chance of 3/10 that it is true and 7/10 that it is not. The English understand perfectly well what this means, thanks to horse racing: odds of 3 to 7 mean you win 7 if you are right and lose 3 if you are wrong. Chances and odds are two ways to describe the same thing. Another example: odds of 2 to 10 correspond to probabilities of 2/12 and 10/12.
You need two elements for a simple Bayesian calculation. The prior odds and the likelihood ratio. In the example, the prior odds are mathematician vs. lawyer 1 to 100. The likelihood ratio is the probability that a mathematician does calculations (100%) divided by the probability that a lawyer does (10%). So 10 to 1. Bayes’ theorem now says that you must multiply the prior odds (1 : 100) and the likelihood ratio (10 : 1) to get the posterior odds, so they are (1 x 10 : 100 x 1) = (10 : 100) = (1 : 10), corresponding to a probability of 1 / 11 that it is a mathematician and 10/11 that it is a lawyer. Precisely what we found before. The posterior odds are what you can say after the data are known, the prior odds are what you could say before. And the likelihood ratio is the way you learn from data.
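The odds calculation for the party example can be checked with exact fractions in Python (standard library only). The helper names are illustrative; the numbers are the ones used in the text.

```python
from fractions import Fraction


def probability_from_odds(odds_for):
    """Convert odds in favour (a Fraction a/b meaning 'a to b') into a probability."""
    return odds_for / (1 + odds_for)


def update_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio


prior = Fraction(1, 100)  # mathematician : lawyer
# P(does calculations | mathematician) / P(does calculations | lawyer) = 100% / 10%
likelihood_ratio = Fraction(100, 100) / Fraction(10, 100)
posterior = update_odds(prior, likelihood_ratio)
print(posterior, probability_from_odds(posterior))  # prints 1/10 1/11
```

The same helper converts the other examples: odds of 3 to 7 give 3/10, and odds of 2 to 10 give 2/12.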
Back to the Lucy Clog case. If the chance of 11 deaths is 1 in 100 million when Lucy Clog is innocent, and 1/2 when she is guilty (more about that “1/2” much later), then the likelihood ratio for innocent against guilty is 1 : 50 million. But to calculate the posterior probability of guilt, you need the prior odds. They follow from the chance that a random nurse will commit murders in a given year, which I estimate at 1 to 400,000. There are forty thousand nurses in hospitals in the Netherlands, so that would mean a killer nurse once every 10 years. I hope that is an overestimate.
Bayes’ theorem now says that the posterior odds of “innocent” in the event of 11 deaths would be 400,000 to 50 million. That’s 8 : 1000, so a small chance of 8/1008, maybe enough to convict someone. Yet large enough to want to know more. And there is much more worth knowing.
For instance, it is remarkable that nobody saw Lucy doing anything wrong. It is even stranger that further investigation yields no evidence of murder. If you think that there would still be an 80% chance of finding clues if there had been many murders, against of course 0% if it is a coincidence, then the likelihood ratio of the fact “no evidence was found” is 100 : 20 in favour of innocence. Applying the rule shows that we now have odds of 40 : 1000, so a chance of innocence of just under 4%. Conviction now becomes really questionable. And if the suspect continues to deny, which is more plausible when she is innocent than when she is guilty (say twice as plausible), the odds become 80 : 1000, almost an 8% chance of innocence.
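The step-by-step updating for Lucy Clog can be reproduced with exact fractions. The Python sketch below (standard library only) encodes de Vos’s illustrative inputs as likelihood ratios for innocence against guilt, multiplies them into the prior odds one fact at a time, and prints the odds per 1000 and the probability of innocence after each step.

```python
from fractions import Fraction

# De Vos's illustrative inputs for the fictional Lucy Clog case.
prior_odds_innocent = Fraction(400_000, 1)  # innocent : guilty, before any data

# Each likelihood ratio is P(fact | innocent) / P(fact | guilty).
evidence = [
    ("11 deaths on her shifts", Fraction(1, 100_000_000) / Fraction(1, 2)),
    ("no evidence of murder found", Fraction(1) / Fraction(20, 100)),
    ("she keeps denying", Fraction(1) / Fraction(1, 2)),
]

odds = prior_odds_innocent
for description, likelihood_ratio in evidence:
    odds *= likelihood_ratio          # posterior odds become the next prior odds
    probability = odds / (1 + odds)   # probability of innocence
    print(f"{description}: odds {float(odds * 1000):g} : 1000, "
          f"P(innocent) = {float(probability):.4f}")
```

The three lines reproduce the 8 : 1000, 40 : 1000 and 80 : 1000 of the text.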
Another way of looking at this, which requires less calculation (but says exactly the same thing), is as follows. It follows from the assumptions that over 20,000 years there will be 1008 occasions on which 11 deaths occur in a nurse’s shifts: 1,000 of these nurses are guilty and 8 are innocent. Evidence of murder is found for 800 of the guilty nurses; moreover, 100 of the remaining 200 confess. That leaves 100 guilty and 8 innocent among the nurses who did not confess and for whom no evidence of murder was found.
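The same numbers in natural-frequency form, as a Python sketch with exact fractions. It assumes, as the text implicitly does, that the prior of 1 to 400,000 can be read as a rate of one murderous nurse per 400,000 nurse-years. When counting chance clusters it treats all 800 million nurse-years as innocent, since the guilty ones are a negligible fraction.

```python
from fractions import Fraction

NURSES = 40_000
YEARS = 20_000
nurse_years = NURSES * YEARS  # 800 million nurse-years

guilty = Fraction(nurse_years, 400_000)  # 2,000 murderous nurse-years
guilty_11 = guilty * Fraction(1, 2)      # 1,000 with exactly 11 deaths
# Guilty nurse-years are negligible, so every nurse-year counts as innocent here.
innocent_11 = nurse_years * Fraction(1, 100_000_000)  # 8 innocent with 11 deaths

guilty_no_evidence = guilty_11 * Fraction(20, 100)    # 200 where nothing is found
guilty_denying = guilty_no_evidence * Fraction(1, 2)  # 100 who do not confess
innocent_denying = innocent_11                        # innocent nurses always deny

print(guilty_11, innocent_11, guilty_denying, innocent_denying)
print(innocent_denying / (innocent_denying + guilty_denying))  # 2/27, about 7.4%
```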
So Lucy Clog must be acquitted. And all the while, I haven’t even talked about doubts about the exact probability of 1 in 100 million that “by chance” 11 people die in so many nurses’ shifts, when on average it would only be 1. This probability would be many times higher in every Bayesian analysis. I estimate, based on experience, that 1 in 2 million would come out. A Bayesian analysis can include uncertainties. Uncertainties about the similarity of circumstances and qualities of nurses, for example. And uncertainties increase the chance of extreme events enormously, the literature contains many interesting examples. As I said, I think that if I had access to the data that Elffers uses, I would not get a chance of 1 in 100 million, but a chance of 1 in 2 million. At least I assume that for the time being; it would not surprise me if it were much higher still!
Preliminary calculations show that it might even be as high as 1 in 100,000. But 1 in 2 million already makes a difference of a factor of 50 compared to 1 in 100 million, and my odds would then be not 80 to 1000 but 4000 to 1000, so 4 to 1: an 80% chance of a wrongful conviction. This is the 80% chance of innocence that I mentioned at the beginning. Unfortunately, it is not possible to explain the factor of 50 (or a factor of 1000 if 1 in 100,000 turns out to be correct) in the last step within the framework of this article without resorting to mathematics. [Aart de Vos is probably thinking of Poisson distributions, but adding a hyperprior over the Poisson mean of 1, in order to take account of uncertainty in the true rate of deaths, as well as heterogeneity between nurses, causing some to have shifts with higher death rates than others – RDG]
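The effect described in the bracketed remark can be illustrated with one common choice of hyperprior: a gamma distribution over the Poisson mean, which makes the number of deaths negative binomial. Whether this is what de Vos used is not known. The shape values in the loop are hypothetical degrees of heterogeneity, and the sketch does not claim to reproduce his 1 in 2 million. It shows how quickly the probability of at least 11 deaths grows as the assumed heterogeneity increases (smaller shape). With shape 1 it is exactly 1 in 2048.

```python
import math


def poisson_gamma_pmf(k, mean, shape):
    """P(X = k) when X ~ Poisson(rate) and rate ~ Gamma(shape, scale=mean/shape).

    Marginally X is negative binomial with the same mean but variance
    mean + mean**2 / shape, so a smaller shape means more heterogeneity.
    """
    log_p = math.log(shape) - math.log(shape + mean)  # log of shape / (shape + mean)
    log_q = math.log(mean) - math.log(shape + mean)   # log of mean / (shape + mean)
    log_pmf = (math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
               + shape * log_p + k * log_q)
    return math.exp(log_pmf)


def poisson_gamma_upper_tail(k, mean, shape, extra_terms=5000):
    """P(X >= k), summed directly over the upper tail."""
    return math.fsum(poisson_gamma_pmf(j, mean, shape) for j in range(k, k + extra_terms))


for shape in (1e4, 100.0, 10.0, 1.0):  # hypothetical degrees of heterogeneity
    print(shape, poisson_gamma_upper_tail(11, 1.0, shape))
```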
What I hope has become clear is that you can always add information. “Not being able to find concrete evidence of murder” and “has not confessed” are new pieces of evidence that change the odds. And perhaps there are countless facts to add. In the case of Lucia de Berk, those kinds of facts are there. In the hypothetical case of Lucy Clog, not.
The fact that you can always add information in a Bayesian analysis is its most beautiful aspect. From prior odds, you come through data (11 deaths) to posterior odds, and these are again prior odds for the next steps: no concrete evidence of murder, and no confession by our suspect. Virtually all further facts that emerge in a court case can be dealt with in this way in the analysis. Any fact that has a different probability under the hypothesis of guilt than under the hypothesis of innocence contributes. Perhaps the reader has noticed that we only talked about the chances of what actually happened under the various hypotheses, never about what could have happened but didn’t. A classical statistical test always talks about the probability of 11 or more deaths. According to Bayesians, that “or more” is irrelevant and misleading. Incidentally, it is not necessarily easier to talk only about what happened. What is the probability of exactly 11 deaths if Lucy Clog is guilty? The number of murders, something with a lot of uncertainty attached to it, determines how many deaths there are; but even though you are fired after 11 deaths, the classical statistician talks about the chance that you would commit even more if you were kept on. And that last fact matters for the odds. That is why I put in a probability of 50%, not 100%, for a murderous nurse killing exactly 11 patients. But that only makes a factor of 2 difference.
It should be clear that it is not easy to come to firm conclusions when there is no convincing evidence. The most famous example, for which many Bayesians have performed calculations, is a robbery in California in 1964, committed by a black man with a white woman in a yellow car. A couple who matched this description was taken to court, and many statistical analyses followed. I have done a lot of calculations on this example myself, and have experienced how difficult, but also surprising and satisfying, it is to keep adding new elements.
A whole book is devoted to a similar famous case: “A Probabilistic Analysis of the Sacco and Vanzetti Evidence”, published in 1996 by Jay Kadane (with David Schum), professor at Carnegie Mellon and one of the most prominent Bayesians. If you want to know more, just consult his c.v. on his website http://lib.stat.cmu.edu/~kadane. In the “Statistics and the Law” field alone, he has more than thirty publications to his name, along with hundreds of other articles. This is now a well-developed field in America.
I have thought for a long time about what the conclusion of this story is, and I have had to revise my opinion several times. And the perhaps surprising conclusion is: the actions of all parties are not that bad, only their rationalization is, to put it mildly, a bit strange. Elffers makes strange calculations but formulates the conclusions in court in such a way that it becomes intuitively clear that he is not giving the answer that the court is looking for. The judge makes judgments that sound as though they are in terms of probabilities but I cannot figure out what the judge’s probabilities are. But when I see what is going on I do get the feeling that it is much more like what is optimal than I would have thought possible, given the absurd rationalisations. The explanation is simple: judges’ actions are based on a process learnt by evolution, judges’ justifications are stuck on afterwards, and learnt through training. In my opinion, the Bayesian method is the only way to balance decisions under uncertainty about actions and rationalization. And that can be very fruitful. But the profit is initially much smaller than people think. What the court does in the case of Lucia de B is surprisingly rational. The 11 deaths are not convincing in themselves, but enough to change the prior odds from 1 in 40,000 to odds from 16 to 5, in short, an order of magnitude in which it is necessary to gather additional information before judging. Exactly what the court does. [de Vos has an optimistic view. He does not realise that the court is being fed false facts by the hospital managers – they tell the truth but not the whole truth; he does not realise that Elffers’ calculation was wrong because de Vos, as a Bayesian, doesn’t know what good classical statisticians do; neither he nor Elffers checks the data and finds out how exactly it was collected; he does not know that the medical experts’ diagnoses are influenced by Elffers’ statistics. 
Unfortunately, the defence hired a pure probabilist, and a kind of philosopher of probability, neither of whom knew anything about any kind of statistics, whether classical or Bayesian – RDG]
When I made my calculations, I thought at times: I have to go to court. I did eventually send the article, but heard nothing more about it. It turned out that the defence had called a witness who seriously criticised Elffers’ calculations, though without presenting an alternative. [The judge found the defence witness’s criticism incomprehensible, and useless to boot. It contained no constructive elements. But without doing statistics, anybody could see that the coincidence couldn’t be pure chance. It wasn’t: one could say that the data was faked. On the other hand, the judge did understand Elffers perfectly well – RDG].
Maybe I will once again have the opportunity to fully calculate probabilities in the Lucia de Berk case. That could provide new insights. But it is quite a job. In this case, there is much more information than is used here, such as traces of poison in patients. Here too, a Bayesian analysis that takes into account all the uncertainties would probably show that statements by experts who say something like “it is impossible that there is any explanation other than the administration of poison by Lucia de Berk” should be taken with a grain of salt. Experts are usually people who overestimate their certainty. On the other hand, incriminating information can also build up. Ten independent facts that are each twice as likely under the hypothesis of guilt change the odds by a factor of about 1000. And if it turns out that the toxic traces found in the bodies of five deceased patients are each nine times more likely if Lucia is a murderer than if she isn’t, that makes a difference of a factor of nine to the fifth, just under 60,000. Etc., etc.
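The arithmetic of accumulating evidence can be sketched in Python (standard library only). Multiplying likelihood ratios is valid only if the facts are conditionally independent given each hypothesis, which is what the word “independent” in the paragraph above requires. The starting odds in the last lines are hypothetical, included only to show the update.

```python
import math
from fractions import Fraction


def combine_likelihood_ratios(ratios):
    """Multiply likelihood ratios of facts assumed to be conditionally independent
    given each hypothesis; dependence between the facts would invalidate the product."""
    return math.prod(ratios)


ten_weak_facts = combine_likelihood_ratios([2] * 10)  # 1024, "a factor of about 1000"
five_traces = combine_likelihood_ratios([9] * 5)      # 59049, "just under 60,000"

prior_odds_guilty = Fraction(1, 100)  # hypothetical starting odds, for illustration only
print(prior_odds_guilty * ten_weak_facts, prior_odds_guilty * five_traces)
```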
But I think the court is more or less like that. It uses an incomprehensible language, that is, incomprehensible to probabilists, but a language sanctioned by evolution. We have few cases of convictions that were found to be wrong in the Netherlands. [Well! That was a Dutch layperson, writing in 2004. According to Ton Derksen, in the Netherlands about 10% of very long term prisoners (very serious cases) are innocent. It is probably something similar in other jurisdictions – RDG].
If you did the entire process in terms of probability calculations, the resulting debates between prosecutors and lawyers would become endless. And given their poor knowledge of probability, it is also undesirable for the time being. They have their secret language, which usually leads to reasonable conclusions. Even the chance that Lucia de Berk is guilty cannot be expressed in their language. There is also no law in the Netherlands that defines “legal and convincing evidence” in terms of the chance that a decision is correct. Is that 95%? Or 99%? Judges will maintain that it is 99.99%. But judges are experts.
So I don’t think it’s wise to try to cast the process in terms of probability right now. But perhaps this discussion will produce something in the longer term: judges who are well informed about the statistical significance of the starting situation, and who then write down a number for each piece of evidence from the prosecution and the defence. The likelihood ratio of each fact must be justified. At the end, all these numbers are multiplied together, and the calculations are checked by a Bayesian statistician. However, I consider this a long-term perspective. I fear (I am not really young anymore) that it won’t come in my lifetime.