Since 2015, Bell-type experiments designed to test local realism have had the following format, that of a so-called “loophole-free Bell test”. There is a fixed sequence of N time-slots, or more precisely, of paired time-slots. These are time-slots in two distant labs owned by two scientists, Alice and Bob. The time-slots are paired such that a signal sent at the start of one of Alice’s time-slots from Alice’s lab to Bob’s, travelling at the speed of light, would only reach Bob’s lab after the end of Bob’s corresponding time-slot; and vice versa. Just after the start of each time-slot, each scientist inserts a binary setting into an experimental device. Something goes on inside that apparatus, and before the time-slot is over, a binary outcome is produced. Each instance with two inputs and two outputs is called a trial.
Actually, many experiments require a slightly more elaborate protocol involving a third lab, which you may think of as a source of “pairs of particles”. Charlie’s lab is located somewhere between Alice’s and Bob’s. Charlie’s device outputs the message “ready” or “not ready” before the end of his time-slot (its length is irrelevant). The message, however, could only arrive at Alice’s and Bob’s labs after they have already entered their settings, so it could not directly influence their choices. Outcomes get delivered anyway. After the experiment, one looks only at the inputs and outputs of those trials in which Charlie saw the output “ready”. The experiment continues long enough that there are N trials labelled by Charlie’s apparatus as “ready”. From now on, I will forget about this “post-selection” of N trials: the first N which got off to a good start. (The word “post-selection” is a misnomer. The selection is performed after the whole experiment is complete, but which trials are selected is determined before the settings are introduced.)
The settings are typically chosen to resemble sequences of outcomes of independent fair coin tosses. Sometimes they are generated by physical random number generators using physical noise sources, sometimes they are created using pseudo-random number generators (RNGs). Sometimes they are generated on the fly, sometimes created in advance. The idea is that the settings are inputs which come from the outside world, outside the experimental devices, and the outcomes are outputs delivered by the devices to the outside world.
Below is a graphical model specified in the language of the present-day theory of causality based on directed acyclic graphs (DAGs), describing the dependence structure of what is observed in terms of “hidden variables”. There is no assumption that the hidden parts of the structure are classical, nor that they are located in classical space-time. The node “psi” stands for the state of all experimental apparatus in the three labs including transmission lines between them before one trial of the experiment starts, as far as is directly relevant in the causal process leading from experimental inputs to experimental outputs. The node “phi” consists of the state of external devices which generate the settings. The graphical model says that as far as the settings and the outputs are concerned, “phi” and “psi” can be taken to be independent. It says that Bob’s setting is not in the causal pathway to Alice’s outcome.
At the end of the experiment, we have N quadruples of binary bits (a, b, x, y). Here, a and b are the settings and x and y are the outcomes in one of the N “trials”. We can now count the number z of trials in which x = y and not both a and b equal 1, together with the trials in which x ≠ y and both a and b equal 1. Those two kinds of trials are both considered trials having the result “success”. The remaining trials have the result “fail”.
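A minimal sketch in R of the scoring rule just described, assuming the settings and outcomes are coded as 0/1 vectors. The function name `success` is my own, not part of any package. The rule is equivalent to asking whether x XOR y equals a AND b.

```r
# Score trials of the game: success when (x XOR y) == (a AND b).
# Assumes a, b, x, y are 0/1 vectors of equal length.
success <- function(a, b, x, y) {
  xor(x == 1, y == 1) == ((a == 1) & (b == 1))
}

# Hypothetical example: one trial per setting pair, both outputs always 0
a <- c(0, 0, 1, 1)
b <- c(0, 1, 0, 1)
x <- c(0, 0, 0, 0)
y <- c(0, 0, 0, 0)
z <- sum(success(a, b, x, y))
z  # 3: only the trial with a = b = 1 fails
```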
Now, let B(p) denote a random variable distributed according to the binomial distribution with parameters N and p. Think of the number of successes z as the outcome of a random variable Z. According to local realism, and taking p = 0.75, it can be proved that for all z > N p, Prob( Z ≥ z ) ≤ Prob( B(p) ≥ z ). According to quantum mechanics, and with q = 0.85, it appears possible to arrange that for all z, Prob( Z ≤ z ) = Prob( B(q) ≤ z ). Let’s see what those binomial tail probabilities are with z = 0.80 N, using the statistical programming language “R”.

```r
N <- 1000
p <- 0.75
z <- 0.8 * N
q <- 0.85
# Note: lower.tail = FALSE returns Prob(B(p) > z); use z - 1 to get Prob(B(p) >= z)
pbinom(z, N, p, lower.tail = FALSE)
# [1] 8.029329e-05
pbinom(z, N, q, lower.tail = TRUE)
# [1] 1.22203e-05
```
We see that an experiment with N = 1000 time-slots should be plenty to decide whether the experimental results are the result of local realism with a success rate of maximally 75%, or of quantum mechanics with a success rate of 85% (close to the theoretical maximum under quantum mechanics). The winning theory is decided by seeing if the observed success rate is above or below 80%.
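A quick sketch of that decision rule in R, assuming (for illustration only) that the number of successes is exactly binomial, with p = 0.75 for the most favourable local-realist case and q = 0.85 for quantum mechanics. The seed and the number of simulated experiments are arbitrary choices.

```r
set.seed(2015)   # arbitrary seed, so that the run can be reproduced
N <- 1000        # trials per experiment
reps <- 10000    # number of simulated experiments under each theory

z_lr <- rbinom(reps, N, 0.75)   # successes if local realism sits at its binomial bound
z_qm <- rbinom(reps, N, 0.85)   # successes if quantum mechanics achieves q = 0.85

# Decision rule: declare for quantum mechanics if the success rate exceeds 80%
wrong_lr <- mean(z_lr / N > 0.80)   # local realism wrongly rejected
wrong_qm <- mean(z_qm / N <= 0.80)  # quantum mechanics wrongly rejected
c(wrong_lr = wrong_lr, wrong_qm = wrong_qm)
```

Both error rates should be tiny, in line with the two tail probabilities computed above.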
Challenge: show by a computer simulation that my claims are wrong, i.e., simulate a “loophole-free” Bell experiment with a success rate reliably exceeding 80% when the number of trials is 1000 or more. Rules of the game: you must allow me to supply the “fair coin tosses”. Your computer simulation may use an RNG (called a fixed number of times per trial) to create its own randomness, but it must have “set seed” and “restore seed” facilities in order to make each run exactly reproducible if required. For each n, Alice’s nth output x may depend only on Alice’s nth input a, together with (if desired) all the preceding inputs and outputs. Similarly, Bob’s nth output y may depend only on Bob’s nth input b, together with (if desired) all the preceding inputs and outputs.
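To make the rules concrete, here is a minimal, hypothetical harness in R. The names `run_experiment` and `always_zero` are illustrative only, and only the “set seed” part of the required facilities is shown. At trial n, Alice’s function sees only her own current setting plus the history of earlier trials, and likewise for Bob. The trivial local strategy shown (both always output 0) wins exactly when the settings are not both 1, so it scores about 75%.

```r
# Hypothetical harness; the function names are illustrative, not from any package.
run_experiment <- function(a, b, alice, bob, seed = 1) {
  set.seed(seed)                      # "set seed": each run is exactly reproducible
  N <- length(a)
  x <- integer(N)
  y <- integer(N)
  for (n in seq_len(N)) {
    past <- seq_len(n - 1)            # indices of all preceding trials
    history <- list(a = a[past], b = b[past], x = x[past], y = y[past])
    x[n] <- alice(a[n], history)      # Alice never sees b[n]
    y[n] <- bob(b[n], history)        # Bob never sees a[n]
  }
  # success: x XOR y equals a AND b
  wins <- xor(x == 1, y == 1) == ((a == 1) & (b == 1))
  mean(wins)
}

# A trivial local strategy: always output 0, whatever the setting or history
always_zero <- function(setting, history) 0L

# Stand-in for the fair coin tosses that the challenge-setter supplies
set.seed(123)
N <- 1000
a <- rbinom(N, 1, 0.5)
b <- rbinom(N, 1, 0.5)
rate <- run_experiment(a, b, always_zero, always_zero)
rate  # close to 0.75, the local-realist bound
```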
The title of this blog might refer to the trials of Amanda Knox in the case of the murder of Meredith Kercher [Add info and/or references. Footnote or side bar?]. However, I am writing about a case that is much less known outside of Italy [neither victim nor alleged murderer was a rich American girl]: the case of Daniela Poggiali, a nurse suspected by the media and accused by prosecution experts of having killed around 90 patients in a two-year killing spree terminated by her arrest in April 2014. She has just been exonerated after a total of three years in prison with a life sentence, as well as some months of pre-trial detention. This case revolved around statistics of an increased death rate during the shifts of a colourful nurse. I was a scientific expert for the defence, working with an Italian colleague, Julia Mortera (Univ. Roma Tre), later assisted by a young researcher, Francesco Dotto.
Piet Groeneboom and I worked together on the statistics of the case of Lucia de Berk; see our paper in Chance [Reference]. In fact, it was remarkable that the statistical community in the Netherlands got so involved in that case. A Fokke and Sukke cartoon (with Fokke and Sukke dressed as Dutch judges) had the exchange “All 14 professors of statistics in the country have signed the petition that Lucia’s case should be reopened. – That cannot be a coincidence”. Indeed, it wasn’t. That was one of the high points of my career. Another was Lucia’s final acquittal in 2011, at which the judges took the trouble to say out loud, in public, that the nurses had fought heroically for the lives of their patients; lives squandered, they added, by their doctors’ medical errors.
At that point, I felt we had learnt how to fight miscarriages of justice like that, and I rapidly became involved in several more. So far, however, with rather depressing results, until a couple of months ago. This story will not have much to do with mathematics. It will have to do with simple descriptive statistics, and I will also mention the phrases “p-value” and “Bayes’ theorem” a few times. I think it is important for mathematicians in general to know more about what statisticians can do: not so much through using deep and exciting mathematics (though that does happen too, of course), but because one of the skills of a professional statistician is the abstraction of messy real-world problems involving chance and data. It’s not for everybody. Many mathematical statisticians prefer to prove theorems, just like any other mathematician. In fact, I often prefer to do that myself, but I like even more being able to alternate between the two modes of activity. I do like sticking my nose into other people’s business and learning about what goes on in, for instance, law, medicine, or anything else. Each of the two activity modes is a nice therapy for the frustrations which inevitably come with the other.
The Daniela Poggiali case began, for me, soon after the 8th of April, 2014, when it was first reported in international news media. A nurse at the Umberto I hospital in the small town of Lugo, not far from Ravenna, had been arrested and was being investigated for serial murder. She had had photos of herself taken laughing, close to the body of a deceased patient, and these “selfies” were soon plastered over the front pages of tabloid media. Pretty soon, they arrived in The Guardian and The New York Times. The newspapers sometimes suggested she had killed 93 patients, sometimes 31, sometimes other large numbers. It was suspected that she had used potassium chloride on some of those patients. It is an ideal murder weapon for a killer nurse: easily available in a hospital, easy to give to a patient who is already hooked up to an IV drip, quick to kill (it causes cardiac arrest, and is used in America for executions), and hard to detect after a short time. After death, it redistributes itself throughout the body, where it becomes indistinguishable from a normal concentration of potassium.
Many features of the case reminded me strongly of the case of Lucia de Berk in the Netherlands. In fact, it seemed very fishy indeed. I found the name of Daniela’s lawyer in the online Italian newspapers, Google found me an email address, and I sent a message offering support on the statistics of the case. I also got an Italian statistician colleague and good friend, Julia Mortera, interested. Daniela’s lawyer was grateful for our offer of help. The case largely hinged on a statistical analysis of the coincidence between deaths on a hospital ward and Daniela’s shifts there. We were emailed pdfs of scanned pages of a faxed report of around 50 pages containing results of statistical analyses of times of shifts of all the nurses working on the ward, and times of admission and discharge (or death) of all patients, during much of the period 2012 – 2014. There were a further 50 pages (also scanned and faxed) of appendices containing print-outs of the raw data submitted by hospital administrators to police investigators. Two huge messy Excel spreadsheets.
The authors of the report were Prof. Franco Tagliaro (Univ. Verona) and Prof. Rocco Micciolo (Univ. Trento). The two are respectively a pathologist/toxicologist and an epidemiologist. The epidemiologist Micciolo is a professor in a social science department, and member of an interfaculty collaboration for the health sciences. We found out that the more senior and more distinguished author Tagliaro had published many papers on toxicology in the forensic science literature, usually based on empirical studies using data sets provided by forensic institutes. Occasionally, his (relative) junior Micciolo turned up in the list of authors and had supplied statistical analyses. Micciolo calls himself a biostatistician. He has written Italian language textbooks on exploratory data-analysis with the statistical package “R” and is frequently the statistician-coauthor of papers written by scientists from his university in many different fields including medicine and psychology. They both had decent H-indices: their publications were in good journals, their work was mainstream, useful, “normal science”. They were not amateurs. Or were they?
Daniela Poggiali worked on a very large ward with very many very old patients, many suffering from terminal illnesses. Ages ranged from 50 up to 105; most commonly, patients were around ninety. The ward had about 60 beds and most were usually occupied. Patients, admitted to the hospital because of an acute illness, tended to stay one to two weeks. There was on average one death every day; some days none, some days up to four. Most patients were eventually discharged to go home or to a nursing home. It was an ordinary “medium care” nursing department (i.e., not an Intensive Care unit).
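As a rough sanity check, and only under the assumption (mine, not taken from the hospital data) that daily death counts behave like a Poisson distribution with mean 1, pure chance alone produces many days with no deaths and, only two or three times in two years, a day with five or more. The variable names are illustrative.

```r
# Assumption for illustration: daily deaths ~ Poisson(1), independent across days
lambda <- 1
days <- 730                                       # roughly two years

p_none <- dpois(0, lambda)                        # P(no deaths on a given day)
p_five_plus <- ppois(4, lambda, lower.tail = FALSE)  # P(5 or more deaths)

# Expected number of such days over the period
round(c(days_with_none = days * p_none,
        days_with_5_plus = days * p_five_plus), 1)  # about 268.6 and 2.7
```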
Some very simple statistics showed that the death rate on days when Poggiali worked was much higher than on days when she did not work. A more refined analysis compared the rate of deaths during the hours she worked with the rate of deaths during the hours she was not at work. Again, her presence “caused” a huge excess, statistically highly significant. A yet more refined analysis compared the rate of deaths while she was at work in the sectors where she was working with the rate in the opposite sectors. What does this mean? The ward was large and spread over two long wings of one floor of a large building, “Blocco B”, built perhaps in the sixties.
Between the two wings were central “supporting facilities” and also the main stairwell. Each wing consisted of many rooms (each room with several beds), with one long corridor through the whole building, see the floor plan below. Sector A and B rooms were in one wing, first A and then B as you went down the corridor from the central part of the floor. Sector C and Sector D rooms were in the other wing, opposite to one another on each side of the corridor. Each nurse was usually detailed in her shifts to one sector, or occasionally to the two sectors in one wing. While working in one sector, a nurse could theoretically easily slip into a room in the adjacent sector. Anyway, the nurses often helped one another, so they often could be found in the “wrong sector”, but not often in the “wrong wing”.
Tagliaro and Micciolo went on to look at the death rates while Daniela was at work in different periods. They noticed that it was higher in 2013 than in 2012, even higher in the first quarter of 2014, then – after Daniela had been fired – it was much, much less. Was she killing more and more patients as time went by? Till the killing stopped dead on her suspension and then arrest.
Tagliaro and Micciolo were, one could say, theoretically aware of the fact that other factors might be the cause of different death rates on Poggiali’s shifts than not. They were proud of their trick of comparing death rates in the hospital wing where she was supposed to be, with the rate in the other, while Daniela was at work. They wrote that in this way they had controlled for confounders, taking each death to provide its own “control”. They did not control for any other confounding factors at all. In their explanation of their findings to the court they repeatedly stated categorically that the association they had found must be causal, and Daniela’s presence was the cause. Add to this that their clumsy explanation of p-values might have misled lawyers, journalists and the public. In such a case, a p-value is the probability of what you see (more precisely, of at least what you see), assuming pure chance. That is not the same as the probability that pure chance was the cause of what you see – the fallacy of the transposed conditional, also known as “the prosecutor’s fallacy”.
Exercise to the reader: when is this fallacy not a fallacy? Hint: revise your knowledge of Bayes’ rule. [Elementary maths, obligatory in the high-school pre-university social science and humanities stream, though often not taught to – or not taken much notice of by – future exact scientists.]
We asked Tagliaro and Micciolo for the original Excel spreadsheets and for the “R” scripts [R reference] they had used to process the data. They declined to give them to us, saying this would not be proper since they were confidential. We asked Daniela’s lawyer to ask the court to ask for those computer files on our behalf. The court declined to satisfy our request. We were finally sent just the Excel files by the hospital administration, a week before we were called to give evidence. With a combination of OCR and a lot of painstaking handwork, Daniela’s lawyer’s rich friend had in the meantime managed to help us get the data files reconstructed. We performed a lot of analyses with the help of a succession of students, because extracting what we needed from those spreadsheets was an extraordinarily challenging task. One kept finding anomalies that had to be fixed in one way or another. Even when we had “clean” spreadsheets, it still was a mess.
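A minimal illustration in R of what such a p-value is, with invented numbers: suppose chance alone would lead us to expect 50 deaths during a given nurse’s shifts, and 70 were recorded. Treating the count as Poisson is an assumption made for the sketch. The result is the probability of at least 70 deaths given chance, which is not the probability that chance is the explanation given 70 deaths; note also that this calculation knows nothing about confounders.

```r
# Hypothetical numbers, not from the case
expected_chance <- 50   # deaths expected during the nurse's shifts if only chance operated
observed <- 70          # deaths actually recorded during those shifts

# p-value: Prob(at least 70 deaths | pure chance), assuming Poisson counts.
# ppois(k, ..., lower.tail = FALSE) gives Prob(X > k), hence observed - 1.
p_value <- ppois(observed - 1, expected_chance, lower.tail = FALSE)
p_value
```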
Next, we started looking for confounding factors that might explain the difference between Daniela and her colleagues, which certainly was striking and real. But was it perhaps entirely innocent?
First of all, simple histograms showed that death rates on that ward varied strongly by month, with big peaks in June and again in January. That is what one should expect. The humid heat and air pollution in the summer; or the damp and cold and the air pollution in the winter, exacerbated by winter flu epidemics. Perhaps Daniela worked more at bad times than at good times? No. It was clear that sectors A+B were different from C+D. Death rates were different but also the number of beds in each wing was different. Perhaps Daniela was allocated more often to “the more difficult” sections? It was not so clear. Tagliaro and Micciolo computed death rates for the whole ward, or for each wing of the ward, but never took account of the number of patients in each wing nor of the severity of their illnesses.
Most interesting of all was what we found when we looked at the hour and the minute of the recorded time of death of patients who died. Patients tended to die at times which were whole hours; “half past” was also quite popular. There was, however, also a huge peak of deaths between midnight and five minutes past midnight! There were fewer deaths in a couple of hours soon after lunch time. There were large peaks of deaths around the times of handover between shifts: 7:00 in the morning, 2:00 in the afternoon, 9:00 in the evening. The death rate is higher in the morning than in the afternoon, and higher in the afternoon than at night. When you’re dying (but not in intensive care, where it is very difficult to die at all) you do not die in your sleep at night. You die in the early morning as your vital organs start waking up for the day. Now, also not surprisingly, the number of nurses on a ward is largest in the morning, when there is a huge amount of work to do; it’s much less in the afternoon and evening; and it’s even less at night. This means that a full-time nurse typically spends more time in the hospital during morning shifts than during afternoon shifts, and more time during afternoon shifts than during night shifts. The death rate shows the same pattern. Therefore, for every typical full-time nurse, the death rate while they are at work tends to be higher than when they are not at work!
Nurses aren’t authorized to officially register times of death. Only a doctor is authorized to do that. He or she is supposed to write down the time at which they have determined the patient is no longer alive. It seems that they often round that time to whole or half hours. The peak just after midnight is hard to explain. The date of death has enormous financial and legal consequences. The peak suggests that those deaths may have occurred anywhere in a huge time window. That doctors come to the wards on the dot at midnight and fill in forms for any patients who have died in the preceding few hours is hard to believe.
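The time-of-day argument can be checked with a few lines of R. Everything here is hypothetical: the hourly death rates, the shift boundaries and the four-week rota are invented for illustration, and a night shift is treated as ten clock hours within one day to keep the bookkeeping simple. The nurse in this sketch has no influence on anyone, yet the death rate during her hours exceeds the rate during the other hours, simply because she works more mornings than nights.

```r
# Hypothetical expected deaths per hour on the ward, by clock hour (0-23)
hour <- 0:23
rate <- ifelse(hour >= 7 & hour < 14, 0.06,          # morning: highest
        ifelse(hour >= 14 & hour < 21, 0.04,         # afternoon
                                       0.03))        # night: lowest

shift_hours <- list(morning = 7:13, afternoon = 14:20, night = c(21:23, 0:6))

# Hypothetical 28-day rota: more mornings than nights, as for a full-time nurse
rota <- c(rep("morning", 10), rep("afternoon", 7), rep("night", 4), rep("off", 7))

# Mark the clock hours (columns) the nurse is on duty on each day (rows)
on <- matrix(FALSE, nrow = length(rota), ncol = 24)
for (d in seq_along(rota)) {
  if (rota[d] != "off") on[d, shift_hours[[rota[d]]] + 1] <- TRUE
}
expected <- matrix(rate, nrow = length(rota), ncol = 24, byrow = TRUE)

rate_on  <- sum(expected[on])  / sum(on)    # deaths per hour while on duty
rate_off <- sum(expected[!on]) / sum(!on)   # deaths per hour while off duty
c(on_duty = rate_on, off_duty = rate_off, ratio = rate_on / rate_off)  # ratio about 1.15
```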
What is now clear is that it is mainly around the hand-over between shifts that deaths get “processed”. Quite a few times of death are so hard to know that they are shunted to five minutes past midnight; many others are located in the hand-over period but might well have occurred earlier.
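Heaping of recorded times on whole and half hours can be quantified with a simple binomial test. The data below are simulated, not the hospital’s: true minutes of death are uniform, and an assumed 60% of them are rounded to the nearest half hour when recorded. Without any rounding, about 2 in 60 recorded minutes would fall on :00 or :30.

```r
set.seed(1)  # arbitrary seed
n <- 500

# Simulated (hypothetical) data: true minute of death is uniform on 0..59
true_minute <- sample(0:59, n, replace = TRUE)
# Assumed recording habit: 60% of times rounded to the nearest half hour
rounded <- runif(n) < 0.6
recorded <- ifelse(rounded, (30 * round(true_minute / 30)) %% 60, true_minute)

on_half_hour <- recorded %in% c(0, 30)
share <- mean(on_half_hour)

# One-sided test against the 2/60 expected without rounding
bt <- binom.test(sum(on_half_hour), n, p = 2 / 60, alternative = "greater")
c(share = share, expected = 2 / 60, p_value = bt$p.value)
```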
Some nurses tend to work longer shifts than others. Some conscientiously clock in as early as they are allowed before their shift starts, and clock out as late as they can after their shift ends. Daniela was such a nurse. Her shifts were indeed statistically significantly longer than those of any of her colleagues. She very often stayed on duty several hours after the end of the official ten-minute overlap between shifts. There was often a lot to do, and one can imagine that this often involved taking care of the recently deceased: not the nicest part of the job. Daniela was well known to be a rather conscientious and very hard worker, with a fiery temper, known to play pranks on colleagues or to loudly disagree with doctors for whom she had a healthy disrespect.
Incidentally, the rate of admissions to Umberto I hospital tumbled down after the news broke of a serial killer – and the news broke the day after the last day the serial killer was at work, together with the publication of the lurid “selfie”. The rate of deaths was slowly increasing over the two years up to then, as was in fact also the rate of admissions and the occupancy of the ward. A hospital getting slowly more stressed? Taking on more work?
Julia and I are certain that the difference between Daniela’s death rates and those of other nurses is to a huge extent explainable by the anomalies in the data which we had discovered and by her long working hours.
Some residual difference could be due to the fact that a conscientious nurse actually notices when patients have died, while a lazy nurse keeps a low profile and leaves it to her colleagues to notice, at hand-over. We have been busy fitting sophisticated regression models to the data but this work will be reported in a specialist journal. It does not tell us more than what I have already said. Daniela is different from the other nurses. All the nurses are different. She is extreme in a number of ways: most hours worked, longest shifts worked. We have no idea how the hospital allocated nurses to sectors and patients to sectors. We probably won’t get to know the answer to that, ever. The medical world does not put out its dirty washing for everyone to see.
We wrote a report and gave evidence in person in Ravenna in early 2015. I did not have time to see the wonderful Byzantine mosaics, though I was treated to some wonderful meals. I think my department paid for my air ticket. Julia and I worked “pro deo”. In our opinion, we totally shredded the statistical work of Tagliaro and Micciolo. The court, however, did not agree. “The statistical experts for the defence only offered a theoretical discourse while those of the prosecution had scientifically established hard facts”. In retrospect, we should have used stronger language in our report. Tagliaro and Micciolo stated that they had definitively proven that Daniela’s presence caused 90 or so extra deaths. They stated that this number could definitely not be explained as a chance fluctuation. They stated that, of course, the statistics did not prove that she had deliberately murdered those patients. We, on the other hand, had used careful scientific language. One begins to understand how it is that experts like Tagliaro and Micciolo are in such high demand by public prosecutors.
There was also toxicological evidence concerning one of the patients and involving K+ Cl–, but we were not involved in that. There was also the “selfie”, there was character evidence. There were allegations of thefts of patients’ personal jewellery. It all added up. Daniela was convicted of just one murder. The statistical evidence provided her motive: she just loved killing people, especially people she didn’t like. [Was there also evidence by psychologists?]
Rapidly, the public prosecution started another case based largely on the same or similar evidence but now concerning another patient, with whom Daniela had had a shouting match, five years earlier. In fact, this activity was probably triggered by families of other patients starting civil cases against the hospital. It would also clearly be in the interest of the hospital authorities to get new criminal proceedings against Daniela started. However, Daniela’s lawyers appealed against her first conviction. It was successfully overturned. But then the court of cassation overturned the acquittal. Meanwhile, the second case led to a conviction, then acquittal on appeal, then cassation. All this time Daniela was in jail. Cassations of cassations meant that Daniela had to be tried again, by yet another appeal court, for the two alleged murders. Julia and I and her young colleague Francesco Dotto got to work again, improving our arguments and our graphics and our formulations of our findings.
At some point, triggered by some discussions with the defence experts on toxicology and pathology, Julia took a glance at Tagliaro’s quite separate report on the toxicological evidence. This led to a breakthrough, as I will now explain.
Tagliaro knew the post-mortem “vitreous humour” potassium concentration of the last patient, a woman who had died on Daniela’s last day. That death had somehow surprised the hospital doctors, or rather, as it later transpired, it didn’t surprise them at all: they had already for three months been looking at the death rates while Daniela was on duty and essentially building up a dossier against her, just waiting for a suitable “last straw”! Moreover, they already had their minds on K+ Cl-, since some had gone missing and then turned up in the wrong place. Finally, Daniela had complained to her colleagues about the really irritating behaviour of that last patient, 73-year-old Rosa Calderoni.
“Vitreous humour” is the transparent, colourless, gelatinous mass which fills your eyeballs. While you are alive it has a relatively low concentration of potassium. After death, cell membranes break down, and potassium concentration throughout the body equalises. Tagliaro had published papers in which he studied the hourly rate of increase in the concentration, using measurements on the bodies of persons who had died at a known time of causes unrelated to potassium chloride poisoning. He even had some fresh corpses on which he could make repeated measurements. His motivation was to use this concentration as a tool to determine the PMI (post-mortem interval) in cases where there is a body and a post-mortem examination but no known time of death. In one paper (without Micciolo’s aid) he did a regression analysis, fitting a straight line through a cloud of points (y = concentration, x = time since death). He had about 60 observations, mostly men, mostly rather young. In a second paper, now with Micciolo, he fitted a parabola, and moreover noted that there was an effect of age and of sex. The authors also observed the huge variation around that fitted straight line and concluded that the method was not reliable enough for use in determining the PMI. But this did not deter Tagliaro when writing his toxicological report on Rosa Calderoni! He knew the potassium concentration at the time of post-mortem, he knew exactly when she died, and he had a number for the natural increase per hour after death from his first, linear, regression model. With this, he calculated the concentration at death. Lo and behold: it was a concentration which would have been fatal. He had proved that she had died of potassium chloride poisoning.
Julia and Francesco used the model of the second paper and found out that if you would assume a normal concentration at the time of death, and take account of the variability of the measurements and of the uncertainty in the value of the slope, then the concentration observed at the time of post mortem was maybe above average, but not surprisingly large at all.
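The difference between the two approaches can be sketched in R. All numbers are invented: a calibration sample of 60 cases with a straight-line increase in vitreous potassium after death (the second paper used a parabola with age and sex effects, which this sketch omits), and a hypothetical case with a known post-mortem interval. The naive calculation subtracts slope times interval from the observed value and treats the result as exact; the alternative asks whether the observed value lies inside the 95% prediction interval for someone who died with a normal concentration, which accounts for both the residual scatter and the uncertainty in the fitted line.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical calibration data: vitreous K+ (mmol/L) against hours since death
n <- 60
pmi <- runif(n, min = 0, max = 48)
k <- 6 + 0.2 * pmi + rnorm(n, sd = 2)
fit <- lm(k ~ pmi)

# Hypothetical case: known post-mortem interval and measured concentration
pmi_obs <- 30
k_obs <- 14

# Naive back-extrapolation: a single number, with no uncertainty attached
k_at_death_naive <- k_obs - coef(fit)[["pmi"]] * pmi_obs

# Alternative: 95% prediction interval for K+ at pmi_obs after a death with normal K+
pi95 <- predict(fit, newdata = data.frame(pmi = pmi_obs),
                interval = "prediction", level = 0.95)
inside <- k_obs >= pi95[, "lwr"] && k_obs <= pi95[, "upr"]
c(naive_at_death = k_at_death_naive, pi95[1, ], observed_inside = inside)
```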
Daniela Poggiali became a free woman. I wish her a big compensation and a long and happy life. She’s quite a character.
Aside from the “couleur locale” of an Italian case, this case had incredibly much similarity with the case of Lucia de Berk. It has many similarities with quite a few other contested serial killer nurse cases, in various countries. According to a Netflix series, in which a whole episode is devoted to Daniela, these horrific cases occur all the time. They are studied by criminologists and forensic psychologists who have compiled a list of “red flags” intended to help warn hospital authorities. The scientific term here is “health care serial killer”, or HCSK. One of the HCSK red flags is that you have psychiatric problems. Another is that your colleagues think you are really weird. Especially when your colleagues call you an angel of death, that’s a major red flag. The list goes on. These lists are developed in scientific publications in important mainstream journals, and the results are presented in textbooks used in university criminology teaching programs. Of course, you can only scientifically study convicted HCSKs. Your sources of data are newspaper reports, judges’ summings up, the prosecution’s final summary of the case. It is clear that these red flags are the things that convince judges and jurors to deliver a guilty verdict. These are the features that will first make you a suspect, which police investigators will look for, and which will convince the court and the public of your guilt. Amusingly, one of the side effects of the case of Lucia de Berk was contributing a number of entries to this list. Embarrassingly, her case had to be removed from the collections of known cases after 2011, and the criminologists and forensic psychologists also now mention that statistical evidence of many deaths during the shifts of a nurse is not actually a very good red flag. They have learnt something, too.
Also interesting is the incidence of these cases: around 1 in a million hospital deaths per year, according to these researchers [reference! Check, how many whats per what?]. These are researchers who have the phenomenon of HCSKs as their life work, giving them opportunities to write lurid books on serial murder, appear on TV panels and in TV documentaries explaining the terrible psychology of these modern-day witches, and take the stand as prosecution witnesses. Now, that “base rate” is actually rather important, even if only known very roughly. It means that such crimes are very, very unusual. In the Netherlands, one might expect a handful of cases per century; maybe on average 100 deaths in a century. There are actually only about 100 murders altogether in the Netherlands per year. On the other hand, more than 1000 deaths every year are due to medical errors. That means that evidence against a nurse suspected of being an HCSK has to be incredibly strong before it can convince a rational person that they have a new HCSK on their hands. Lawyers, judges, journalists and the public are unfortunately perhaps not rational persons. They are certainly not good with probability, and not good with Bayes’ theorem. (Its use is not allowed in UK criminal courts, because judges have ruled that jurors cannot possibly understand it.)
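A minimal sketch of Bayes’ theorem in odds form (posterior odds = prior odds × likelihood ratio), with assumed numbers: a prior probability of 1 in 100,000 that a given suspected nurse is an HCSK, and evidence 1000 times more likely if she is guilty than if she is innocent. Neither number comes from any real case; they only show how a tiny base rate dominates.

```r
# Assumed numbers, for illustration only
prior_guilty <- 1e-5   # prior probability that the suspected nurse is an HCSK
lr <- 1000             # likelihood ratio: P(evidence | guilty) / P(evidence | innocent)

prior_odds <- prior_guilty / (1 - prior_guilty)
posterior_odds <- prior_odds * lr          # Bayes' theorem in odds form
posterior_guilty <- posterior_odds / (1 + posterior_odds)
posterior_guilty  # about 0.0099
```

Under these assumptions, even evidence with a likelihood ratio of 1000 leaves the probability of guilt below 1%.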
I am still working on one UK case, Ben Geen. I believe it is yet another example of a typical innocent HCSK scare in a failing hospital leading to a typical unsafe conviction based largely on the usual red flags and a little bit of bad luck. At least, I see no reason whatsoever to suppose that Ben Geen was guilty of the crimes for which he is sitting out a life sentence. Meanwhile, a new case is starting up in the UK: Lucy (!) Letby. I sincerely hope not to be involved with that one. Time for a new generation of nosy statisticians to do some hard work.