The Big Bell Bet

Poet and McGill University emeritus professor of chemistry Bryan Sanctuary is betting me 5000 Euro that he can resolve the EPR-Bell paradox to the satisfaction of the majority of our peers. Moreover, he will do it by a publication (or at least, a pre-publication) within the year. That’s this calendar year, 2022. Naturally, he expects public (scientific) acclaim to follow “in no time”. I don’t expect that. We will settle the bet by consultation with our peers, and this consultation will be concluded by the end of the following year. So that’s by the end of the succeeding calendar year, 2023.

John S. Bell inspects the Christmas present which his friends the Bertlmanns have just given him

I, therefore, expect his gracious admission of defeat and a nice check for 5000 Euro, two years from now.

He expects the opposite. (Poor Bryan! It’s like taking candy from a baby…)

(He presumably thinks the same)

The small print

Small print item 1: Who are our peers? Like a jury, they will be determined by having our mutual approval. To begin with, we will invite the members of a couple of Google groups/internet seminars in which one or both of us already participate. Two of them are Jarek Duda’s (Krakow) “QM foundations & nature of time seminar” and Alexandre de Castro’s Google group “Bell inequalities and quantum foundations”.

Small print item 2: What does Bryan think he’s going to achieve? Restoration of locality and realism, and banning of weirdness and spookiness from quantum mechanics.

Small print item 3: What do I think about his putative theory? Personally, but it is not up to me to decide, I would accept that he has won if his theory (which he has not yet revealed to the world) would allow me to win my Bell game challenge “against myself”; i.e., it would allow me to write computer programs to simulate a successful loophole-free Bell experiment, thus satisfying the usual spatiotemporal constraints on inputs and outputs while preventing conspiracy, and reliably violating a suitable Bell inequality by an amount that is both statistically and physically significant. This means that, in my opinion, he should only win if he can convince the majority of our peers that those constraints are somehow unphysical. I mention that if experimenters voluntarily impose those constraints (to the best of their ability) in real experiments, then there cannot be a metaphysical reason to forbid them. However, the bet will be settled by a democratic vote of our peers! Clearly, this does constitute a loophole for me: a majority of our peers might still fall for superdeterminism or any other craziness.

I suspect that Bryan believes he can now resurrect his previous attempt. I think it is very brave of him, but doomed to failure, because I don’t think he will come up with a theory that will catch on. (I believe that such a theory is not even possible, but that’s my personal belief.)

To reiterate: our peers will determine who has won our bet. Bryan is betting that a year from now he will have revolutionised quantum mechanics, restoring locality and realism and that his then appearing paper will rapidly force Zeilinger, Gisin, me, and a host of others, to retract our papers on quantum teleportation, quantum non-locality, and all that. I am betting that the world will not be impressed. Our peers will vote whether or not they believe that Bryan has achieved his goal.

The Bell game challenge

Since 2015, Bell-type experiments designed to test local realism have the following format: the format of a so-called “loophole-free Bell test”. There is a fixed sequence of N time-slots, or more precisely, paired time-slots. These are time-slots in two distant labs owned by two scientists Alice and Bob. The time-slots are paired such that a signal sent at the start of one of Alice’s time slots from Alice’s to Bob’s lab, travelling at the speed of light, would only reach Bob’s lab after the end of Bob’s corresponding time-slot; and vice versa. Just after the start of each time-slot, each inserts a binary setting into an experimental device. Something goes on inside that apparatus, and before the time-slot is over, a binary outcome is produced. Each instance with two inputs and two outputs is called a trial.

From Bell’s “Bertlmann’s socks” paper. Inputs are shown below and outputs above the long horizontal box which encloses Alice and Bob’s devices and what is in between

Actually, many experiments require a slightly more elaborate protocol involving a third lab, which you may think of as a source of “pairs of particles”. Charlie’s lab is located somewhere between Alice and Bob’s. Charlie’s device outputs the message “ready” or “not ready” before the end of his time-slot (its length is irrelevant). The message however could only arrive at Alice and Bob’s lab after they have already input their input settings, so could not directly influence their choices. Outcomes get delivered anyway. After the experiment, one looks only at the inputs and outputs of each trial in which Charlie saw the output “ready”. The experiment continues long enough that there are N trials labelled by Charlie’s apparatus as “ready”. From now on, I will forget about this “post-selection” of N trials: the first N which went off to a good start. (The word “post-selection” is a misnomer. It is performed after the whole experiment is complete, but the selection is determined in advance of the introduction of the settings).

Space-time disposition of the time-slots of one trial. The sloping arrows are the boundaries of future light-cones with vertices at the start of Alice, Bob, and Charlie’s time-slots.

The settings are typically chosen to resemble sequences of outcomes of independent fair coin tosses. Sometimes they are generated by physical random number generators using physical noise sources, sometimes they are created using pseudo random number generators (RNGs). Sometimes they are generated on the fly, sometimes created in advance. The idea is that the settings are inputs which come from the outside world, outside the experimental devices, and the outcomes are outputs delivered by the devices to the outside world.

Below is a graphical model specified in the language of the present-day theory of causality based on directed acyclic graphs (DAGs), describing the dependence structure of what is observed in terms of “hidden variables”. There is no assumption that the hidden parts of the structure are classical, nor that they are located in classical space-time. The node “psi” stands for the state of all experimental apparatus in the three labs including transmission lines between them before one trial of the experiment starts, as far as is directly relevant in the causal process leading from experimental inputs to experimental outputs. The node “phi” consists of the state of external devices which generate the settings. The graphical model says that as far as the settings and the outputs are concerned, “phi” and “psi” can be taken to be independent. It says that Bob’s setting is not in the causal pathway to Alice’s outcome.

At the end of the experiment, we have N quadruples of binary bits (a, b, x, y). Here, a and b are the settings and x and y are the outcomes in one of the N “trials”. We can now count the number z of trials in which x = y and not both a and b = 1, together with trials in which x ≠ y and both a and b = 1. Those two kinds of trials are both considered trials having the result “success”. The trials remaining have the result “fail”.
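The counting rule can be sketched in a few lines of R. This is only an illustration: the function name bell_game_successes is made up here, and it assumes the usual rule of this game, namely that a trial is a success when x = y unless both settings equal 1, in which case it is a success when x ≠ y.

```r
# Count successes in the Bell game (hypothetical helper, not from the original post).
# a, b: settings coded 0/1; x, y: outcomes coded 0/1; all vectors of length N.
bell_game_successes <- function(a, b, x, y) {
  stopifnot(length(a) == length(b),
            length(b) == length(x),
            length(x) == length(y))
  both_one <- (a == 1) & (b == 1)
  # Success: outcomes differ when both settings are 1, outcomes agree otherwise
  success <- ifelse(both_one, x != y, x == y)
  sum(success)
}

# Four hand-checkable trials, one for each pair of settings
a <- c(0, 0, 1, 1)
b <- c(0, 1, 0, 1)
x <- c(1, 0, 1, 1)
y <- c(1, 0, 0, 0)
z <- bell_game_successes(a, b, x, y)
```

In this toy example, the first, second, and fourth trials are successes and the third is a failure.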

Now, let B(p) denote a random variable distributed according to the binomial distribution with parameters N and p. Think of the number of successes z as the outcome of a random variable Z. According to local realism, and taking p = 0.75, it can be proved that for all z > N p, Prob( Z ≥ z ) ≤ Prob( B(p) ≥ z ). According to quantum mechanics, and with q = 0.85, it appears possible to arrange that for all z, Prob( Z ≤ z ) = Prob( B(q) ≤ z ). Let’s see what those binomial tail probabilities are with z = 0.80 N, using the statistical programming language “R”.

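# Number of trials, local-realist bound p, decision threshold z, quantum success rate q
N <- 1000
p <- 0.75
z <- 0.8 * N
q <- 0.85
# Upper tail under local realism; note that lower.tail = FALSE returns P(B(p) > z)
pbinom(z, N, p, lower.tail = FALSE)
[1] 8.029329e-05
# Lower tail under quantum mechanics: P(B(q) <= z)
pbinom(z, N, q, lower.tail = TRUE)
[1] 1.22203e-05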

We see that an experiment with N = 1000 time-slots should be plenty to decide whether the experimental results are the result of local realism with a success rate of maximally 75%, or of quantum mechanics with a success rate of 85% (close to the theoretical maximum under quantum mechanics). The winning theory is decided by seeing if the observed success rate is above or below 80%.

Challenge: show by a computer simulation that my claims are wrong, i.e., simulate a “loophole-free” Bell experiment with a success rate reliably exceeding 80% when the number of trials is 1000 or more. Rules of the game: you must allow me to supply the “fair coin tosses”. Your computer simulation may use an RNG (called a fixed number of times per trial) to create its own randomness, but it must have “set seed” and “restore seed” facilities in order to make each run exactly reproducible if required. For each n, Alice’s nth output x may depend only on Alice’s nth input a, together with (if desired) all the preceding inputs and outputs. Similarly, Bob’s nth output y may depend only on Bob’s nth input b, together with (if desired) all the preceding inputs and outputs.
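To make the rules concrete, here is a minimal sketch in R of what a run of the game could look like. Everything in it is illustrative: the functions alice and bob are hypothetical stand-ins for a contestant's program, the seed is arbitrary, and the settings are drawn with rbinom in place of the coin tosses that I would supply. The strategy shown is about the simplest local one imaginable: both sides always output 0, so a trial fails exactly when both settings are 1.

```r
set.seed(2022)                 # arbitrary seed, so the run is exactly reproducible
N <- 1000

# Stand-in for the externally supplied fair coin tosses (settings coded 0/1)
a <- rbinom(N, 1, 0.5)
b <- rbinom(N, 1, 0.5)

# Hypothetical local strategy: each output ignores everything except the local input
alice <- function(a) rep(0, length(a))
bob   <- function(b) rep(0, length(b))
x <- alice(a)
y <- bob(b)

# Score the game: outcomes must differ when both settings are 1, agree otherwise
both_one <- (a == 1) & (b == 1)
z <- sum(ifelse(both_one, x != y, x == y))
rate <- z / N
```

With fair coin tosses this strategy succeeds in about three quarters of the trials. The challenge is to do reliably better than 80% without letting either side see the other's current setting.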

Here is a different version of the challenge using the classical Bell-CHSH inequality instead of the more modern martingale inequality. Another version could be specified using the original Bell inequality, for which one would also demand that at equal settings, outcomes are always equal and opposite. After all, the original Bell inequality also assumes perfect anti-correlation, so one must check that that assumption holds.

The whole point of a computer simulation is that an independent judge is unnecessary: your code is written in a widely and freely available language suitable for scientific computing, and anyone with basic computing skills can check that the programming team is not cheating (whether deliberately or inadvertently). The independent judge is the entire scientific community. If you are successful, the simulation will actually be an example of a classical physical system producing what has been thought to be a unique signature of quantum entanglement. You, the lead scientist, will get the Nobel Prize because you and your team (I imagine that you are a theoretician who might need the assistance of a programmer) will have disproved quantum theory by a reproducible and rigorous experiment. No establishment conspiracy will be able to suppress the incredible and earth-shaking news.

Here are my stipulations on the program. I am assuming that it uses a built-in pseudo-random number generator. I assume that it includes “set.seed” and “save.seed” facilities. Otherwise, it is not useful for scientific work and not eligible for my challenge. 

From now on, the phrases “photon pair”, “time slot”, and “trial” are taken to be interchangeable. After all, we are talking about a computer simulation, so the actual evocative natural language words which we use as names for variables and functions are irrelevant.

The program must accept as input a number of trials N, a seed setting the RNG, and two lists of setting labels “1” and “2” of length N. It must generate as output two lists of outcomes +/–1, also of length N. For all n, Alice’s nth output depends only on Alice’s nth input, as well (if you like) on the inputs and outputs on both sides of earlier trials. And similarly for Bob. I will check this constraint by doing many random spot checks. This is where the rule concerning the RNG comes in.

Let’s take N = 10,000. You will win if the CHSH quantity S exceeds 2.4 in a few repeats with different RNG seeds and varying the lists of inputs. In other words, the violation of the Bell-CHSH inequality is reproducible, and reproducible by independent verifiers. I will supply the lists of inputs after you have published your code. The inputs will be the result of a simulation of independent fair coin tosses using standard scientific computing tools. If you don’t trust me, we can ask a trusted third party to make them for us.
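For readers who want to check a candidate program themselves, here is a hedged sketch in R of how the CHSH quantity S could be computed from the four lists. The function name chsh_S is made up, and the sign convention (the correlation at settings (2, 2) enters with a minus sign) is an assumption; a contestant might label the settings so that a different one of the four terms carries the minus sign.

```r
# CHSH quantity from settings labelled 1/2 and outcomes coded +1/-1.
# Assumed convention: S = E(1,1) + E(1,2) + E(2,1) - E(2,2),
# where E(i,j) is the average product of outcomes at settings (i, j).
chsh_S <- function(setA, setB, outA, outB) {
  E <- function(i, j) {
    sel <- (setA == i) & (setB == j)
    mean(outA[sel] * outB[sel])
  }
  E(1, 1) + E(1, 2) + E(2, 1) - E(2, 2)
}

set.seed(1)                                   # arbitrary seed
N <- 10000
setA <- sample(1:2, N, replace = TRUE)        # stand-in for the supplied inputs
setB <- sample(1:2, N, replace = TRUE)

# A trivial local strategy: every outcome is +1 on both sides
outA <- rep(1, N)
outB <- rep(1, N)
S <- chsh_S(setA, setB, outA, outB)
```

This trivial strategy gives S = 2 exactly, the local-realist bound. To win, a program has to push S above 2.4 under the same input and output constraints.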

An Italian CSI drama: social media, a broken legal system, and Mickey Mouse statistics

Daniela Poggiali, on the day of her final (?) release, 25 October 2021.
Photo: ©Giampiero Corelli

The title of this blog might refer to the very, very famous trials of Amanda Knox in the case of the murder of Meredith Kercher. However, I am writing about a case that is much less known outside of Italy (neither victim nor alleged murderer was a rich American girl). This is the case of Daniela Poggiali, a nurse suspected by the media and accused by prosecution experts of having killed around 90 patients in a two-year killing spree terminated by her arrest in April 2014. She has just been exonerated after a total of three years in prison with a life sentence as well as some months of pre-trial detention. This case revolved around statistics of an increased death rate during the shifts of a colourful nurse. I was a scientific expert for the defence, working with an Italian colleague, Julia Mortera (Univ. Rome Tre), later assisted by her colleague Francesco Dotto.

Piet Groeneboom and I worked together on the statistics of the case of Lucia de Berk, see our paper in Chance [Reference]. In fact, it was remarkable that the statistical community in the Netherlands got so involved in that case. A Fokke and Sukke cartoon entitled “Fokke and Sukke know it intuitively” had the exchange “The probability that almost all professors of statistics are in agreement … is obviously very small indeed”.

Fokke and Sukke do not believe that this is a coincidence.

Indeed, it wasn’t. That was one of the high points of my career. Another was Lucia’s final acquittal in 2010, at which the judges took the trouble to say out loud, in public, that the nurses had fought heroically for the lives of their patients; lives squandered, they added, by their doctors’ medical errors.

At that point, I felt we had learnt how to fight miscarriages of justice like that, and I rapidly became involved in several more. So far, however, with rather depressing results. Till a couple of months ago. This story will not have much to do with mathematics. It will have to do with simple descriptive statistics, and I will also mention the phrases “p-value” and “Bayes’ rule” a few times. One of the skills of a professional statistician is the abstraction of messy real-world problems involving chance and data. It’s not for everybody. Many mathematical statisticians prefer to prove theorems, just like any other mathematician. In fact, I often prefer to do that myself, but I like even more being able to alternate between the two modes of activity, and I do like sticking my nose into other people’s business, and learning about what goes on in, for instance, law, medicine, or anything else. Each of the two activity modes is a nice therapy for the frustrations which inevitably come with the other.

The Daniela Poggiali case began, for me, soon after the 8th of April, 2014, when it was first reported in international news media. A nurse at the Umberto I hospital in the small town of Lugo, not far from Ravenna, had been arrested and was being investigated for serial murder. She had had photos of herself taken laughing, close to the body of a deceased patient, and these “selfies” were soon plastered over the front pages of tabloid media. Pretty soon, they arrived in The Guardian and The New York Times. The newspapers sometimes suggested she had killed 93 patients, sometimes 31, sometimes it was other large numbers. It was suspected that she had used Potassium Chloride on some of those patients. An ideal murder weapon for a killer nurse since easily available in a hospital, easy to give to a patient who is already hooked up to an IV drip, kills rapidly (cardiac arrest – it is used in America for executions), and after a short time hard to detect. After death, it redistributes itself throughout the body where it becomes indistinguishable from a normal concentration of Potassium.

An IV drip. ©Stefan Schweihofer

Many features of the case reminded me strongly of the case of Lucia de Berk in the Netherlands. In fact, it seemed very fishy indeed. I found the name of Daniela’s lawyer in the online Italian newspapers, Google found me an email address, and I sent a message offering support on the statistics of the case. I also got an Italian statistician colleague and good friend, Julia Mortera, interested. Daniela’s lawyer was grateful for our offer of help. The case largely hinged on a statistical analysis of the coincidence between deaths on a hospital ward and Daniela’s shifts there. We were emailed pdfs of scanned pages of a faxed report of around 50 pages containing results of statistical analyses of times of shifts of all the nurses working on the ward, and times of admission and discharge (or death) of all patients, during much of the period 2012 – 2014. There were a further 50 pages (also scanned and faxed) of appendices containing print-outs of the raw data submitted by hospital administrators to police investigators. Two huge messy Excel spreadsheets.

The authors of the report were Prof. Franco Tagliaro (Univ. Verona) and Prof. Rocco Micciolo (Univ. Trento). The two are respectively a pathologist/toxicologist and an epidemiologist. The epidemiologist Micciolo is a professor in a social science department, and member of an interfaculty collaboration for the health sciences. We found out that the senior and more influential author Tagliaro had published many papers on toxicology in the forensic science literature, usually based on empirical studies using data sets provided by forensic institutes. Occasionally, his friend Micciolo turned up in the list of authors and had supplied statistical analyses. Micciolo describes himself as a biostatistician. He has written Italian language textbooks on exploratory data-analysis with the statistical package “R” and is frequently the statistician-coauthor of papers written by scientists from his university in many different fields including medicine and psychology. They both had decent H-indices, their publications were in decent journals, their work was mainstream, useful, “normal science”. They were not amateurs. Or were they?

Daniela Poggiali worked on a very large ward with very many very old patients, many suffering terminal illnesses. Ages ranged from 50 up to 105, mostly around ninety. The ward had about 60 beds and was usually quite fully occupied. Patients tended to stay one to two weeks in the hospital, admitted to the hospital for reasons of acute illness. There was on average one death every day; some days none, some days up to four. Most patients were discharged after several weeks in the hospital to go home or to a nursing home. It was an ordinary “medium care” nursing department (i.e., not an Intensive Care unit).

The long building at the top: “Block B” of Umberto I hospital, Lugo

Some very simple statistics showed that the death rate on days when Poggiali worked was much higher than on days when she did not work. A more refined analysis compared the rate of deaths during the hours she worked with the rate of deaths during the hours she was not at work. Again, her presence “caused” a huge excess, statistically highly significant. A yet more refined analysis compared the rate of deaths while she was at work in the sectors where she was working with the rate in the opposite sectors. What does this mean? The ward was large and spread over two long wings of one floor of a large building, “Blocco B”, probably built in the sixties.

Sector B of “Blocco B” (Google Streetview). Seen from the North.

Between the two wings were central “supporting facilities” and also the main stairwell. Each wing consisted of many rooms (each room with several beds), with one long corridor through the whole building, see the floor plan below. Sector A and B rooms were in one wing, first A and then B as you went down the corridor from the central part of the floor. Sector C and Sector D rooms were in the other wing, opposite to one another on each side of the corridor. Each nurse was usually detailed in her shifts to one sector, or occasionally to the two sectors in one wing. While working in one sector, a nurse could theoretically easily slip into a room in the adjacent sector. Anyway, the nurses often helped one another, so they often could be found in the “wrong sector”, but not often in the “wrong wing”.

Tagliaro and Micciolo (in the sequel: TM) went on to look at the death rates while Daniela was at work in different periods. They noticed that it was higher in 2013 than in 2012, even higher in the first quarter of 2014, then – after Daniela had been fired – it was much, much less. They conjectured that she was killing more and more patients as time went by, till the killing stopped dead on her suspension and arrest.

TM certainly knew that, in theory, other factors might be the cause of an increased death rate on Poggiali’s shifts. They were proud of their innovative approach of relating each death that occurred while Daniela was at work to whether it occurred in Daniela’s wing or in the other. They wrote that in this way they had controlled for confounders, taking each death to provide its own “control”. (Similarly, in the case of Lucia de B., statistician Henk Elffers had come up with an innovative approach. In principle, it was not a bad idea, though all it showed was that nurses are different). TM did not control for any other confounding factors at all. In their explanation of their findings to the court, they repeatedly stated categorically that the association they had found must be causal, and Daniela’s presence was the cause. Add to this that their clumsy explanation of p-values might have misled lawyers, journalists and the public. In such a case, a p-value is the probability of what you see (more precisely, of at least what you see), assuming pure chance. That is not the same as the probability that pure chance was the cause of what you see – the fallacy of the transposed conditional, also known as “the prosecutor’s fallacy”.

Exercise to the reader: when is this fallacy not a fallacy? Hint: revise your knowledge of Bayes’ rule: posterior odds equals prior odds times likelihood ratio.

Bayes rule in odds form. p and d stand for “prosecution” and “defence” respectively, H stands for “Hypothesis”
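Written out, with E standing for the evidence (a symbol introduced here for illustration; the subscripts p and d and the letter H are as in the caption), the rule reads:

```latex
\frac{P(H_p \mid E)}{P(H_d \mid E)}
  \;=\;
\frac{P(H_p)}{P(H_d)}
  \times
\frac{P(E \mid H_p)}{P(E \mid H_d)}
```

The left-hand side is the posterior odds, the first factor on the right is the prior odds, and the second factor is the likelihood ratio.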

We asked Tagliaro and Micciolo for the original Excel spreadsheets and for the “R” scripts they had used to process the data. They declined to give them to us, saying this would not be proper since they were confidential. We asked Daniela’s lawyer to ask the court to ask for those computer files on our behalf. The court declined to satisfy our request. We were finally sent just the Excel files by the hospital administration, a week before we were called to give evidence. Fortunately, with a combination of OCR and a lot of painstaking handwork, a wealthy friend of Daniela’s lawyer had already managed to help us get the data files reconstructed. We performed a lot of analyses with the help of a succession of students because extracting what we needed from those spreadsheets was an extraordinarily challenging issue. One kept finding anomalies that had to be fixed in one way or another. Even when we had “clean” spreadsheets, it still was a mess.

Next, we started looking for confounding factors that might explain the difference between Daniela and her colleagues, which certainly was striking and real. But was it perhaps entirely innocent?

Minute, hour, weekday, month of deaths

First of all, simple histograms showed that death rates on that ward varied strongly by month, with big peaks in June and again in December. (January is not high: elderly people stay home in January and keep themselves warm and safe). That is what one should expect. The humid heat and air pollution in the summer; or the damp and cold and the air pollution in the winter, exacerbated by winter flu epidemics. Perhaps Daniela worked more at bad times than at good times? No. It was clear that sectors A+B were different from C+D. Death rates were different, but also the number of beds in each wing was different. Perhaps Daniela was allocated more often to “the more difficult” sections? It was not so clear. Tagliaro and Micciolo computed death rates for the whole ward, or for each wing of the ward, but never took account of the number of patients in each wing nor of the severity of their illnesses.

Most interesting of all was what we found when we looked at the hour, and the minute, of the recorded times of death. Patients tended to die at times which were whole hours; “half past” was also quite popular. There was however also a huge peak of deaths between midnight and five minutes past midnight! There were fewer deaths in a couple of hours soon after lunchtime. There were large peaks of deaths around the time of handover between shifts: 7:00 in the morning, 2:00 in the afternoon, 9:00 in the evening. The death rate is higher in the morning than in the afternoon, and higher in the afternoon than at night. When you’re dying (but not in intensive care, where it is very difficult to die at all) you do not die in your sleep at night. You die in the early morning as your vital organs start waking up for the day. Now, also not surprisingly, the number of nurses on a ward is largest in the morning when there is a huge amount of work to do; it’s much less in the afternoon and evening, and it’s even less at night. This means that a full-time nurse typically spends more time in the hospital during morning shifts than during afternoon shifts, and more time during afternoon shifts than during night shifts. The death rate shows the same pattern. Therefore, for every typical full-time nurse, the death rate while they are at work tends to be higher than when they are not at work!
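A back-of-the-envelope calculation in R shows how this works. All the numbers below are invented for illustration and are not taken from the Lugo data: the hourly death rates are merely chosen to be highest in the morning and lowest at night and to add up to one death per day, and the working pattern is that of a hypothetical full-time nurse who works mornings most often and nights least often.

```r
# Invented hourly death rates by shift (deaths per hour); not the real data
rate <- c(morning = 0.06, afternoon = 0.04, night = 0.03)

# Length of each shift in hours: 7:00-14:00, 14:00-21:00, 21:00-7:00
shift_hours <- c(morning = 7, afternoon = 7, night = 10)

# Invented fraction of days on which a typical nurse works each kind of shift
works <- c(morning = 0.35, afternoon = 0.25, night = 0.15)

# Expected hours per day spent at work and away from work, split by shift
hours_at_work <- works * shift_hours
hours_away    <- (1 - works) * shift_hours

# Death rate per hour while this nurse is on duty, and while she is not
rate_at_work <- sum(rate * hours_at_work) / sum(hours_at_work)
rate_away    <- sum(rate * hours_away) / sum(hours_away)

round(c(at_work = rate_at_work, away = rate_away), 4)
```

With these made-up numbers the hourly death rate is about 0.046 while the nurse is at work and about 0.040 while she is not, although she has no effect on anyone at all.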

Nurses aren’t authorized to officially register times of death. Only a doctor is authorized to do that. He or she is supposed to write down the time at which they have determined the patient is no longer alive. It seems that they often round that time to whole or half hours. The peak just after midnight is hard to explain. The date of death has enormous financial and legal consequences. The peak suggests that those deaths may have occurred anywhere in a huge time window. It is hard to believe that doctors come to the wards on the dot of midnight and fill in forms for any patients who have died in the few hours before.

What is now clear is that it is mainly around the hand-over between shifts that deaths get “processed”. Quite a few times of death are so hard to know that they are shunted to five minutes past midnight; many others are located in the hand-over period but might well have occurred earlier.

Some nurses tend to work longer shifts than others. Some conscientiously clock in as early as they are allowed, before their shift starts, and clock out as late as they can after their shift ends. Daniela was such a nurse. Her shifts were indeed statistically significantly longer than those of any of her colleagues. She very often stayed on duty several hours after the official end of the official ten-minute overlap between shifts. There was often a lot to do – one can imagine often involving taking care of the recently deceased. Not the nicest part of the job. Daniela was well known to be a rather conscientious and very hard worker, with a fiery temper, known to play pranks on colleagues or to loudly disagree with doctors for whom she had a healthy disrespect.

Incidentally, the rate of admissions to Umberto I hospital tumbled down after the news broke of a serial killer – and the news broke the day after the last day the serial killer was at work, together with the publication of the lurid “selfie”. The rate of deaths was slowly increasing over the two years up to then, as was in fact also the rate of admissions and the occupancy of the ward. A hospital getting slowly more stressed? Taking on more work?

If one finds a correlation between X and Y, it is a sound principle to suppose that it has a causal explanation. Maybe X causes Y, maybe Y causes X, … and maybe W causes both X and Y, or maybe X and Y both cause Z and there has been a selection on the basis of Z. In the case of Lucia de B., her association between inexplicable incidents and her presence on the ward was caused by her, since the definition of “unexpected and inexplicable incident” included her being there. She was already known to be a weird person, and it was already clear that there were more deaths than usual on her ward. The actual reason for that was a change of hospital policy, moving patients faster from intensive care to medium care so that they could die at home, rather than in the hospital. If she was not present, then the medical experts always could come up with an explanation for why that death, though perhaps a bit surprising at that moment, was expected to occur soon anyway. But if Lucia was there then they were inclined to believe in foul play because after all there were so many incidents in her shifts.

Julia and I are certain that the difference between Daniela’s death rates and those of other nurses is to a huge extent explainable by the anomalies in the data which we had discovered and by her long working hours.

Some residual difference could be due to the fact that a conscientious nurse actually notices when patients have died, while a lazy nurse keeps a low profile and leaves it to her colleagues to notice, at hand-over. We have been busy fitting sophisticated regression models to the data but this work will be reported in a specialist journal. It does not tell us more than what I have already said. Daniela is different from the other nurses. All the nurses are different. She is extreme in a number of ways: most hours worked, longest shifts worked. We have no idea how the hospital allocated nurses to sectors and patients to sectors. We probably won’t get to know the answer to that, ever. The medical world does not put out its dirty washing for everyone to see.

We wrote a report and gave evidence in person in Ravenna in early 2015. I did not have time to see the wonderful Byzantine mosaics though I was treated to some wonderful meals. I think my department paid for my air ticket. Julia and I worked “pro deo”. In our opinion, we totally shredded the statistical work of Tagliaro and Micciolo. The court however did not agree. “The statistical experts for the defence only offered a theoretical discourse while those of the prosecution had scientifically established hard facts”. In retrospect, we should have used stronger language in our report. Tagliaro and Micciolo stated that they had definitively proven that Daniela’s presence caused 90 or so extra deaths. They stated that this number could definitely not be explained as a chance fluctuation. They stated that, of course, the statistics did not prove that she had deliberately murdered those patients. We, on the other hand, had used careful scientific language. One begins to understand how it is that experts like Tagliaro and Micciolo are in such high demand by public prosecutors.

There was also toxicological evidence concerning one of the patients and involving K+ Cl–, but we were not involved in that. There was also the “selfie”, there was character evidence. There were allegations of thefts of patients’ personal jewellery. It all added up. Daniela was convicted of just one murder. The statistical evidence provided her motive: she just loved killing people, especially people she didn’t like. No doubt, a forensic psychologist also explained how her personality fitted so well to the actions she was alleged to have done.

Rapidly, the public prosecution started another case based largely on the same or similar evidence but now concerning another patient, with whom Daniela had had a shouting match, five years earlier. In fact, this activity was probably triggered by families of other patients starting civil cases against the hospital. It would also clearly be in the interest of the hospital authorities to get new criminal proceedings against Daniela started. However, Daniela’s lawyers appealed against her first conviction. It was successfully overturned. But then the court of cassation overturned the acquittal. Meantime, the second case led to a conviction, then acquittal on appeal, then cassation. All this time Daniela was in jail. Cassations of cassations meant that Daniela had to be tried again, by yet another appeal court, for the two alleged murders. Julia and I and her young colleague Francesco Dotto got to work again, improving our arguments and our graphics and our formulations of our findings.

At some point, triggered by some discussions with the defence experts on toxicology and pathology, Julia took a glance at Tagliaro’s quite separate report on the toxicological evidence. This led to a breakthrough, as I will now explain.

Tagliaro knew the post-mortem “vitreous humour” potassium concentration of the last patient, a woman who had died on Daniela’s last day. That death had somehow surprised the hospital doctors, or rather, as it later transpired, it didn’t surprise them at all: they had already for three months been looking at the death rates while Daniela was on duty and essentially building up a dossier against her, just waiting for a suitable “last straw”! Moreover, they already had their minds on K+ Cl-, since some had gone missing and then turned up in the wrong place. Finally, Daniela had complained to her colleagues about the really irritating behaviour of that last patient, 73-year-old Rosa Calderoni.

“Vitreous humour” is the transparent, colourless, gelatinous mass that fills your eyeballs. While you are alive, it has a relatively low concentration of potassium. After death, cell walls break down, and potassium concentration throughout the body equalises. Tagliaro had published papers in which he studied the hourly rate of increase in the concentration, using measurements on the bodies of persons who had died at a known time of causes unrelated to potassium chloride poisoning. He even had some fresh corpses on which he could make repeated measurements. His motivation was to use this concentration as a tool to determine the PMI (post-mortem interval) in cases when we have a body and a post-mortem examination but no time of death. In one paper (without Micciolo’s aid) he did a regression analysis, plotting a straight line through a cloud of points (y = concentration, x = time since death). He had about 60 observations, mostly men, mostly rather young. In a second paper, now with Micciolo, he fitted a parabola and moreover noted that there was an effect of age and of sex. The authors also observed the huge variation around that fitted straight line and concluded that the method was not reliable enough for use in determining the PMI. But this did not deter Tagliaro, when writing his toxicological report on Rosa Calderoni! He knew the potassium concentration at the time of post-mortem, he knew exactly when she died, he had a number for the natural increase per hour after death from his first, linear, regression model. With this, he calculated the concentration at death. Lo and behold: it was a concentration which would have been fatal. He had proved that she had died of potassium chloride poisoning.

Prediction of vitreous humour K+ concentration 56 hours after death without K+ poisoning

Julia and Francesco used the model of the second paper and found that, if you assume a normal concentration at the time of death and take account of the variability of the measurements and of the uncertainty in the value of the slope, then the concentration observed at the time of post-mortem was maybe above average, but not surprisingly large at all.
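To make the logic of that comparison concrete, here is a minimal sketch in Python. It is not the model Julia and Francesco used (they worked with the model of the second paper, with its parabola and its effects of age and sex). It assumes a plain straight-line rise, and every number in it is hypothetical, chosen only for illustration. Only the 56 hours is taken from the figure caption above.

```python
import math

def predicted_concentration(t_hours, c0_mean, c0_sd, slope, slope_se, resid_sd):
    """Mean and standard deviation of the predicted post-mortem concentration.

    Assumes concentration = c0 + slope * t + noise, with independent
    uncertainty in the concentration at death (c0), in the estimated
    slope, and in the individual measurement (residual scatter).
    """
    mean = c0_mean + slope * t_hours
    variance = c0_sd ** 2 + (t_hours * slope_se) ** 2 + resid_sd ** 2
    return mean, math.sqrt(variance)

def z_score(observed, mean, sd):
    """How many standard deviations the observation lies above the prediction."""
    return (observed - mean) / sd

# All values below are HYPOTHETICAL (units: mmol/L and hours), not case data.
mean, sd = predicted_concentration(
    t_hours=56,      # post-mortem interval from the figure caption
    c0_mean=6.0,     # assumed normal concentration at death
    c0_sd=1.0,       # assumed person-to-person variation at death
    slope=0.2,       # assumed hourly increase
    slope_se=0.02,   # assumed standard error of that slope
    resid_sd=2.0,    # assumed scatter around the fitted line
)
observed = 20.0      # hypothetical measured value at post-mortem
z = z_score(observed, mean, sd)
print(f"predicted {mean:.1f} +/- {sd:.1f}, observed {observed}, z = {z:.2f}")
```

With these made-up inputs the observation lies about one standard deviation above the prediction: above average, but unremarkable. Running the calculation backwards with a single slope value and no error bars, as in the toxicological report, hides exactly this uncertainty.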

Daniela Poggiali became a free woman. I wish her a big compensation and a long and happy life. She’s quite a character.

Aside from the “couleur locale” of an Italian case, this case had a great deal in common with the case of Lucia de Berk. It has many similarities with quite a few other contested serial killer nurse cases, in various countries. According to a Netflix series, in which a whole episode is devoted to Daniela, these horrific cases occur all the time. They are studied by criminologists and forensic psychologists, who have compiled a list of “red flags” intended to help warn hospital authorities. The scientific term here is “health care serial killer”, or HCSK. One of the HCSK red flags is that you have psychiatric problems. Another is that your colleagues think you are really weird. Especially when your colleagues call you an angel of death, that’s a major red flag. The list goes on. These lists are developed in scientific publications in important mainstream journals, and the results are presented in textbooks used in university criminology teaching programs. Of course, you can only scientifically study convicted HCSKs. Your sources of data are newspaper reports, judges’ summings up, the prosecution’s final summary of the case. It is clear that these red flags are the things that convince judges and jurors to deliver a guilty verdict. These are the features that will first make you a suspect, which police investigators will look for, and which will convince the court and the public of your guilt. Amusingly, one of the side effects of the case of Lucia de Berk was contributing a number of entries to this list, for instance, the Stephen King horror murder novels she had at home which were even alleged to have been stolen from the library. Her conviction for the theft of several items still stands. As does Daniela’s: this means that Daniela is not eligible for compensation. In neither case was there any real proof of thefts. Embarrassingly, Lucia’s case had to be removed from the collections of known cases after 2011, and the criminologists and forensic psychologists also now mention that statistical evidence of many deaths during the shifts of a nurse is not actually a very good red flag. They have learnt something, too.

Also interesting is the incidence of these cases: less than 1 in a million nurses killing multiple patients per year, according to these researchers. These are researchers who have the phenomenon of HCSKs as their life work, giving them opportunities to write lurid books on serial murder, appear in TV panels and TV documentaries explaining the terrible psychology of these modern-day witches, and to take the stand as prosecution witnesses. Now, that “base rate” is actually rather important, even if only known very roughly. It means that such crimes are very, very unusual. In the Netherlands, one might expect a handful of cases per century; maybe on average 100 deaths in a century. There are actually only about 100 murders altogether in the Netherlands per year. On the other hand, more than 1000 deaths every year are due to medical errors. That means that evidence against a nurse suspected of being a HCSK would have to be very strong indeed before it should convince a rational person that they have a new HCSK on their hands. Lawyers, judges, journalists and the public are unfortunately perhaps not rational persons. They are certainly not good with probability, and not good with Bayes’ rule. (It is not allowed to be used in a UK criminal court, because judges have ruled that jurors cannot possibly understand it.)

I am still working on one UK case, Ben Geen. I believe it is yet another example of a typical innocent HCSK scare in a failing hospital leading to a typical unsafe conviction based largely on the usual red flags and a bit of bad luck. At least, I see no reason whatsoever to suppose that Ben Geen was guilty of the crimes for which he is sitting out a life sentence. Meanwhile, a new case is starting up in the UK: Lucy (!) Letby. I sincerely hope not to be involved with that one.

Time for a new generation of nosy statisticians to do some hard work.


Norman Fenton, Richard D. Gill, David Lagnado, and Martin Neil. Statistical issues in serial killer nurse cases.

Alexander R.W. Forrest. Nurses who systematically harm their patients. Medical Law International, 1(4): 411–421, 1995.

Richard D. Gill, Piet Groeneboom, and Peter de Jong. Elementary statistics on trial—the case of Lucia de Berk. CHANCE, 31(4):9–15, 2018.

Covadonga Palacio, Rossella Gottardo, Vito Cirielli, Giacomo Musile, Yvane Agard, Federica Bortolotti, and Franco Tagliaro. Simultaneous analysis of potassium and ammonium ions in the vitreous humour by capillary electrophoresis and their integrated use to infer the post mortem interval (PMI). Medicine, Science and the Law, 61(1 suppl):96–104, 2021.

Nicola Pigaiani, Anna Bertaso, Elio Franco De Palo, Federica Bortolotti, and Franco Tagliaro. Vitreous humor endogenous compounds analysis for post-mortem forensic investigation. Forensic Science International, 310:110235, 2020.

Elizabeth Yardley and David Wilson. In search of the ‘angels of death’: Conceptualising the contemporary nurse healthcare serial killer. Journal of Investigative Psychology and Offender Profiling, 13(1):39–55, 2016. 1002/jip.1434

Francesco Dotto, Richard D. Gill and Julia Mortera (2022) Statistical Analyses in the case of an Italian nurse accused of murdering patients. Submitted to “Law, Probability, Risk” (Oxford University Press), accepted for publication subject to minor revision; preprint:

Was the AD Herring Test about more than the herring?

“Is the AD Herring Test about more than the herring?” – opinion of prof.dr. R.D. Gill

I was asked for my opinion as a statistician and scientist in a case between the AD and Dr. Ben Vollaard (economist, Tilburg University). My opinion was requested by Mr. O.G. Trojan of Bird & Bird, the firm which represents AD in this case. The case concerns two articles by Mr Vollaard with a statistical analysis of data on the AD herring test of July and November 2017. The articles have not been published in scientific journals (and have therefore not undergone peer review), but have been made available on the internet and publicized by press releases from Tilburg University, which has led to further attention in the media.

Dr. Vollaard’s work focuses on two suspicions regarding the AD herring test: first, that it would favour fishmongers in the Rotterdam area; and second, that it would favour fishmongers that source their herring from a particular wholesaler, Atlantic. This is related to the fact that a member of the AD herring test panel also has a business relationship with Atlantic: he teaches Atlantic herring cutting and other aspects of the preparation (and storage) of herring. These suspicions have surfaced in the media before. You may have noticed that fish shops from the Rotterdam area, and fish shops that are customers of Atlantic, often appear in the “top ten” of different years of the herring test. But that may well be justified by the quality of the herring they serve. It cannot be concluded from this that the panel is biased.

The questions I would like to answer here are the following: does Vollaard’s research provide any scientific support for the complaints about the herring test? Is Vollaard’s own summary of his findings justified?

Vollaard’s first investigation

Vollaard works by estimating and interpreting a regression model. He tries to predict the test score from measured characteristics of the herring and from partial judgments of the panel. His summary of the results is: the panel prefers “herring of 80 grams with a temperature below 7 degrees Celsius, a fat percentage above 14 percent, a price of around € 2.50, fresh from the knife, a good microbiological condition, slightly aged, very well cleaned”.

Note, “taste” is not on the list of measured characteristics. And by the way, as far as temperature is concerned, 7 degrees is the legal maximum temperature for the sale of herring.

However, it is not possible to explain the difference between the Rotterdam area and beyond by using these factors. Vollaard concludes that “sales outlets for herring in Rotterdam and surroundings receive a higher score in the AD herring test than can be explained by the quality of the herring served”. Is that a correct conclusion?

In my opinion, Vollaard’s conclusion is unjustified. There are four reasons why the conclusion is incorrect.

First, the AD herring test is primarily a taste test and the taste of a herring, as judged by the panel of three regular subjects, is undoubtedly not fully predictable using the characteristics that have been measured. The model also does not predict the final grade exactly. Apparently there is some correlation between factors such as price and weight with taste, or more generally with quality. A reasonably good prediction can be made with the criteria used by Vollaard together, but a “residual term” remains, which stands for differences in taste between herring from fishmongers that are otherwise the same as regards the characteristics that have been measured. Vollaard does not tell us how large that residual term is, and does not say much about it.

Second, the way in which the characteristics are related to the taste (linear additive), according to Vollaard, does not have to be valid at all. I am referring to the specific mathematical form of the prediction formula: final mark = a × weight + … + residual term. Vollaard has assumed the simplest possible relationship, with as few unknown parameters as possible (a, b, …). Here he follows tradition and opts for simplicity and convenience. His entire analysis is only valid with the proviso that this model specification is correct. I find no substantiation for this assumption in his articles.

Third, regional differences in the quality and taste of herring are quite possible, but these differences cannot be explained by differences in the measured characteristics of the herring. There can be large regional differences between consumer tastes. The taste of the permanent panel members (two herring masters and a journalist) does not have to be everyone’s taste. Proximity to important ports of supply could promote quality.

Fourth, the fish shops studied are not a random sample. A fish trader that is highly rated in one year is extra motivated to participate again in subsequent years, and vice versa. Over the years, the composition of the collection of participants has evolved in a way that may depend on the region: the participants from Rotterdam and the surrounding area have pre-selected themselves more on quality. They are also more familiar with the panel’s preferences.

Vollaard’s conclusion is therefore untenable. The correct conclusion is that the taste experience of the panel cannot be fully explained (in the way that Vollaard assumes) from the available list of measured quality characteristics. Moreover, the participating fishmongers from the Rotterdam region are perhaps a more select group (preselected for quality) than the other participants.

So it may well be that the herring outlets in Rotterdam and surroundings that participate in the AD herring test get a higher score in the AD herring test than the participating outlets from outside that region, because their herring tastes better (and in general, is of better quality).

Vollaard’s second investigation

The second article goes a lot further. Vollaard tries to compare fish shops that buy their herring from wholesaler Atlantic with the other fish shops. He thinks that the Atlantic customers score higher on average than the others. The difference is also predicted by the model, so Vollaard can try, starting from the model, to attribute the difference to the measured characteristics (regarding the question “region”, the difference could not be explained by the model). It turns out that maturation and cleaning account for half of the difference; the rest of the difference is neatly explained by the other variables.

However, according to the AD, Vollaard has made mistakes in the classification of fishmongers as Atlantic customers. An Atlantic customer whose test score was 0.5 was wrongly not included. The difference in mean score is 2.4 instead of 3.6. The second article therefore needs to be completely revised. All numbers in Table 1 are wrong. It is impossible to say whether the same analysis will lead to the same conclusions!

Still, I will discuss Vollaard’s further analysis to show that unscientific reasoning is also used here. We had come to the point where Vollaard observes that the difference between Atlantic customers and others, according to his model, is mainly due to the fact that they score better in the measured characteristics “maturation” and “cleaning”. Suddenly, these characteristics are called “subjective”, and Vollaard’s explanation of the difference is conscious or unconscious panel bias.

Apparently, the fact that these characteristics would be subjective is evidence to Vollaard that the panel is biased. Vollaard uses the subjective nature of the factors in question to make his explanation of the correlations found, namely panel bias, plausible. Or expressed in other words: according to Vollaard there is a possibility of cheating and so there must have been cheating.

This is pure speculation. Vollaard tries to substantiate his speculation by looking at the distribution over the classes in “maturation” and “cleaning”. For example, for maturation: the distribution between “average” / “strong” / “spoiled” is 100/0/0 percent for Atlantic, 60/35/5 for non-Atlantic; for cleaning: the split between good / very good is 0/100 for Atlantic, 50/50 for non-Atlantic. These differences are so great, according to Vollaard, that there must have been cheating. (By the way, Atlantic has only 15 fish shops, non-Atlantic nine times as many.)

Vollaard seems to think that “maturation” is so subjective that the panel can shift at will between the “average”, “strong” and “spoiled” classes to favour Atlantic customers. However, it is not obvious that the classifications “maturation” and “cleaning” are as subjective as Vollaard makes them appear. In any case, this is a serious charge. Vollaard allows himself the proposition that the panel members have misused the subjective factors (consciously or unconsciously) to benefit Atlantic customers. They would have consistently awarded Atlantic customers higher valuations than can be justified on the basis of their research.

But if the Atlantic customers are rightly evaluated as very high quality on the basis of fat content, weight, microbiology, fresh-from-the-knife – the objective factors which, according to Vollaard, are responsible for the other half of the difference in the average grade – why should they not rightly score high on maturation and cleaning?

Vollaard notes that the ratings of “maturation” and “microbiological status” are inconsistent while, again, according to him, the first is a subjective judgment of the panel, the second an objective measurement of a laboratory. The AD noted that maturation is related to oil and fat becoming rancid, which is a process accelerated by oxygen and heat; while the presence of certain harmful microorganisms is caused by poor hygiene. We therefore do not expect any similarities between these different types of spoilage.

Vollaard’s arguments seem to be ad hoc arguments intended to confirm a previously taken position; statistical or economic science does not play a role here. In any case, the second article should be thoroughly revised in connection with the misclassification of Atlantic customers. The resulting adaptation of Table 1 could shed a completely different light on the difference between Atlantic customers and others.

My conclusion is that the scientific content of the two articles is low, and the second article is seriously contaminated by the use of incorrect data. The second article concludes with the words “These points are not direct evidence of favouring fish traders with the concerned supplier in the AD herring test, but the test team has all appearances against it based on this study.” This conclusion is based on incorrect data, on a possibly wrong model, and on speculation on topics outside of statistics or economics. The author himself has created an appearance and then tried to substantiate it, but his reasoning is weak or even erroneous – there is no substantiation, only the appearance remains.

At the beginning I asked the following two questions: does Vollaard’s research give any scientific support to the complaints about the herring test? Is Vollaard’s own summary of his findings justified? My conclusions are that the research conducted does not contribute much to the discussions surrounding the herring test and that the conclusions drawn are erroneous and misleading.


Detail points

Vollaard uses *, **, *** for significance at 10% level, 5% level, 1% level. This is a devaluation of the traditional 5%, 1%, 0.1%. Too high a risk of false positives gives an exaggerated picture of the reliability of the results.

I find it very inappropriate to include “in the top 10” as an explanatory variable in the second article. Thus, a high score is used to explain a high score. I suspect that the second visit to top 10 stores only leads to a minor adjustment of the test figure (e.g. 0.1 point to break a tie), so there is no need for this variable in the forecasting model.

Why is “price” omitted as an explanatory variable in the second article? In the first, “price” had a significant effect. (I think including “top ten” is responsible for the loss of significance of some variables, such as “region” and possibly “price”).

I have the impression that some numbers in the column “Difference between outlets with and without Atlantic as supplier” of Table 1, second article, are incorrect. Is it “Atlantic customers minus non-Atlantic customers” or, conversely, “non-Atlantic customers minus Atlantic customers”?

It is common in a regression analysis to perform extensive control of the model assumptions by means of residual analysis (“regression diagnostics”). No trace of this in the articles.

The regression analysis uses data from a cross-section of companies over two years, so many fish shops occur twice. Is there correlation between the residual terms over the two years?

What is the standard deviation of the residual term? This is a much more informative feature of the model’s explanatory / predictive value than the R-squared.
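A small illustration of that last point, as a sketch in Python with made-up numbers (nothing here comes from the herring data). Two data sets share exactly the same noise around a straight line; only the spread of the explanatory variable differs. The helper names `fit_line` and `residual_sd_and_r2` are mine.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residual_sd_and_r2(x, y):
    """Residual standard deviation (n - 2 degrees of freedom) and R-squared."""
    intercept, slope = fit_line(x, y)
    n = len(x)
    my = sum(y) / n
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return math.sqrt(sse / (n - 2)), 1 - sse / sst

# HYPOTHETICAL data: same noise, different spread of the predictor.
noise = [2, -2, 2, -2, 2, -2]
x_narrow = [0, 1, 2, 3, 4, 5]
x_wide = [10 * xi for xi in x_narrow]
y_narrow = [2 + xi + e for xi, e in zip(x_narrow, noise)]
y_wide = [2 + xi + e for xi, e in zip(x_wide, noise)]

sd_narrow, r2_narrow = residual_sd_and_r2(x_narrow, y_narrow)
sd_wide, r2_wide = residual_sd_and_r2(x_wide, y_wide)
print(f"narrow: residual sd = {sd_narrow:.2f}, R-squared = {r2_narrow:.2f}")
print(f"wide:   residual sd = {sd_wide:.2f}, R-squared = {r2_wide:.2f}")
```

The prediction error for an individual shop is the same in both cases, yet the R-squared jumps from low to nearly 1 simply because the predictor varies more. The residual standard deviation tells you directly how many grade points the model can be off.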

Richard Gill

April 5, 2018

Condemned by statisticians?

A Bayesian analysis of the case of Lucia de B.

de Vos, A. F. (2004).

Door statistici veroordeeld? Nederlands Juristenblad, 13, 686-688.

Here, the result of Google-translate by RD Gill; with some “hindsight comments” by him added in square brackets and marked “RDG”.

Would having posterior thoughts
Not be offending the gods?
Only the dinosaur
Had them before
Recall its fate! Revise your odds!
(made for a limerick competition at a Bayesian congress).

The following article was the basis for two full-page articles on Saturday, March 13, 2004 in the science supplement of the NRC (with unfortunately disturbing typos in the ultimate calculation) and in “the Forum” of Trouw (with the expected announcement on the front page that I claimed that the chance that Lucia de B. was wrongly convicted was 80%, which is not the case).

Condemned by statisticians?
Aart F. de Vos

Lucia de Berk [Aart calls her “Lucy” in his article. That’s a bit condescending – RDG] has been sentenced to life imprisonment. Statistical arguments played a role in that, although the influence of this in the media was overestimated. Many people died while she was on duty. Pure chance? The consulted statistician, Henk Elffers, repeated his earlier statement during the current appeal that the probability was 1 in 342 million. I quote from the article “Statisticians do not believe in coincidence” from the Haags Courant of January 30th: “The probability that nine fatal incidents took place in the JKZ during the shifts of the accused by pure chance is nil. (…) It wasn’t chance. I don’t know what it was. As a statistician, I can’t say anything about it. Deciding the cause is up to you”. The rest of the article showed that the judge had great difficulty with this answer, and did not manage to resolve those difficulties.

Many witnesses were then heard who talked about circumstances, plausibility, oddities, improbabilities and undeniably strong associations. The court has to combine all of this and arrive at a wise final judgment. A heavy task, certainly given the legal conceptual system that includes very many elements that have to do with probabilities but has to make do without quantification and without probability theory when combining them.

The crucial question is of course: how likely is it that Lucia de Berk committed murders? Most laypeople will think that Elffers answered that question and that it is practically certain.

This is a misunderstanding. Elffers did not answer that question. Elffers is a classical statistician, and classical statisticians do not make statements about what is actually going on, but only about how unlikely things are if nothing special is going on at all. However, there is another branch of statistics: the Bayesian. I belong to that other camp. And I’ve also been doing calculations. With the following bewildering result:

If the information that Elffers used to reach his 1 in 342 million were the only information on which Lucia de Berk was convicted, I think that, based on a fairly superficial analysis, there would be about an 80% chance of the conviction being wrong.

This article is about this great contrast. It is not an indictment of Elffers, who was extremely modest in the court when interpreting his outcome, nor a plea to acquit Lucia de Berk, because the court uses mainly different arguments, albeit without unequivocal statements of probability, while there is nothing which is absolutely certain. It is a plea to seriously study Bayesian statistics in the Netherlands, and this applies to both mathematicians and lawyers. [As we later discovered, many medical experts’ conclusions that certain deaths were unnatural were caused by their knowledge that Lucia had been present at an impossibly huge number of deaths – RDG]

There is some similarity to the case of Sally Clark, who was sentenced to life imprisonment in 1999 in England because two of her sons died shortly after birth. A wonderful analysis can be found in the September 2002 issue of “living mathematics”, an internet magazine.

An expert (not a statistician, but a doctor) explained that the chance that such a thing happened “just by chance” in the given circumstances was 1 in 73 million. I quote: “probably the most infamous statistical statement ever made in a British courtroom (…) wrong, irrelevant, biased and totally misleading.” The expert’s statement is completely torn to shreds in said article. Which includes mention of a Bayesian analysis. And a calculation that the probability that she was wrongly convicted was greater than 2/3. In the case of Sally Clark, the expert’s statement was completely wrong on all counts, causing half the nation to come down on him, and Sally Clark, though only after four years, was released. However, the case of Lucia de Berk is infinitely more complicated. Elffers’ statement is, I will argue, not wrong, but it is misleading, and the Netherlands has no jurisprudence, but judgments, and even though they are not directly based on extensive knowledge of probability theory, they are much more reasoned. That does not alter the fact that there is a common element in the Lucy de Berk and Sally Clark cases. [Actually, Elffers’ statement was wrong in its own terms. Had he used the standard and correct way to combine p-values from three separate samples, he would have ended up with a p-value of about 1/1000. Had he verified the data given him by the hospital, it would have been larger still. Had he taken account of heterogeneity between nurses and uncertainty in various estimates, both of which classical statisticians also know how to do, larger still – RDG]

Bayesian statistics

My calculations are therefore based on alternative statistics, the Bayesian, named after Thomas Bayes, the first to write about “inverse probabilities”. That was in 1763. His discovery did not become really important [in statistics] until after 1960, mainly through the work of Leonard Savage, who proved that when you make decisions under uncertainty you cannot ignore the question of what chances the possible states of truth have (in our case the states “guilty” and “not guilty”). Thomas Bayes taught us how you can learn about that kind of probability from data. Scholars agree on the form of those calculations, which is pure probability theory. However, there is one problem: you have to think about what probabilities you would have given to the possible states before you had seen your data (the prior). And often these are subjective probabilities. And if you have little data, the impact of those subjective probabilities on your final judgment is large. A reason for many classical statisticians to oppose this approach. Certainly in the Netherlands, where statistics is mainly practised by mathematicians, people who are trained to solve problems without wondering what they have to do with reality. After a fanatical struggle over the foundations of statistics for decades (see my piece “the religious war of statisticians”), the parties have come closer together. With one exception: the classical hypothesis test (or significance test). Bayesians have fundamental objections to classical hypothesis tests. And Elffers’ statement takes the form of a classical hypothesis test. This is where the foundations debate focuses.

The Lucy Clog case

Following Elffers, who explained his method of calculation in the Nederlands Juristenblad on the basis of a fictional case “Klompsma” which I have also worked through (arriving at totally different conclusions), I want to talk about the fictional case Lucy Clog [“Klomp” is the Dutch word for “clog”; the suffix “-sma” indicates a person from the province of Groningen; this is all rather insulting – RDG]. Lucy Clog is a nurse who has experienced 11 deaths in a period in which on average only one case occurs, but where no further concrete evidence against her can be found. In this case too, Elffers would report an extremely small chance of coincidence in court, about 1 in 100 million [I think that de Vos is thinking of the Poisson(1) chance of at least 11 events. If so, it is actually a factor 10 smaller. Perhaps he should change “11 deaths” into “10 deaths” – RDG]. This is the case where I claim that a guilty conviction, given the information so far together with my assessment of the context, has a chance of about 80% of being wrong.

This requires some calculations. Some of them are complicated, but the most important aspect is not too difficult, although it appears that many people struggle with it. A simple example may make this key point clear.

You are at a party and a stranger starts telling you a whole story about the chance that Lucia de Berk is guilty, and embarks joyfully on complex arithmetical calculations. What do you think: is this a lawyer or a mathematician? If you say a mathematician because lawyers are usually unable to do mathematics, then you fall into a classical trap. You think: a mathematician is good at calculations, while the chance that a lawyer is good at calculations is 10%, so it must be a mathematician. What you forget is that there are 100 times more lawyers than mathematicians. Even if only 10% of lawyers could do this calculating stuff, there would still be 10 times as many lawyers as mathematicians who could do it. So, under these assumptions, the probability is 10/11 that it is a lawyer. To which I must add that (I think) 75% of mathematicians are male but only 40% of lawyers are male, and I did not take this into account. If the word “she” had been in the problem formulation, that would have made a difference.

The same mistake, forgetting the context (more lawyers than mathematicians), can be made in the case of Lucia de Berk. The chance that you are dealing with a murderous nurse is a priori (before you know what is going on) very much smaller than that you are dealing with an innocent nurse. You have to weigh that against the fact that the chance of 11 deaths is many times greater in the case of “murderous” than in the case of “innocent”.

The Bayesian way of performing the calculations in such cases also appears to be intuitively not easy to understand. But if we look back on the example of the party, maybe it is not so difficult at all.

The Bayesian calculation is best not done in terms of chances, but in terms of “odds”, an untranslatable word that does not exist in the Netherlands. Odds of 3 to 7 mean a chance of 3/10 that it is true and 7/10 that it is not. Englishmen understand what this means perfectly well, thanks to horse racing: odds of 3 to 7 means you win 7 if you are right and lose 3 if you are wrong. Chances and odds are two ways to describe the same thing. Another example: odds of 2 to 10 correspond to probabilities of 2/12 and 10/12.

You need two elements for a simple Bayesian calculation. The prior odds and the likelihood ratio. In the example, the prior odds are mathematician vs. lawyer 1 to 100. The likelihood ratio is the probability that a mathematician does calculations (100%) divided by the probability that a lawyer does (10%). So 10 to 1. Bayes’ theorem now says that you must multiply the prior odds (1 : 100) and the likelihood ratio (10 : 1) to get the posterior odds, so they are (1 x 10 : 100 x 1) = (10 : 100) = (1 : 10), corresponding to a probability of 1 / 11 that it is a mathematician and 10/11 that it is a lawyer. Precisely what we found before. The posterior odds are what you can say after the data are known, the prior odds are what you could say before. And the likelihood ratio is the way you learn from data.
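The party calculation can be checked in a few lines of R. This is only a sketch: the helper name `update_odds` is hypothetical, and the numbers are the ones assumed in the text (prior odds 1 : 100, likelihood ratio 10 : 1).

```r
# Hypothetical helper (not from the article): Bayes' rule in odds form.
# Odds are stored as a pair, one entry per hypothesis.
update_odds <- function(prior, likelihood_ratio) {
  prior * likelihood_ratio   # elementwise: (1 x 10 : 100 x 1), up to a common factor
}

prior <- c(mathematician = 1, lawyer = 100)     # 100 times more lawyers
lr    <- c(mathematician = 1.0, lawyer = 0.1)   # P(does calculations | profession)

posterior      <- update_odds(prior, lr)        # 1 : 10
posterior_prob <- posterior / sum(posterior)    # odds -> probabilities

print(posterior_prob)   # mathematician 1/11, lawyer 10/11
```

Working with the two probabilities (100% and 10%) rather than the ratio 10 : 1 gives the same odds, because odds are only defined up to a common factor.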

Back to the Lucy Clog case. If the chance of 11 deaths is 1 in 100 million when Lucy Clog is innocent, and 1/2 when she is guilty – more about that “1/2” much later – then the likelihood ratio for innocent against guilty is 1 : 50 million. But to calculate the posterior probability of being guilty, you need the prior odds. They follow from the chance that a random nurse will commit murders. I estimate that at 1 to 400,000. There are forty thousand nurses in hospitals in the Netherlands, so that would mean nursing killings once every 10 years. I hope that is an overestimate.

Bayes’ theorem now says that the posterior odds of “innocent” in the event of 11 deaths would be 400,000 to 50 million. That’s 8 : 1000, so a small chance of 8/1008, maybe enough to convict someone. Yet large enough to want to know more. And there is much more worth knowing.

For instance, it is remarkable that nobody saw Lucy doing anything wrong. It is even stranger when further investigation yields no evidence of murder. If you think that there would still be an 80% chance of finding clues in the event of many murders, against of course 0% if it is a coincidence, then the likelihood ratio of the fact “no evidence was found” is 100 : 20 in favour of innocence. Application of the rule shows that we now have odds of 40 : 1000, so a small 4% chance of innocence. Conviction now becomes really questionable. And if the suspect continues to deny, which is more plausible when she is innocent than when she is guilty, say twice as plausible, the odds turn into 80 : 1000, almost 8% chance of innocence.

As an explanation, a way of looking at this that requires less calculation work (but says exactly the same thing) is as follows: It follows from the assumptions that in 20,000 years it occurs 1008 times that 11 deaths occur in a nurse’s shifts: 1,000 of the nurses are guilty and 8 are innocent. Evidence for murder is found for 800 of the guilty nurses, moreover, 100 of the remaining 200 confess. That leaves 100 guilty and 8 innocent among the nurses who did not confess and for whom no evidence for murder was found.
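The chain of updates, and the head count that goes with it, can be reproduced as follows. This is a sketch using only the numbers assumed in the text; the variable names are mine.

```r
# Odds are c(innocent, guilty); all numbers are the assumptions made in the text.
prior <- c(innocent = 400000, guilty = 1)   # one murderous nurse in 400,000

# Each row holds P(fact | innocent) and P(fact | guilty).
evidence <- rbind(
  "11 deaths"         = c(1e-8, 0.5),   # 1 in 100 million versus 1/2
  "no evidence found" = c(1.0, 0.2),    # 100 : 20
  "keeps denying"     = c(1.0, 0.5)     # twice as plausible if innocent
)

odds <- prior
for (fact in rownames(evidence)) {
  odds <- odds * evidence[fact, ]       # the posterior becomes the next prior
  cat(sprintf("%-18s innocent per 1000 guilty: %5.1f   P(innocent) = %.3f\n",
              fact, 1000 * odds[1] / odds[2], odds[1] / sum(odds)))
}

# The same result as a head count over 20,000 years.
innocent_11 <- 40000 * 20000 * 1e-8     # innocent nurses with 11 deaths: 8
guilty_11   <- (20000 / 10) * 0.5       # murderous nurses with 11 deaths: 1000
guilty_left <- guilty_11 * 0.2 * 0.5    # no evidence found and no confession: 100
print(c(innocent = innocent_11, guilty = guilty_left))
```

The loop prints 8, 40 and 80 innocent per 1000 guilty, and the head count ends at 8 innocent against 100 guilty, which is the same 80 : 1000.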

So Lucy Clog must be acquitted. And all the while, I haven’t even talked about doubts about the exact probability of 1 in 100 million that “by chance” 11 people die in so many nurses’ shifts, when on average it would only be 1. This probability would be many times higher in every Bayesian analysis. I estimate, based on experience, that 1 in 2 million would come out. A Bayesian analysis can include uncertainties. Uncertainties about the similarity of circumstances and qualities of nurses, for example. And uncertainties increase the chance of extreme events enormously, the literature contains many interesting examples. As I said, I think that if I had access to the data that Elffers uses, I would not get a chance of 1 in 100 million, but a chance of 1 in 2 million. At least I assume that for the time being; it would not surprise me if it were much higher still!

Preliminary calculations show that it might even be as high as 1 in 100,000. But 1 in 2 million already saves a factor of 50 compared to 1 in 100 million, and my odds would not be 80 to 1000 but 4000 to 1000, so 4 to 1. A chance of 80% to wrongly convict. This is the 80% chance of innocence that I mentioned in the beginning. Unfortunately, it is not possible to explain the factor 50 (or a factor 1000 if the 1 in 100,000 turns out to be correct) from the last step within the framework of this article without resorting to mathematics. [Aart de Vos is probably thinking of Poisson distributions, but adding a hyperprior over the Poisson mean of 1, in order to take account of uncertainty in the true rate of deaths, as well as heterogeneity between nurses, causing some to have shifts with higher death rates than others – RDG]
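The arithmetic of this last step, and one possible reading of the note above, can be sketched in R. The first part uses only numbers from the text. The second part is an assumption on my side: it puts a gamma hyperprior with mean 1 on the Poisson rate, which turns the Poisson distribution into a negative binomial one. The shape values are purely illustrative and are not taken from de Vos or from any data.

```r
# Part 1: numbers from the text.
odds_before <- c(innocent = 80, guilty = 1000)
factor_gain <- (1 / 2e6) / (1 / 1e8)             # 1 in 2 million versus 1 in 100 million
odds_after  <- odds_before * c(factor_gain, 1)   # 4000 : 1000
p_innocent  <- odds_after[["innocent"]] / sum(odds_after)
print(factor_gain)   # 50
print(p_innocent)    # 0.8

# Part 2 (assumption): gamma hyperprior with mean 1 on the Poisson rate.
# A Gamma(shape, rate = shape) mixture of Poisson distributions is
# negative binomial with size = shape and mean 1.
p_poisson <- dpois(11, lambda = 1)               # rate known to be exactly 1
shapes    <- c(50, 10, 5)                        # illustrative values only
p_mixed   <- dnbinom(11, size = shapes, mu = 1)  # rate uncertain
print(rbind(shape = shapes, ratio = p_mixed / p_poisson))
```

The smaller the shape, the more uncertain the rate and the larger the probability of 11 deaths relative to the plain Poisson value. Which shape, if any, reproduces the factor 50 depends on modelling choices that the article does not specify.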

What I hope has become clear is that you can always add information. “Not being able to find concrete evidence of murder” and “has not confessed” are new pieces of evidence that change the odds. And perhaps there are countless facts to add. In the case of Lucia de Berk, those kinds of facts are there. In the hypothetical case of Lucy Clog, not.

The fact that you can always add information in a Bayesian analysis is the most beautiful aspect of it. From prior odds, you come through data (11 deaths) to posterior odds, and these are again prior odds for the next steps: no concrete evidence for murder, and no confession by our suspect. Virtually all further facts that emerge in a court case can be dealt with in this way in the analysis. Any fact that has a different probability under the hypothesis of guilt than under the hypothesis of innocence contributes. Perhaps the reader has noticed that we only talked about the chances of what actually happened under various hypotheses, never about what could have happened but didn’t. A classic statistical test always talks about the probability of 11 or more deaths. That “or more” is irrelevant and misleading according to Bayesians. Incidentally, it is not necessarily easier to just talk about what happened. What is the probability of exactly 11 deaths if Lucy Clog is guilty? The number of murders, something with a lot of uncertainty about it, determines how many deaths there are, but even though you are fired after 11 deaths, the classical statistician talks about the chance of you committing even more if you are kept on. And that last fact matters for the odds. That’s why I put in a probability of 50%, not 100%, for a murderous nurse killing exactly 11 patients. But that only makes a factor 2 difference.

It should be clear that it is not easy to come to firm statements if there is no convincing evidence. The most famous example, for which many Bayesians have performed calculations, is a murder in California in 1956, committed by a black man with a white woman in a yellow Cadillac. A couple who met this description was taken to court, and many statistical analyses followed. I have done a lot of calculations on this example myself, and have experienced how difficult, but also surprising and satisfying, it is to constantly add new elements.

A whole book is devoted to a similar famous case: “A Probabilistic Analysis of the Sacco and Vanzetti Evidence,” published in 1996 by Jay Kadane, professor at Carnegie Mellon and one of the most prominent Bayesians. If you want to know more, just consult his c.v. on his website. In the “Statistics and the Law” field alone, he has more than thirty publications to his name, along with hundreds of other articles. This is now a well-developed field in America.


I have thought for a long time about what the conclusion of this story is, and I have had to revise my opinion several times. And the perhaps surprising conclusion is: the actions of all parties are not that bad, only their rationalization is, to put it mildly, a bit strange. Elffers makes strange calculations but formulates the conclusions in court in such a way that it becomes intuitively clear that he is not giving the answer that the court is looking for. The judge makes judgments that sound as though they are in terms of probabilities but I cannot figure out what the judge’s probabilities are. But when I see what is going on I do get the feeling that it is much more like what is optimal than I would have thought possible, given the absurd rationalisations. The explanation is simple: judges’ actions are based on a process learnt by evolution, judges’ justifications are stuck on afterwards, and learnt through training. In my opinion, the Bayesian method is the only way to balance decisions under uncertainty about actions and rationalization. And that can be very fruitful. But the profit is initially much smaller than people think. What the court does in the case of Lucia de B is surprisingly rational. The 11 deaths are not convincing in themselves, but enough to change the prior odds from 1 in 40,000 to odds of 16 to 5, in short, an order of magnitude in which it is necessary to gather additional information before judging. Exactly what the court does. [de Vos has an optimistic view. He does not realise that the court is being fed false facts by the hospital managers – they tell the truth but not the whole truth; he does not realise that Elffers’ calculation was wrong because de Vos, as a Bayesian, doesn’t know what good classical statisticians do; neither he nor Elffers checks the data and finds out how exactly it was collected; he does not know that the medical experts’ diagnoses are influenced by Elffers’ statistics. Unfortunately, the defence hired a pure probabilist, and a kind of philosopher of probability, neither of whom knew anything about any kind of statistics, whether classical or Bayesian – RDG]

When I made my calculations, I thought at times: I have to go to court. I finally sent the article but I heard nothing more about it. It turned out that the defence had called for a witness who seriously criticized Elffers’ calculations. However, without presenting the solution. [The judge found the defence witness’s criticism incomprehensible, and useless to boot. It contained no constructive elements. But without doing statistics, anybody could see that the coincidence couldn’t be pure chance. It wasn’t: one could say that the data was faked. On the other hand, the judge did understand Elffers perfectly well – RDG].

Maybe I will once again have the opportunity to fully calculate probabilities in the Lucia de Berk case. That could provide new insights. But it is quite a job. In this case, there is much more information than is used here, such as poisonous traces in patients. Here too, it is likely that a Bayesian analysis that takes into account all the uncertainties shows that statements by experts who say something like “it is impossible that there is another explanation than the administration of poison by Lucia de Berk” should be taken with a grain of salt. Experts are usually people who overestimate their certainty. On the other hand, incriminating information can also build up. Ten independent facts that are twice as likely under the hypothesis of guilt change the odds by a factor of 1000. And if it turns out that the toxic traces found in the bodies of five deceased patients are each nine times more likely if Lucia is a murderer than if she isn’t, it saves a factor of nine to the fifth, a small 60,000. Etc, etc
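The two factors quoted above are easy to verify. The snippet below is a small check, not part of the original article; it also shows the equivalent bookkeeping on the log scale, where the contributions of independent facts simply add up.

```r
# Likelihood ratios (guilty : innocent) of independent facts multiply.
lr_ten_facts   <- prod(rep(2, 10))   # ten facts, each twice as likely under guilt
lr_five_traces <- 9^5                # five toxic traces, each nine times as likely

print(lr_ten_facts)     # 1024, "a factor of 1000"
print(lr_five_traces)   # 59049, "a small 60,000"

# Same calculation on the log scale: log likelihood ratios are summed.
log_lr <- c(rep(log(2), 10), rep(log(9), 5))
print(exp(sum(log_lr)))              # combined factor, 1024 * 59049
```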

But I think the court is more or less like that. It uses an incomprehensible language, that is, incomprehensible to probabilists, but a language sanctioned by evolution. We have few cases of convictions that were found to be wrong in the Netherlands. [Well! That was a Dutch layperson, writing in 2004. According to Ton Derksen, in the Netherlands about 10% of very long term prisoners (very serious cases) are innocent. It is probably something similar in other jurisdictions – RDG].

If you did the entire process in terms of probability calculations, the resulting debates between prosecutors and lawyers would become endless. And given their poor knowledge of probability, it is also undesirable for the time being. They have their secret language that usually led to reasonable conclusions. Even the chance that Lucia de Berk is guilty cannot be expressed in their language. There is also no law in the Netherlands that defines “legal and convincing evidence” in terms of the chance that a decision is correct. Is that 95%? Or 99%? Judges will maintain that it is 99.99%. But judges are experts.

So I don’t think it’s wise to try to cast the process in terms of probability right now. But perhaps this discussion will produce something in the longer term. Judges who are well informed about the statistical significance of the starting situation and then write down a number for each piece of evidence of prosecutor and defender. The likelihood ratio of each fact must be motivated. At the end, multiply all these numbers together, and have the calculations checked again by a Bayesian statistician. However, I consider this a long-term perspective. I fear (I am not really young anymore) it won’t come in my lifetime.

The magic of the d’Alembert

Simulations of the d’Alembert on a fair roulette wheel with 36 paying outcomes and one “0”. Even odds bets (e.g., red versus black). Each line is one game. Each picture is 200 games. Parameters: initial capital of 25 units, maximum number of rounds is 21, emergency stop if capital falls below 15.


Harry Crane and Glenn Shafer (2020), Risk is random: The magic of the d’Alembert.

Stewart N. Ethier (2010), The Doctrine of Chances: Probabilistic Aspects of Gambling. Springer-Verlag: Berlin, Heidelberg.

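# Parameters of the simulation (identifiers are Dutch, as in the original)
startKapitaal <- 25      # initial capital
eersteInzet <- 1         # first stake
noodstopKapitaal <- 15   # emergency stop: quit once capital falls below this
aantalBeurten <- 21      # maximum number of rounds per game
K <- 100                 # number of sets (one picture per set)
J <- 200                 # number of games per set
winsten <- rep(0, K)     # total net gain of each set

for (k in 1:K) {

  # Empty plotting frame for this set; the single plotted point lies outside the axes
  plot(x = -2, y = -1, ylim = c(-5, 45), xlim = c(0, 22), xlab = "Beurt", ylab = "Kapitaal")
  abline(h = startKapitaal)
  abline(h = 0, col = "red")

  aantalKeerWinst <- 0   # games in this set that ended with a profit
  totaleWinst <- 0       # net gain summed over the games in this set

  for (j in 1:J) {

    huidigeKapitaal <- startKapitaal
    huidigeInzet <- eersteInzet
    # Even-odds bet: lose (-1) with probability 19/37, win (+1) with probability 18/37
    resultaten <- sample(x = c(-1, +1), prob = c(19, 18), size = aantalBeurten, replace = TRUE)
    verloop <- rep(0, aantalBeurten)   # capital after each round
    stappen <- rep(0, aantalBeurten)   # gain or loss in each round

    for (i in 1:aantalBeurten) {
      huidigeResultaat <- resultaten[i]
      if (huidigeInzet > 0) {
        stap <- huidigeResultaat * huidigeInzet
        stappen[i] <- stap
        huidigeKapitaal <- huidigeKapitaal + stap
        # Next stake, as coded: 1 after a win; current stake plus the amount just lost after a loss
        huidigeInzet <- max(1, huidigeInzet - stap)
        # Emergency stop: a stake of 0 means the player sits out the remaining rounds
        if (huidigeKapitaal < noodstopKapitaal) {huidigeInzet <- 0}
        verloop[i] <- huidigeKapitaal
      } else {
        stappen[i] <- 0
        verloop[i] <- huidigeKapitaal
      }
    }

    aantalKeerWinst <- aantalKeerWinst + (verloop[aantalBeurten] > startKapitaal)
    totaleWinst <- totaleWinst + (huidigeKapitaal - startKapitaal)
    # Small random vertical offset so that overlapping paths remain visible
    lines(0:aantalBeurten, c(startKapitaal, verloop) + runif(1, -0.15, +0.15))
  }

  print(c(k, aantalKeerWinst, totaleWinst))
  winsten[k] <- totaleWinst
}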

The program repeatedly runs and plots 200 games of at most 21 rounds each. Below, for each of 100 sets of 200 games, are the set number (1 to 100), the number of games in which the player made a profit, and the final net gain of the set.

[1]    1  100 -483
[1]    2  108 -336
[1]    3  103 -517
[1]    4  110 -275
[1]   5 123 -40
[1]   6 125 148
[1]    7  115 -209
[1]    8  104 -427
[1]    9  108 -356
[1]   10  110 -225
[1]   11  101 -440
[1]  12 120  80
[1]   13  108 -334
[1]   14  110 -279
[1]   15   99 -538
[1]   16  114 -101
[1]  17 113 -92
[1]  18 117 -87
[1]   19  104 -363
[1]   20  103 -320
[1]  21 114 -52
[1]   22  107 -422
[1]   23  108 -226
[1]   24  115 -173
[1]   25  110 -209
[1]   26  109 -261
[1]   27  114 -186
[1]  28 120 -62
[1]  29 123  35
[1]   30  101 -442
[1]   31  111 -215
[1]   32  104 -378
[1]  33 120  49
[1]  34 117 -49
[1]   35  119 -102
[1]   36  104 -488
[1]   37  107 -402
[1]  38 122  38
[1]   39  100 -549
[1]  40 116 -31
[1]  41 127 220
[1]   42  105 -427
[1]   43  114 -153
[1]   44  109 -256
[1]   45  119 -166
[1]  46 121  47
[1]   47  105 -417
[1]   48  113 -134
[1]  49 121 111
[1]   50  112 -307
[1]  51 114 -92
[1]  52 123 123
[1]  53 118  24
[1]   54  113 -188
[1]  55 124 127
[1]   56  110 -229
[1]   57  113 -255
[1]   58  101 -554
[1]   59  114 -345
[1]  60 124 236
[1]   61   97 -599
[1]   62  115 -220
[1]  63 120  55
[1]   64  102 -512
[1]  65 121 109
[1]   66  112 -219
[1]   67  112 -181
[1]  68 115 -45
[1]   69  107 -474
[1]   70  109 -272
[1]   71  116 -134
[1]   72  107 -440
[1]   73  108 -470
[1]  74 119 -85
[1]  75 115   1
[1]  76 115 -88
[1]   77  113 -219
[1]  78 118 -55
[1]   79  115 -150
[1]  80 124  70
[1]   81  115 -203
[1]   82  115 -153
[1]   83  109 -219
[1]   84   97 -675
[1]   85  108 -396
[1]   86  112 -220
[1]   87  115 -187
[1]   88  108 -290
[1]   89  114 -182
[1]   90  105 -439
[1]   91  113 -183
[1]   92  115 -216
[1]  93 124 110
[1]   94  115 -173
[1]  95 125 177
[1]   96  110 -203
[1]  97 128 160
[1]  98 114 -83
[1]  99 118 -90
[1] 100 123 106

Steve Gull’s challenge: An impossible Monte Carlo simulation project in distributed computing

At the 8th MaxEnt conference in 1998, held in Cambridge UK, Ed Jaynes was the star of the show. His opening lecture has the following abstract: “We show how the character of a scientific theory depends on one’s attitude toward probability. Many circumstances seem mysterious or paradoxical to one who thinks that probabilities are real physical properties existing in Nature. But when we adopt the “Bayesian Inference” viewpoint of Harold Jeffreys, paradoxes often become simple platitudes and we have a more powerful tool for useful calculations. This is illustrated by three examples from widely different fields: diffusion in kinetic theory, the Einstein–Podolsky–Rosen (EPR) paradox in quantum theory [he refers here to Bell’s theorem and Bell’s inequalities], and the second law of thermodynamics in biology.”

Unfortunately Jaynes was completely wrong in believing that John Bell had merely muddled up his conditional probabilities in proving the famous Bell inequalities and deriving the famous Bell theorem. At the conference, astrophysicist Steve Gull presented a three line proof of Bell’s theorem using some well known facts from Fourier analysis. The proof sketch can be found in a scan of four smudged overhead sheets on Gull’s personal webpages at Cambridge University.

Together with Dilara Karakozak I believe I have managed to decode Gull’s proof, though this did require quite some inventiveness. I have given a talk presenting our solution and pointing out further open problems. I have the feeling progress could be made on interesting generalisations using newer probability inequalities for functions of Rademacher variables.

Here are slides of the talk:

Not being satisfied, I wrote a new version of the talk, using different tools. The notes are written with an Apple Pencil on the iPad; I then discuss them while recording my voice and the screen (so: either composing the notes live, or editing them live).

A fungal year

I want to document the more than 20 species of wild mushrooms which I’ve collected and enjoyed eating this year. I will go through my collection of photographs in reverse chronological order. But above, the featured image, taken back in September: Neoboletus luridiformis, the scarletina bolete; in Dutch, heksenboleet (witch’s bolete. Don’t worry. The guy to avoid is the devil’s bolete).

I get my mushroom knowledge from quite a few books and from many websites. In this blog I will just give the English and Dutch wikipedia pages for each species. I highly recommend Google searching the Latin name (though notice – scientific names do change, as science gives us new knowledge) and if your French, German or other favourite language also has a wikipedia page, nature lover’s web pages, forager’s webpages, or whatever, check them out, because ideas of edibility and of how to cook mushrooms which are considered edible vary all over the world. If at some time there was a famine, and the only country people who could survive were those who went out in the forest and found something they could eat, then their fellows who had allergic reactions to those same mushrooms did not survive, and in this way different human populations are adapted to different fungi populations. It’s also very important to consult local knowledge (in the form of local handbooks, local websites) since the dangerous poisonous look-alikes which you must avoid vary in different parts of the world.

Do not eat wild mushrooms raw. You don’t know what is still crawling about in them, and you don’t know what has pooped or pissed on them or munched at them recently. Twenty minutes gentle cooking should destroy anything nasty, and moreover, it breaks down substances which are hard for humans to digest. The rigid structure of mushrooms is made of chitin (which insects use for their external body) and we cannot digest it raw. Some people have allergic reactions to raw chitin.


Paralepista flaccida

Russula cyanoxantha

Armillaria mellea

Coprinus comatus

Suillus luteus

Amanita muscaria

Sparassis crispa

[To be continued]

Appendix: some mushrooms and fungi to be wondered at, but not eaten

1. Paralepista flaccida

Tawny funnel, Roodbruine schijnridderzwam. Grows in my back garden in an unobtrusive spot, fruiting every year in December to January. Yellow-pinkish spore print, lovely smell, nice taste. Also after frying! The combination of aroma/taste/spore-print just does not fit any of the descriptions of this mushroom or those easy to confuse with it which I can find. There is a poisonous lookalike which however is not supposed to taste good, so that’s why I dared to eat this one. It grows close to a Lawson cypress but there may be other old wood remains underground in the same spot.

English wikipedia:

Dutch wikipedia:

2. Russula cyanoxantha

Charcoal burner, Regenboogrussula (rainbow russula). Very common in the forests behind “Palace het Loo”. A really delicious russula species, easy to identify.

English wikipedia:

Dutch wikipedia:

3. Armillaria mellea

Honey fungus, echte honingzwam. These fellows are growing out of the base of majestic beech trees at Palace het Loo. The trees are all being cut down now; excuse: “they’re sick”; true reason: high quality beech wood is very valuable. The trees are hosts to numerous fungi, animals, birds. The managers of the park have been doing their best to kill them off for several decades by blowing their fallen leaves away and driving heavy machinery around. Looks like their evil designs are bearing fruit now.

English wikipedia:

Dutch wikipedia:

4. Coprinus comatus

Shaggy ink cap, Geschubde inktzwam. One of the last ones of the season, very fresh, from a field at the entrance to the Palace park. These guys are so delicious, fried in butter with perhaps lemon juice, and a little salt and pepper. They have a gentle mushroom flavour and somehow remind me of oysters. And of autumns in Aarhus, picking them often from the lawns of the university campus.

English wikipedia:

Dutch wikipedia:

5. Suillus luteus

Slippery jack, bruine ringboleet. This one looks rather slimy and it is said that it needs to be cooked well, since it disagrees with some people. It didn’t disagree with me at all, but I must say it did not have much flavour, and it does feel a bit slippery in your mouth.

English wikipedia:

Dutch wikipedia:

6. Amanita muscaria

Fly agaric, vliegenzwam. This mushroom contains both poisons and psychoactive substances. However, both are water soluble. One therefore boils these mushrooms lightly for 20 minutes in plenty of lightly salted water with a dash of vinegar, then drains them and discards the fluid; then they can be fried in butter and brought up to taste with salt and pepper. They are then actually very tasty, in my opinion.

Another use for them is to soak them in a bowl of water and leave it in your kitchen. Flies will come and investigate it, taste some, get high (literally and figuratively) and drop dead. The smell is pretty disgusting at this stage.

I understand you can dry them, grind them to powder, and make tea. This allegedly destroys the poisons but leaves enough of the psychoactive substances to have interesting effects. I haven’t tried it, since one of the effects is to set your heart racing, and since I have a dangerously irregular heart rhythm already, I should not experiment with this.

Some people munch a small piece raw, from time to time, while walking in the forests. I have tried that – teaspoon size, dessertspoon size even, without noticing anything except that perhaps for a moment everything sparkled more beautifully than usual. Probably that was the placebo effect.

Amanita muscaria is not terribly poisonous. If you cook and eat three or four you will probably throw up after an hour or two and also experience rather unpleasant hallucinations, to be rounded off with diarrhea and generally feeling unwell. You might find yourself getting very large or very small; it depends of course whether you nibble from the right-hand edge of the mushroom or the left-hand edge. You might believe you can fly so it can be dangerous to be in high places on your own. The poisons may damage your liver but being water soluble they are quite efficiently and rapidly excreted from the body, which is a good thing, so eating them just once probably won’t kill you and probably won’t give you permanent damage. Several other Amanita species are deadly poisonous, with poisons which do not dissolve in water and do not leave your body after you’ve eaten them, but instead destroy your liver in a few days. One must learn to recognise those mushrooms very well. In my part of the world: Amanita phalloides – the death cap (groene knolamaniet); Amanita pantherina – the panther cap (panteramaniet). I have seen these two even in the parks and roadside verges in my town, as well as in the forests outside. More rare is Amanita virosa – the destroying angel (kleverige knolamaniet). But I believe I have seen it close to home, too. It is a white mushroom with white gills and consequently many people believe you must never touch a white mushroom with white gills. Writers of mushroom books themselves generally have the idea that edible white mushrooms with white gills, which do exist, do not taste particularly good either, and so one should not bother with them. Hence they do not explain well how you can tell the difference. We will later (i.e., earlier this year) meet the counterexample to that myth.

Because of the psychoactive effects of Amanita muscaria it is actually presently illegal, in the Netherlands, to be found in possession of more than a very small amount.

7. Sparassis crispa

The cauliflower mushroom, grote sponszwam. One of my favourites. It does have the tendency to envelop leaves and insects in its folds. Before cooking it has a wonderful aroma, almost aromatic, but on frying it seems to lose a lot of flavour.

English wikipedia:

Dutch wikipedia:

Time, Reality and Bell’s Theorem

Featured image: John Bell with a Schneekugel (snowing ball) made by Renate Bertlmann; in the Bells’ flat in Geneva, 1989. © Renate Bertlmann.

Lorentz Center workshop proposal, Leiden, 6–10 September 2021

As quantum computing and quantum information technology move from a wild dream into engineering and possibly even mass production and consumer products, the foundational aspects of quantum mechanics are more and more hotly discussed. Whether or not various quantum technologies can fulfil their theoretical promise depends on the fact that quantum mechanical phenomena cannot be merely emergent phenomena, emerging from a more fundamental physical framework of a more classical nature. At least, that is what Bell’s theorem is usually understood to say: any underlying mathematical physical framework which is able, to a reasonable approximation, to reproduce the statistical predictions made by quantum mechanics, cannot be local and realist. These words have nowadays precise mathematical meanings, but they stand for the general world view of physicists like Einstein, and in fact they stand for the general world view of the educated public. Quantum physics is understood to be weird, and perhaps even beyond understanding. “Shut up and calculate”, say many physicists.

Since the 2015 “loophole-free” Bell experiments of Delft, Munich, Vienna and at NIST, one can say even more: laboratory reality cannot be explained by a classical-like underlying theory. Those experiments were essentially watertight, at least as far as experimentally enforceable conditions are concerned. (Of course, there is heated discussion and criticism, too).

Since then however it seems that even more energy than ever before is being put into serious mathematical physics which somehow gets around Bell’s theorem. A more careful formulation of the theorem is that the statistical predictions of quantum mechanics cannot be reproduced by a theory having three key properties: locality, realism, and no-conspiracy. What is meant by no-conspiracy? It means that experimenters are free to choose settings of their experimental devices, independently of the underlying properties of the physical systems which they are investigating. Take a Bell-type experiment: a laser is aimed at a crystal which emits a pair of photons, and these arrive at two distant polarising photodetectors, i.e., detectors which can measure the polarisation of a photon in directions chosen freely by the experimenter. If the universe actually evolves in a completely deterministic manner, then everything that goes on in those labs (housing the source and the detectors and all the cables or whatever in between) was determined already at the time of the big bang, and the photons can in principle “know in advance” how they are going to be measured.

At the present time, highly respectable physicists are working on building a classical-like model for these experiments using superdeterminism. Gerard ’t Hooft used to be a lonely voice arguing for such models but he is no longer quite so alone (cf. Tim Palmer, Oxford, UK). Other physicists are using a concept called retro-causality: the future influences the past. This leads to “interpretations of quantum mechanics” in which the probabilistic predictions of quantum mechanics, which seem to have a built in arrow of time, do follow from a time symmetric physics (cf. Jaroslav Duda, Krakow, Poland).

Yet other physicists dismiss “realism” altogether. The wave function is the reality, the branching of many possible outcomes when quantum systems interact with macroscopic systems is an illusion. The Many Worlds Interpretation is still very alive. Then there is QBism, where the “B” probably was meant to stand for Bayesian (subjectivist) probability, in which one goes to an almost solipsistic view of physics; the only task of physics is to tell an agent what are the probabilities of what the agent is going to experience in the future; the agent is rational and uses the laws of quantum mechanics and standard Bayesian probability (the only rational way to express uncertainty or degrees of belief, according to this school) to update probabilities as new information is obtained. So there only is information. Information about what? This never needs to be decided.

On the right, interference patterns of waves of future quantum possibilities. On the left, the frozen actually materialised past. At the boundary, the waves break, and briefly shining fluorescent dots of light on the beach represent the consciousness of sentient beings. Take your seat and enjoy. Artist: A.C. Gill

Yet another serious escape route from Bell is to suppose that mathematics is wrong. This route is not taken seriously by many, though at the moment, Nicolas Gisin (Geneva), an outstanding experimentalist and theoretician, is exploring the possibility that an intuitionistic approach to the real numbers could actually be the right way to set up the physics of time. Klaas Landsman (Nijmegen) seems to be following a similar hunch.

Finally, many physicists do take “non-locality” as the serious way to go, and explore, with fascinating new experiments (a few years ago in China, Anton Zeilinger and Jian-Wei Pan; this year Donadi et al.), hypotheses concerning the idea that gravity itself leads to non-linearity in the basic equations of quantum mechanics, leading to the “collapse of the wave function” by a definitely non-local process.

At the same time, public interest in quantum mechanics is bigger than ever, and non-academic physicists are doing original and interesting work, “outside of the mainstream”. Independent researchers can and do challenge orthodoxy, and it is good that someone is doing that. There is a feeling that the mainstream has reached an impasse. In our opinion, the outreach from academia to the public has also to some extent failed. Again and again, science supplements publish articles about amazing new experiments, showing ever more weird aspects of quantum mechanics, but it is often clear that the university publicity department and the science journalists involved did not understand a thing, and the newspaper articles are extraordinarily misleading if not palpably nonsense.

In the Netherlands there has long been a powerful interest in foundational aspects of quantum mechanics and also, of course, in the most daring experimental aspects. The Delft experiment of 2015 was already mentioned. At CWI, Amsterdam, there is an outstanding group in quantum computation led by Harry Buhrman; Delft has a large group of outstanding experimentalists and theoreticians; and in many other universities there are small groups and also outstanding individuals. In particular one must mention Klaas Landsman and Hans Maassen in Nijmegen, and the groups working in the foundations of physics in Utrecht and in Rotterdam (Fred Muller). Earlier we had of course Gerard ’t Hooft, Dennis Dieks and Jos Uffink in Utrecht; some of them have retired but are still active, others have moved abroad. A new generation is picking up the baton.

The workshop will therefore bring a heterogeneous group of scientists together, many of whom disagree fundamentally on basic issues in physics. Is it an illusion to say that we can ever understand physical reality? Perhaps all we can do is come up with sophisticated mathematics which amazingly gives the right answer. Yet there are conferences and Internet seminars where these disagreements are fought out, amicably, again and again. It seems that some of the disagreements come from different subcultures in physics, with very different uses of the same words. It is certainly clear that many of those working on how to get around Bell’s theorem actually have a picture of that theorem belonging to its early days. Our understanding has developed enormously over the decades, and the latest experimentalists perhaps have a different theorem in mind from the general picture held by theoretical physicists who come from relativity theory. Indubitably, the reverse is also true. We are certain that the meeting we want to organise will enable people from diverse backgrounds to understand one another more deeply and possibly “agree to differ” if the difference is a matter of taste; if, however, the difference has observable physical consequences, then we must be able to figure out how to observe them.

The other aim of the workshop is to find better ways to communicate quantum mysteries to the public. A physical theory which basically overthrows our prior conceptions of time, space and reality, must impact culture, art, literature; it must become part of present day life; just as earlier scientific revolutions did. Copernicus, Galileo, Descartes, Newton taught us that the universe evolves in a deterministic (even if chaotic) way. Schrödinger, Bohr and all the rest told us this was not the case. The quantum nature of the universe certainly did impact popular culture but somehow it did not really impact the way that most physicists and engineers think about the world.

Illustration from Wikipedia, article on Bell’s Theorem. The best possible local realist imitation (red) for the quantum correlation of two spins in the singlet state (blue), insisting on perfect anti-correlation at 0°, perfect correlation at 180°. Many other possibilities exist for the classical correlation subject to these side conditions, but all are characterized by sharp peaks (and valleys) at 0°, 180°, and 360°, and none has more extreme values (±0.5) at 45°, 135°, 225°, and 315°. These values are marked by stars in the graph, and are the values measured in a standard Bell-CHSH type experiment: QM allows ±1/√2 = ±0.7071…, local realism predicts ±0.5 or less.
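The numbers in this caption can be checked with a short Python sketch. It assumes the quantum singlet correlation is E(θ) = −cos θ and takes the red curve to be the piecewise-linear “sawtooth” running from −1 at 0° to +1 at 180°; the function names and the particular choice of CHSH angles (0°, 90° for one party, 45°, 135° for the other) are illustrative assumptions, not something specified in the text.

```python
import math

def quantum_correlation(theta_deg):
    """Singlet-state correlation E(theta) = -cos(theta), theta in degrees."""
    return -math.cos(math.radians(theta_deg))

def sawtooth_correlation(theta_deg):
    """Piecewise-linear local realist correlation: -1 at 0, +1 at 180, period 360."""
    t = theta_deg % 360.0
    if t <= 180.0:
        return -1.0 + t / 90.0
    return 3.0 - t / 90.0

def chsh(corr, a=0.0, a2=90.0, b=45.0, b2=135.0):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b').

    `corr` maps the angle between the two measurement directions to a correlation.
    """
    def E(x, y):
        return corr(y - x)
    return E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)

if __name__ == "__main__":
    for angle in (45, 135, 225, 315):
        print(angle, quantum_correlation(angle), sawtooth_correlation(angle))
    print("|S| quantum:", abs(chsh(quantum_correlation)))
    print("|S| sawtooth:", abs(chsh(sawtooth_correlation)))
```

At the four starred angles the quantum curve takes the values ±1/√2 while the sawtooth takes ±0.5, so with these settings the CHSH combination has absolute value 2√2 for the quantum correlation and exactly 2 for the sawtooth.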

BOLC (Bureau Verloren Zaken) “reloaded”

The BOLC is back again.

Ten years ago (in 2010), the Dutch nurse Lucia de Berk was acquitted at a retrial of a charge of 7 murders and 3 attempted murders in hospitals in The Hague, committed over a number of years leading up to just a few days before the memorable date of “9-11”. The last murder was supposed to have been committed in the night of 4 September 2001. The following afternoon, the hospital authorities reported a series of unexplained deaths to the health inspectorate and the police. They also placed Lucia de B., as she became known in the Dutch media, on ‘non-active’ status. The media reported that about 30 suspicious deaths and resuscitations were being investigated. The hospital authorities not only reported what they believed were terrible crimes, they also believed that they knew who the perpetrator was.

The wheels of justice turn slowly, so there was a trial and a conviction; an appeal and a retrial and a conviction; and finally an appeal to the Supreme Court. It took until 2006 before the conviction (a life sentence, which in the Netherlands only ends when the convict leaves prison in a coffin) became irrevocable. Only new evidence could overturn it. New scientific interpretations of old evidence are not considered new evidence. There was no new evidence.

But already in 2003-2004, some people with an inside connection to the Juliana Children’s Hospital were becoming concerned about the case. After they had spoken in confidence with the highest authorities about their concerns, and had been told that nothing could be done, they began to approach journalists. Slowly but surely the media became interested in the case again. The story was no longer that of the terrible witch who had murdered babies and old people for no apparent reason other than the pleasure of killing, but that of an innocent person who had been mangled by bad luck, incompetent statistics, and a monstrous bureaucratic system which, once set in motion, could not be stopped.

Among the supporters of Metta de Noo and Ton Derksen were several professional statisticians, because Lucia’s initial conviction had been based on a faulty statistical analysis of incorrect data supplied by the hospital, analysed by amateurs and misunderstood by lawyers. Others were computer scientists; some were high-level civil servants from various government bodies who were appalled at what they saw happening; there were independent scientists, a few medical specialists, a few people with a personal connection to Lucia (but no immediate family); and friends of such people. Some of us worked together quite intensively, in particular on the internet site for Lucia, building an English-language version of it and bringing it to the attention of scientists all over the world. When newspapers like the New York Times and The Guardian started writing about an alleged miscarriage of justice involving misinterpreted statistics, supported by comments from top British statisticians, Dutch journalists had news for the Dutch newspapers, and that kind of news was certainly noticed in the corridors of power in The Hague.

Fast forward to 2010, when judges not only declared Lucia innocent, but stated out loud in open court that Lucia, together with her fellow nurses, had fought with the utmost professionalism to save the lives of babies who had been unnecessarily endangered by medical errors made by the medical specialists entrusted with their care. They also mentioned that just because the time of death of a terminally ill person could not be predicted in advance, this did not mean that it was necessarily unexplainable and therefore suspicious.

A few of us, elated by our victory, decided to work together and form a kind of collective that would look at other ‘lost causes’ involving possible miscarriages of justice in which science had been misused. I had already redirected my own research activities towards the rapidly growing field of forensic statistics, and I was already deeply involved in the case of Kevin Sweeney and the case of José Booij. Soon we had a website and were hard at work, but shortly afterwards a succession of mishaps occurred. First, Lucia’s hospital paid an expensive lawyer to put pressure on me on behalf of the chief paediatrician of the Juliana Children’s Hospital. I had written some personal information about this person (who happened to be the sister-in-law of Metta de Noo and Ton Derksen) on my homepage at Leiden University. I felt that it was crucial to understand how the case against Lucia had started, and this certainly had a lot to do with the personalities of some key figures in that hospital. I also wrote to the hospital asking for more data on the deaths and other incidents on the wards where Lucia had worked, in order to complete the professional, independent statistical investigation that should have taken place when the case began. I was threatened and intimidated. I found some protection from my own university, which paid expensive legal fees on my behalf. However, my lawyer soon advised me to give in by removing the offending material from the internet, because if this went to court, the hospital would probably win. I would be damaging the reputation of rich people and of a powerful organisation, and I would have to pay for the damage I had done. I was to promise never to say these things again, and I would be fined if they were ever repeated by others. I never gave in to these demands.
Later I did publish some of this material and sent it to the hospital. They remained silent. It was an interesting game of bluff poker.

Second, on ordinary internet forums I wrote a few sentences defending José Booij, which also laid blame on the person who had reported her to the child protection authorities. That was not a rich person, but certainly a clever one, and they reported me to the police. I became a suspect in a case of alleged defamation. I was interviewed by a friendly local police officer. A few months later I received a letter from the local criminal court saying that if I paid 200 euros in administrative costs, the case would be closed administratively. I did not have to admit guilt, but neither could I put on record that I considered myself innocent.

This led the Bureau Verloren Zaken to suspend its activities for a while. But it is now time for a come-back, a “re-boot”. In the meantime I was not idle: I became involved in half a dozen other cases, and learned more and more about law, forensic statistics, scientific integrity, organisations, psychology and social media. The BOLC is back.


The BOLC has been inactive for a few years, but now that its founder has reached the official retirement age, he is “restarting” the organisation. Richard Gill founded the BOLC on the eve of the acquittal of nurse Lucia de Berk in 2006. A group of friends who had been closely involved in the movement to get Lucia a fair trial decided that they so enjoyed one another’s company, and had learned so much from the experience of the preceding years, that they wanted to try out their skills on some new cases. We quickly ran into some serious problems and temporarily shut down our website, although activities continued on several cases, more experience was gained, and much was learned.

We think it is time to try again, having learned some useful lessons from our failures of the past years. Here is a rough outline of our plans.

  1. Set up a robust formal structure with a board (chair, secretary, treasurer) and an advisory board. Instead of calling it the scientific advisory board, as is usual in academic organisations, it should be a moral and/or wisdom advisory board, to be kept informed of our activities and to let us know if they think we are going off the rails.
  2. Possibly apply to become a foundation (Stichting). This means that we would also be something like an association or a club, with an annual general meeting. We would have members, who might also want to make donations, since running a website and occasionally getting into trouble costs money.
  3. Write about the cases we have been involved in over the past years, in particular: alleged serial killers Ben Geen (UK) and Daniela Poggiali (Italy); allegations of scientific misconduct in the case of the PhD thesis of a student of Peter Nijkamp; the case of the AD Herring Test and the quality of Dutch New Herring; the case of Kevin Sweeney.