Is Lucy’s post-it note a confession? Whether you will see it as a confession or a cry of innocent anguish depends on whether *you* have a heart and a brain. If you read it carefully, you will see that Lucy does not say that she killed those babies. She says that *they said* she killed those babies. Yes, she does say she is evil. She thinks she is clearly a bad nurse who apparently couldn’t save those babies, despite her (possibly too energetic, and certainly not well supervised) attempts. More seriously, she had had an affair with an older married man, a doctor, who later dumped her and betrayed her. She spoke out about doctors’ mistakes and about the catastrophic hygienic circumstances in which she and her colleagues had to work. For two years, doctors had tried to have her taken off that ward, because she pissed them off. Her colleague nurses loved her for her forthrightness and lovely character. She is so sorry for the suffering she caused her parents and step-brothers. She is considering suicide. She has PTSD.
“Contempt of court” means disrespect of a court. Now, it is certainly true that I am disrespectful of the court which convicted Lucy Letby. I think that the trial was unfair and that the judge did not understand what was going on. Nor did the jury. The jury was incomplete and the verdicts were not unanimous, yet the sentence was the heaviest possible. The defence made little attempt to defend their client and the UK tabloid newspapers had convicted Lucy long ago. On one of the days that she was arrested, the TV vans were in her street, before the police arrived to knock on her door and take her away. Six years of police investigation by a team of 60 to 70 police inspectors, including a large PR department (read: a little troll farm), did not find any conclusive proof of any wrongdoing by Lucy Letby at all. Yet already Cheshire Constabulary have signed a contract with Netflix and ITN for a documentary on their fantastic work nailing the UK’s most horrific female serial killer ever.
Now, “contempt of court” is also a very serious criminal offence in the UK, but as such, it has a very narrow definition. The definition involves the motive of the perpetrator. This is like killing someone. Killing a person might be murder. But it might be an accident. It might be caused by negligence. It is only premeditated murder if the person who killed the victim planned to do so in advance and deliberately and successfully carried out their plan. Lucy Letby is convicted of a large number of premeditated murders and murder attempts. The jury believed that she had motive and opportunity and deliberately tried in some cases numerous times to kill the same infant.
As the trial of Lucy Letby proceeded, various independent observers with a scientific background started studying the case and commenting on it on various internet sites. There was my own blog, gill1109.com. There was Peter Elston’s “Chimpinvestor” blog, chimpinvestor.com. There was Scott McLachlan’s Law, Health and Technology “Substack”. There was Sarrita Adams’ elaborate and dedicated website rexvlucyletby2023.com, which later morphed into the even more elaborate ScienceOnTrial.com. Numerous individuals of course also tweeted on the case, several Facebook groups started up, and several subreddits were founded. Cheshire Constabulary kept a close eye on social media and dedicated websites and became more and more active in trying to suppress any support of the defence of Lucy Letby, though all those Twitter users calling for the return of hanging and for Lucy to be assassinated as soon as possible in the most horrific way were presumably encouraged by Cheshire Constabulary.
Around May, while the trial still had a few months to run, the police apparently started to become nervous. Threatening emails were sent to myself, Peter Elston, and to Sarrita Adams, telling us that our websites must be taken down and links to those sites on social media should be removed. We know that the police also attempted to find out who was behind the Law, Health and Technology substack, but did not succeed so easily.
Of course, they found me, easily. But how did they discover the identity of the anonymous owner of rexvlucyletby2023.com, Sarrita Adams, who tried very hard indeed, for very sound personal reasons, to remain anonymous? The answer is simple: at some point Sarrita and I emailed the court, trying to alert the judge that the trial was unfair, and that important scientific evidence was hidden from the jury and the public. We did this through emails to the clerks of the court, asking them to bring our messages to the attention of the judge. However, this is not what they did. They gave the messages to police inspectors from Cheshire Constabulary, who were in court every day, hobnobbing with the barristers, the judge, and top NHS lawyers.
They also divulged the identity of Sarrita Adams to their internet trolls who rapidly managed to dig up a lot of dirt about Sarrita and dox her on Twitter.
The email letters which Peter, Sarrita and I were sent, are very interesting. They say that our internet activities were discovered by the police and that the police had discussed with the defence team, the defendant, and the judge, and that the judge said that what we were doing appeared to be contempt of court. We should remove our websites and remove all links to them on social media. According to the police, Sarrita and I were “associates” though we were in no way associated at all except in our common belief that the trial was unfair and the scientific evidence incorrectly interpreted. Yes, we had communicated with one another. The judge did point out that this was just his initial reaction and he couldn’t state that it was contempt of court without hearing our motivation from us. This shows again that he never received our emails to the court. Our stated motivation was to prevent a possible miscarriage of justice, not to cause a miscarriage of justice by subversion of the jury. We were attempting to contact all relevant authorities, not the jury at all. Indeed, since later the jury found Lucy guilty of the most heinous crimes, it is clear that we did not influence the jury at all.
I replied to the police by email that I would do what they asked. I did not remove my blog posts on the case but I did diligently delete links to Sarrita’s site and all tweets by myself with links to my blog or Sarrita’s website. I did not get a reply, though I asked who was emailing me and said that I wanted to talk to them, by telephone or Zoom. The letters had no phone number and no first name of whoever wrote them. I called Cheshire Constabulary by phone but they couldn’t help me because I did not know the initials or first name of whoever had emailed me.
About three weeks later, the jury was now deliberating in private. One Friday evening very late I was shocked by a knock at the door. (Actually, I had already gone to bed, but my son was visiting and woke me up. Thankfully, my wife slept through the whole thing). Local Dutch police wanted to deliver two letters to me, on paper, in person. They had been instructed to verify my identity and naturally, I did show them my Dutch passport. The letters were almost identical to the email letters which I had received earlier, and had already and immediately replied to. They did not have wet signatures; they were clearly printouts of PDFs. Similar, but not identical to what I had already received.
So now Cheshire Constabulary had legal proof, with the help of their Dutch colleagues, that I had indeed received their letters! The letters threatened arrest next time I tried to enter the UK, and noted that contempt of court carries a two-year prison sentence and a huge fine – namely, the costs of rerunning the whole trial with a fresh jury. It was pointed out that as a UK citizen I was still subject to UK law even though I lived in another country. The same thing was said to Sarrita, who lives in California, but is also a British citizen.
This was clearly intended to intimidate, and indeed it was very intimidating. I will now reproduce the original email letters and the later, paper, version. The wording is fascinating, the intention was to intimidate, but UK police cannot charge me with contempt of court without an order from a magistrate, and as Judge Goss remarked, he would need to know my actual motives before he could say that I had indeed likely committed the crime of contempt of court.
This spreadsheet was shown on TV both yesterday (Friday August 18, the day of the verdicts) and at the start of the trial of Lucy Letby. Apparently, Cheshire Constabulary find this absolutely damning evidence against Lucy. And indeed, many journalists seem to agree.
The 25 events are almost all of the events at which LL was present during the periods investigated. They are suspicious because she was under suspicion when the police started their investigations. Not surprisingly, most nurses are not present at many of these events. And of course, many nurses probably work far fewer hours than LL. Many are often on administrative duties.
The doctors on the ward are of course missing. Doctors were never investigated as suspects but from the start of police investigations apparently always believed to speak gospel truth. During cross-examination, during the trial, some of them have changed various parts of their stories. Of course, unlike Lucy, they do not lie, since they could never (under oath in court, or earlier, when being interviewed as witnesses by police) be saying untruths in order to deceive.
Back to the spreadsheet. When drawing conclusions from any data it is important to know how it was gathered. It is important to know what data is missing, but would be needed to draw even the most preliminary and tentative inferences.
There was an NHS investigation into the raised rates of deaths and collapses at Countess of Chester Hospital (CoCH) in summer 2015 and summer 2016. It was published in 2017 by the Royal College of Paediatrics and Child Health (RCPCH). The investigation blamed the consultants for the appallingly low standard of care, and the terrible situation regarding hygiene. The RCPCH investigators actually wrote that nurse Lucy Letby could not be associated with the events, but that passage was redacted out of the published report for privacy reasons. We know that already, consultants had presented their fears to hospital management. One of them (successful TV doctor and Facebook influencer Dr Ravi Jayaram) was on TV yesterday proudly telling the world that he had been vindicated. Management was inclined not to believe them, and did not act on them, but they certainly came to the ears of the RCPCH. On publication of the report, four consultants had had enough, and went to the police with their suspicions that LL was a murderer.
Thanks to FOI requests and statistical analysis by independent scientists, we now know that the rate of events (deaths and collapses) is just as much raised when Lucy is not on the ward as it is when she is on the ward. A lot of medical information (as well as the state of the drains at CoCH) points to a seasonal virus epidemic.
The elevated rate went back to normal after the hospital was downgraded (no longer accepting high-risk patients), and when the drains were rebuilt, and when the senior consultant retired, all of which happened soon after the police investigation started. Incidentally, the rate of stillbirths and miscarriages shows exactly the same pattern.
Lucy must certainly have been a witch in order to kill babies in the womb and even when she is far from the hospital.
Those familiar with miscarriages of justice involving serial killer nurses will be familiar with this police and prosecution tactic. Is it evil or is it just stupid? (cf. Hanlon’s razor). I think it is quite simply “learnt”. Police and prosecution learn what convinces jurors over the years, and that is why the same “mistakes” are made again and again. They work!
Lucy Letby enjoys prosecco at a small new year party with friends and family at her home (?).
Lucy Letby appearing via video link at court in Warrington, England, Thursday Nov. 12, 2020, charged with 8 murders and 10 murder attempts (Elizabeth Cook/PA via AP).
Lucy Letby as featured on a campaign brochure in the fight to prevent closure of CoCH.
Note: [20 August 2023] This post is incomplete. It needs a prequel: the history of medical investigations into two “unexplained clusters” of deaths at the neonatal ward of the Countess of Chester Hospital. It needs many sequels: statistical evidence; how the cases were selected (the Texas sharpshooter paradox) and the origin of suspicions that a particular nurse might be a serial killer; the post-it note; the alleged insulin poisonings; the trouble with sewage backflow and the evidence of the plumber; the euthanasias. For the medical material, the site to visit is the magnificent https://rexvlucyletby2023.com/.
Lucy Letby, a young nurse, has been tried at Manchester Crown Court for 7 murders and 15 murder attempts on 17 newborn children in the neonatal ward at Countess of Chester Hospital, Chester, UK, in 2015 and 2016.
She was found:– Guilty of 7 counts of murder (against 7 babies) – Guilty of 7 counts of attempted murder (against 6 babies) – Not guilty on 2 counts of attempted murder (against 2 of the 6 babies she *was* found guilty of attempting to murder). No decision was reached on 6 counts of attempted murder against 6 different babies. However, 2 of those 6 she was also found guilty of a different count of attempted murder. [Thanks to the commenter who corrected my numbers.]
The prosecution dropped one further murder charge just before the trial started, on the instruction of the judge. Several groups of alleged murders and murder attempts concern the same child, or twin or triplet siblings. All but one child was born pre-term. Several of them, extremely pre-term.
I’m not saying that I know that Lucy Letby is innocent. As a scientist, I am saying that this case is a major miscarriage of justice. Lucy did not have a fair trial. The similarities with the famous case of Lucia de Berk in the Netherlands are deeply disturbing.
The image below summarizes findings concerning the medical evidence. This was not my research. The graphic was given to me by a person who wishes to remain anonymous, in order to disseminate the research now fully documented on https://rexvlucyletby2023.com/, whose author and owner wishes to remain anonymous. Note that the defence has not called any expert witnesses at all (except for one person: the plumber). Possibly, they had not enough funds for this. Crowd-sourcing might be a smart way of getting the necessary work done for free, to be used at a subsequent appeal. That’s a dangerous tactic, and it seems to me that the defence has already taken a foolish step: they admitted that two babies received unauthorised doses of insulin, and their client was obliged to believe that too.
This blog post started in May 2023 as a first attempt by myself to blog about a case which I have been following for a long time. The information I report here was uncovered by others and is discussed on various internet fora. Links and sources are given below, some lead to yet more excellent sources. Everything here was communicated to the defence, but they declined to use it in court. Maybe they felt their hands were bound by pre-trial agreement between the trial parties as to what evidence would be brought to the attention of the jury, which witnesses, etc.
An extraordinary feature of UK criminal prosecution law is that if exculpatory evidence is in the possession of the defence, but not used in court, then it should not be used at a subsequent appeal, whether by the same defence team or a new one. This might explain why the defence team would not even inform their client of their knowledge of the existence of evidence which exonerated her. Even though, it is also against the law that they did not, as far as we know, disclose evidence which they had which was in her favour. The UK law on criminal court procedure is case law. New judges can always decide to depart from past judges’ rulings.
A very important issue is that, under the rules on the use of expert evidence, all expert evidence must be introduced before the trial starts. It is strictly forbidden to introduce new expert evidence once the trial is underway.
UK criminal trials are tightly scripted theatre. The jury is of course incommunicado, very close to its verdict, and I do not aim to influence the jury or their verdict. I aim to stimulate discussion of the case in advance of a likely appeal against a likely guilty verdict. I wish to support that small part of the UK population who are deeply concerned that this trial is going to end in an unjustified guilty verdict. Probably it will, but that will not be the end. So much information has come out in the 9 months of the trial so far, that a serious fight on behalf of Lucy Letby is now possible. Public opinion crystallised long ago against Lucy. It can be made fluid again, and maybe it can even be reversed, and this is what must happen if she is to get a fair re-trial.
As a concerned scientist who perceives a miscarriage of justice in the making, I attempted to communicate information not only to the defence but also to the prosecution, to the judge (via the clerk of the court), and to the Director of Public Prosecutions. That was a Kafkaesque experience which I will write about on another occasion. Personally, I tend to think that Lucy is innocent. That was however not my reason for attempting to contact the authorities. As a scientist, it was manifestly clear to me that she was not getting a fair trial. Science was being abused. I tried to communicate with the appropriate authorities. I failed to get any response. Therefore I had to “go public”.
Here is a short list of key medical/scientific issues, originally copied from an early version of the incredible and amazing website https://rexvlucyletby2023.com/, with occasional slight rephrasing and some small, hopefully correct, additions by myself. That site presents full scientific documentation and argumentation for all of the claims made there.
Air embolism cannot be determined by imaging, and can only be determined soon after death, and requires the extraction of air from the circulatory system, and analysis of the composition of the air using gas chromatography.
The coroner found a cause of death in 5 out of 7 of the alleged murder cases. Two of them appeared to be, in part, related to aggressive CPR, two appeared to be due to undiagnosed hypoxic-ischemic encephalopathy and myocarditis, one of the infants received no autopsy, and the other infant was determined to have died due to prematurity. It is highly unusual for the cause of death to be altered years after the fact and using methodology that is not supported by the coroner’s office.
The two claims of insulin poisoning are not supported by the testing conducted, and the infants (who are still alive and well) did not have dangerously low or dangerously high blood glucose levels for any period of time. There are many physiological reasons that could explain their low blood glucose during the whole period. In one of the two cases, assumptions are being made on the basis of one test taken at a single time point, clearly inconsistent with the other medical readings, and contravening the manufacturer’s own instructions for use (see image below). The report detailing the conclusions from that single test violates the code of practice of the forensic science regulator. Moreover, it appears that some numerical error has been made in the necessary calculation, resulting in an outcome which is physiologically impossible (or the person responsible did not know about the so-called “hook effect”). The mismatch between C-peptide and insulin concentration does not prove that the excess insulin found must have been synthetic insulin. There are many other biological explanations for a mismatch. No testing was done to determine the origin of the insulin. Similarly, there are many innocent explanations for the detection of some insulin in a feeding bag.
The air embolism hypothesis is confusing because it fails to explain why some children apparently perished and others did not, and it has not been supported by the minimal necessary measurements.
In at least one case, Lucy is blamed for causing white matter brain injury. This claim is utterly dishonest. The infant who experienced this brain injury was born at 23 weeks gestation, and white matter brain injury is associated with such early births. Further, there is sufficient evidence that enterovirus and parechovirus infections are linked to white matter brain injury in neonates, resulting in cerebral palsy.
At the time of the collapses and deaths of the infants, enterovirus and parechovirus had been reported in other hospitals. There is a history of outbreaks of these viruses in neonatal wards in hospitals around the world. They especially harm preterm infants who do not yet have a functioning immune system. It is reported that many parents of the infants were concerned that their ward had a virus (as was Lucy) and that Dr Gibbs denied this was so. To date we have seen no evidence to show they did any viral testing, and if they did what the results were.
Then a fact pertaining to my own scientific competence.
Both prosecution and defence were warned long ago about the statistical issues in such cases. Both have responded that they are not going to use any statistics. They are also not using the services of any statistician. Seems the RSS report https://rss.org.uk/news-publication/news-publications/2022/section-group-reports/rss-publishes-report-on-dealing-with-uncertainty-i/ has had the opposite effect to that intended. Amusingly, the same thing happened in the case of Lucia de Berk. At the appeal the prosecution stopped using statistics. She was convicted solely on the grounds of “irrefutable medical scientific evidence”. (Here, I’m quoting from the words both spoken by the judges and written down on the first page of their > 100 page report of the reasons and reasoning which had led to their unshakable conviction that Lucia de Berk was guilty. The longest judge’s summing up in Dutch legal history). I was one of the five coauthors of the RSS report. We were a “task force”, formally commissioned by the “Statistics and the Law” section of the society. I consider it the most important scientific work of my career. It took us two years to put together. We started the work in 2020; we had seen the Lucy Letby trial on the horizon since 2017 when police investigations started and the suspect being investigated was already common knowledge.
The UK does not have anything like that because a jury of ordinary folk are the ones who (legally) determine guilt or innocence. This is a clever device which makes fighting a conviction very difficult; no one can know what arguments the jury had in their mind, no one knows what, if anything, was the key fact that convinced them of guilt. Ordinary people are convinced by what seems to be a smoking gun, they then see all the other evidence through a filter. This is called “confirmation bias”. In the Lucy Letby case, the smoking gun was probably the post-it note, and the insulin then seems to clinch the matter. The prosecution cross-examination convinces those who already believe Lucy is guilty that she moreover is constantly lying. More on all this in later posts, I hope.
Back to the insulin. Here are the instructions on the insulin testing kit used for the trial, taken from this website http://pathlabs.rlbuht.nhs.uk/ccfram.htm, the actual file is http://pathlabs.rlbuht.nhs.uk/insulin.pdf. Notice the warning printed in red. Yes, it was printed in red, that was not something I changed later. (All this is not my discovery; the person who uncovered these facts wishes to remain anonymous).
The toxicological evidence used in the trial violates the code of practice of the UK’s Forensic Science Regulator (see link below). It should have been deemed inadmissible. Instead, the defence has not disputed it, and thereby obliged their own client Lucy to agree that there must have been a killer on the ward. The jury are instructed to believe that two babies were given insulin without authorization, endangering their lives. (The two babies in question are still very much alive, to this day. Probably now at primary school.)
The defence stated to me that they cannot inform Lucy of the alternative analysis of the insulin question. It appears to me that this violates their own code of practice. Do they feel bound by the weird rules of UK’s criminal prosecution practice? Their client, Lucy Letby, is herself essentially merely a piece of evidence, seized by the police from what they believe is a scene of crime. No one may tamper with it during the duration of her own trial, which is lasting 10 months! I think this constitutes an appalling violation of basic human rights. The UK laws on contempt of court are meant to guarantee a fair trial. But in the case of a 10-month trial on 22 charges of murder and attempted murder, they are guaranteeing an unfair trial.
Lucy’s solicitor refused to pass on a friendly personal letter of support to Lucy or to her parents because she had not instructed him to do so. Should one laugh or cry about that excuse? I have the impression that he is not very bright and that he may have been convinced she is guilty. If so, I hope he is changing his mind. In the UK, the solicitor does all the legwork and communication between the client and the defence team. The barrister does the cross-examinations and the court theatrics, but probably never builds up a personal relationship with his client. Lucy has been all this time in prison, in pre-trial detention, far from Manchester or Hereford. This might explain the extraordinarily weak defence which has been put up so far. But it might be deliberate.
One must take into account the fact that funding for legal support is meagre. The prosecution has been working on the case for 6 or so years, with unlimited resources. The defence has had a relatively very short time, with very limited resources. Probably the solicitor and the barrister already put in many more hours than they are paid for. There are no funds for expensive scientific witnesses. It is very possible that the defence team well understands that they cannot put up a serious defence during the 9 to 10 months of the trial, but that precisely this time period, with a huge number of revelations being made outside the trial, material for a serious defence during an appeal has been “crowd-sourced”. It seems to me that this mass of high-quality independent scientific work provides plenty of grounds for an appeal, in the case that the jury hands down a guilty verdict.
Do Statistics Prove Accused Nurse Lucy Letby Innocent? https://www.chimpinvestor.com/post/do-statistics-prove-accused-nurse-lucy-letby-innocent This splendid and comprehensive blog post also has a large list of links to reports and data sets. Yet more data analysis can and should be done. This site gives anyone who wants to a quick-start. And after that, two more outstanding posts…
At a pre-publication meeting of stakeholders held to gain feedback on our report, a senior West Midlands police inspector told me “we are not using statistics because they only make people confused”. Lucy’s solicitor and barrister knew well in advance of our report, were even given names of excellent UK experts whom they could consult, but did not bother to contact one of them. No statistics in our courts please, we are British! Yet the UK has the best applied statisticians and epidemiologists in the world.
Article in “Science” about my work on serial killer nurses
https://www.bbc.co.uk/sounds/play/m001k7vt?partner=uk.co.bbc&origin=share-mobile “The UK’s forensic science used to be considered the gold standard, but no longer. The risk of miscarriages of justice is growing. And now a new Westminster Commission is trying to find out what went wrong. Joshua talks to its co-chair, leading forensic scientist Dr Angela Gallop CBE, and to criminal defence barrister Katy Thorne KC.”
Criminal Procedure Rules and Criminal Practice Directions
New expert evidence cannot be admitted once a trial is in progress
“The courts have indicated that they are prepared to refuse leave to the Defence to call expert evidence where they have failed to comply with CrimPR; for example by serving reports late in the proceedings, which raise new issues (Writtle v DPP [2009] EWHC 236). See also: R v Ensor [2010] 1 Cr. App. R.18 and Reed, Reed & Garmson [2009] EWCA Crim. 2698”. This quote comes from https://www.cps.gov.uk/legal-guidance/expert-evidence. Note, a judge is always allowed to break with precedent. The rule is not actually a permanent rule, it is merely a description of current practice. Current practice evolves when and if a new judge sees fit to break with precedent. Obviously, he would have to come up with good legal reasons why he believes he has to do that. It’s his prerogative, his free choice. That’s the essence of case law, aka common law.
Here is a first impression. Confidence intervals have now been computed, and one sees immediately that the statistical uncertainty is enormous. Of course, these calculations are based on statistical assumptions, and those are always debatable. But at the very least they can be interpreted in a purely descriptive way, as a sensitivity analysis. A wide interval shows that if the data had been slightly different, the answer would have been totally different. We know in any case that there are all kinds of sources of error; we know that the records in the data files of government institutions can lie very far from the experiences of citizens; that they depend on all kinds of definitions and conventions which have their origin in bureaucratic administrations.
An important result is the picture below, in which statistical margins of uncertainty have been added to a figure from the first (and controversial) CBS report, Figure 6.1.1.
I have included the “small print” and the “even smaller small print”, not to be read, but to show that a whole technical story goes with it.
The first impression is that the line in the middle is roughly flat. So: the nasty intervention (being made a victim) in year “zero” has no strong effect. Over several years one sees, among the same 4000 families, a slight increase in child protection measures which, by the look of it, could well have been due to chance. The hypothesis of “no impact” cannot be rejected on the basis of these figures.
But that is not the only possible reading of the picture, and the alternative can no more be rejected. That little bump in the graph could also be “real”, and moreover caused by the blow which the tax authority dealt in “year zero”. It looks like an increase of half a percent per year, over several years. The most plausible estimate is that 20 to 30 (or even more) are true double victims; double victims in the sense that being a victim of the benefits scandal really did lead to an out-of-home placement which would otherwise not have happened.
The true effect has been damped and smeared out by all the shortcomings of the study. The conclusion must be: there are certainly tens and possibly even a hundred.
Incidentally, I would very much like to have one extra number, with which I could evaluate the statistical uncertainty in the difference in height between these two values (blue and green).
There are roughly 4000 victims, and they are paired one to one with comparable non-victims. We are in fact dealing with around 4000 matched pairs. CBS knows for each member of each pair whether a child protection action took place. We in fact have 4000 observations of pairs, each of which can take one of the four values (0, 0), (0, 1), (1, 0), (1, 1); call the two outcomes (x, y). A “1” means an out-of-home placement (or something similar), a “0” means no out-of-home placement. We are interested in the mean of the x’s minus the mean of the y’s. That is the same as the mean of all the (x – y) values; each of them is equal to –1, 0, or +1. I would like to see the 2×2 table of counts of each of the four possible joint outcomes (x, y). I would like to compute the standard deviation of the (x – y) values. This would give us insight into the degree of success of the matching: if all is well, we would see a positive correlation between the outcomes of the two groups. A correlation of +1 would imply that the outcome is completely determined by the matching variables, which would mean: being a victim really made no difference. Come on, CBS!
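The calculation asked for here is easy to sketch once the 2×2 table is available. The Python below is only an illustration under stated assumptions: the four counts are invented (CBS has not published them), x is taken to be the victim's outcome and y the matched control's outcome, and the 95% interval uses a plain normal approximation. The function name `summarise_matched_pairs` and the class `MatchedPairSummary` are hypothetical, not part of any CBS tooling.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class MatchedPairSummary:
    n_pairs: int
    mean_diff: float      # mean of (x - y) over all pairs
    sd_diff: float        # sample standard deviation of (x - y)
    se_mean_diff: float   # standard error of the mean difference
    ci95: tuple           # normal-approximation 95% interval
    phi: float            # correlation between x and y across pairs


def summarise_matched_pairs(n00, n01, n10, n11):
    """Summarise a 2x2 table of matched-pair outcomes (x, y).

    nXY is the number of pairs with x == X and y == Y.
    Assumption: x is the victim's outcome, y the matched control's.
    """
    n = n00 + n01 + n10 + n11
    # (x - y) is +1 for (1, 0) pairs, -1 for (0, 1) pairs, 0 otherwise.
    mean_diff = (n10 - n01) / n
    sum_sq = n10 + n01  # sum of (x - y)^2 over all pairs
    var_diff = (sum_sq - n * mean_diff ** 2) / (n - 1)
    sd_diff = sqrt(var_diff)
    se = sd_diff / sqrt(n)
    ci95 = (mean_diff - 1.96 * se, mean_diff + 1.96 * se)
    # Phi coefficient: Pearson correlation of two binary variables.
    denom = sqrt((n10 + n11) * (n00 + n01) * (n01 + n11) * (n00 + n10))
    phi = (n11 * n00 - n10 * n01) / denom if denom > 0 else float("nan")
    return MatchedPairSummary(n, mean_diff, sd_diff, se, ci95, phi)


# Invented counts for illustration only; they are NOT the CBS data.
example = summarise_matched_pairs(n00=3700, n01=100, n10=140, n11=60)
print(example)
```

Only the discordant pairs, (1, 0) and (0, 1), contribute to the mean difference and its standard deviation, which is why the full 2×2 table, and not just the two marginal rates, is needed to assess the uncertainty in the difference.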
Professor Gill helped exonerate Lucia de B., and is now making mincemeat of the CBS report on the benefits affair
Top statistician Richard Gill tears into the research conducted by Statistics Netherlands (CBS) into out-of-home placements of children of victims in the benefits affair. ‘CBS should never have come to the conclusion that this group of parents was not hit harder than other parents.’
Carla van der Wal 26-01-23, 06:00 Last update: 08:10
Emeritus professor Richard Gill would prefer to pick edible mushrooms in the woods and spend time with his grandchildren. Nevertheless, the top statistician in the Netherlands, who previously helped to exonerate the unjustly convicted Lucia de B., is now sinking his teeth into the benefits affair.
CBS should never have started the investigation into the custodial placement of children of victims in the benefits affair, says Gill. “And the conclusion that this group of parents has not been hit harder than other parents, CBS should never have drawn. It left many people thinking: only the tax authorities have failed, but fortunately there is nothing wrong with youth care. So all the fuss about ‘state kidnappings’ was unnecessary.”
After Statistics Netherlands calculated how many children of benefit parents were placed out of home (in the end it turned out to be 2090), it seemed that victims in the affair lost their children more often than similar parents who were not victims. The results, which Gill now denounces, were presented on November 1 last year.
Gill is emeritus professor of mathematical statistics at Leiden University and in the past was an advisor to the methodology department of Statistics Netherlands. In the case of Lucia de B. he showed that calculations purporting to show that De B. had more deaths during her shifts were incorrect.
CBS abuses
There is a special reason that Gill is now getting stuck into the benefits affair – but more on that later. First, the CBS report. Gill states that Statistics Netherlands is not equipped for this type of research and points out that after two research methods were dropped, only one ‘not ideal, but only option’ remained. He also thinks, among other things, that the more severely affected victims in the benefits affair should be the focus of the investigation. He emphasizes that relatively mildly affected families most likely had to deal with much less drastic consequences. CBS itself also says that it would have liked to use information about how severely families were affected, but that none was available.
CBS also acknowledges some of the criticisms. “CBS itself mentioned a number of them as caveats in the report. On one point there seems to be a misunderstanding,” said a spokesperson, who also said that CBS still fully stands by the conclusions. CBS will soon be discussing the methodology used with Gill, but in any case CBS does see itself as the right party to carry out the study. “CBS has the task of providing insight into social issues with reliable statistical information and data and has the necessary expertise and techniques. In this case there was a clear social need for statistical insight.”
Gill thinks otherwise and thinks it’s important to raise this. Because injustice keeps him awake at night. That was also a reason to offer his help when questions arose about the conviction of Lucia de B., who can simply be called Lucia de Berk again since her acquittal. In 2003 she was sentenced to life imprisonment.
Out-of-home placement
With the acquittal in 2010, Gill became not only a top statistician, but also a beacon of hope for people who experienced injustice. And so José Booij, the mother of a child placed in care, contacted him many years ago.
Somewhere in Gill’s house in Apeldoorn there is still a box with papers from José. It contains diaries, newspaper clippings and diplomas of hers. She was a little different from other people. A doctor who fell for women, fled the Randstad and settled in Drenthe. There she became pregnant and had a baby. And she had a neighbour with whom she had a disagreement. “That neighbour had made all kinds of reports about José to the local police, said that something terrible would happen to the child.” After six weeks, José’s daughter was removed from home.
State kidnapping
“What happened to José at the time, I also call that a state kidnapping, just as the custodial placements among victims of the benefits affair are now called.” The woman continued to fight to get her child back. But gradually that fight drove her insane. She lost her job, she lost her home. She fled abroad. “Despite a court ruling that the child had to be returned to José, that did not happen. José eventually derailed. I now know that she has left information with more people in the Netherlands to ensure that it is available to her daughter when she is ready. But I can’t find José anymore. I heard she was seen in the south of the Netherlands after escaping from a psychiatric clinic in England.”
And meanwhile he keeps that box. And Gill thinks of José when he considers the investigation by the Central Bureau of Statistics into out-of-home placements of children of victims in the benefits affair. Gill makes mincemeat of it. “The only thing CBS can say is that the results suggest that the differences between the two groups that have been compared are quite small. There should be a lot more caution, and yet in the summary you see firm statements in bold type, such as: ‘Being duped does not increase the likelihood of child protection measures’. I suspect that CBS was put under pressure to conduct this study, or wanted to justify its existence. Perhaps there is an urge to be of service.”
Time for justice
Now is the time to put that right, Gill thinks. Research needs to be done to find out what’s really going on. “I had actually hoped that younger colleagues would have stood up by now, who would take up such matters.” But as long as that doesn’t happen, he’ll do it himself. Maybe it’s in his genes. It was Gill’s mother – he was born in England – who helped crack the Enigma code used by the Germans to communicate during World War II. Gill wasn’t surprised when he found out. He already suspected that his excellent mind was inherited not only from his father, but also from his mother.
Love
Yet in the end it was his wife – love for whom led him to settle in the Netherlands – who put him on this track. She pointed Gill to Lucia de Berk’s case and encouraged him to get to work. She may have regretted that. For example, when Gill threatened to burn his Dutch passport during a broadcast of The World Keeps on Turning Round (“De wereld draait door”) if the De Berk case was not reviewed. “She said, ‘You can’t say things like that!’”
In fact, he would like to enjoy his retirement with her now – he has been out of paid work for six years now. Then he would spend his days in the woods looking for edible mushrooms. And spend a lot of time with his grandchildren. But now his calculations also help exonerate other nurses. Last year, Daniela Poggiali was released in Italy after Gill got involved in the case together with an Italian colleague. There are still cases waiting for him in England.
And then there is the benefits affair here in the Netherlands, which, as far as Gill is concerned, needs more in-depth, thorough research to find out exactly what caused the out-of-home placements. “That is why I ended up with Pieter Omtzigt and Princess Laurentien, who are also involved in the benefits affair.” Among the people who express themselves diplomatically, he is happy to be the bad cop, the man who shakes things up, as he did when he threatened to set his passport on fire. But at the same time, he also hopes that a young statistician will emerge who is prepared to take up the torch.
CBS provided this site with an extensive explanation in response to Gill’s criticism. It recognizes the complexity of this type of research, but sees itself as the appropriate body to carry out that research. An appointment to speak with Gill has already been scheduled. “CBS always tries to explain as clearly and transparently as possible in its reports what has been investigated, how it was done and what the results are.”
Statistics Netherlands also points to nuances in the text of the report, for example in the passage following the heading ‘Being duped does not increase the chance of child protection measures’. “On an individual level, there may well be a relationship between duping and youth protection; this is stated in several places in the report.” Even if ‘on average no evidence is found for a relationship between duping and youth protection’, as Statistics Netherlands notes.
Statistics Netherlands fully supports the research and the conclusions as stated in the report. It is pointed out, however, that there are still opportunities for follow-up research, as has also been indicated by Statistics Netherlands.
Below is an attempt (morning of 20 January 2023) to write down the core of the story in 500 words and in plain “Jip and Janneke” language. It did not succeed.
Does CBS have a monopoly on the truth?
Many were shaken awake by cabaret performer Peter Pannekoek’s words “1115 state kidnappings”. But they may have been lulled back to sleep by the CBS report “Jeugdbescherming en de toeslagenaffaire – Kwantitatief onderzoek naar kinderbeschermingsmaatregelen bij kinderen van gedupeerden van de toeslagenaffaire” (“Youth protection and the benefits affair: quantitative research into child protection measures among children of victims of the benefits affair”). One of the main conclusions (summary, first page) reads
“Being duped does not increase the likelihood of child protection measures”.
That is a powerful statement. No qualification whatsoever, no “fine print”. No mention that it is a statement which can only be made under a whole series of assumptions. Unfortunately, a whole series of assumptions of which many are patently untrue.
My answer: maybe not 1115, but maybe: 115
Now CBS excels at descriptive statistics, which is also its statutory task. It is supposed to neutrally unlock and present the facts that politicians, administrators and citizens need. Where CBS has less expertise in house, because it is decidedly not part of its task, is in disentangling cause and effect. Nowadays we call that “causality”, and it is an extremely topical, important, subtle and complex subject within scientific research; it has grown explosively since Judea Pearl’s book “Causality” from 2000. Can you conclude causality from observing correlation or association?
Example. Lucia de B. experienced a terribly large number of incidents during her shifts. Many more than one would have expected, and that led to a life sentence for serial murder. Only later did it become clear that her presence was precisely the reason that medical investigators characterised certain events as incidents!
But can the absence of association also point to causality? Yes, it can! Statistics can mislead. An appealing visual representation of statistics all the more so. My eye was caught by Figure 6.1.2 in the CBS report, in which we see three cheerfully coloured bars, meant to represent the percentages 1%, 4% and 4%. You see! The percentage of out-of-home placements among the victims is exactly what you would have expected if all those families had not been victims at all!
I would say: that cannot be chance. After studying the research protocol, including the many algorithms used by the team, it also becomes clear that it is not chance. Through the research choices that the research team felt forced to make, the difference in the probability of out-of-home placement between “comparable” victims and non-victims has been systematically reduced. The difference is therefore larger than it seems (it seems to be zero, but it most certainly is not). The correct conclusion of the study should have been, first, that there were certainly tens of “extra” out-of-home placements because of the affair, and possibly a hundred (or even a few hundred). A second conclusion should have been that this bold pilot study has proved that a totally different research design is needed to answer the question posed. Possibly something along the lines of the earlier rejected research proposal of Prof. Bart Tromp of the University of Groningen. Incidentally, it is never necessary to comb through all the files of the whole history of all the victims. By cleverly taking a random sample in a sensibly chosen subpopulation, one can restrict oneself to thoroughly investigating relatively few cases.
Good “Data Science” is impossible without combining great expertise from three areas simultaneously: 1) algorithms and computing capabilities; 2) probability theory and inferential statistics (i.e. quantifying the uncertainty in the results found); 3) (last but not least!) subject-specific knowledge of the intended field of application; in this case psychology, law, public administration.
I am thinking of a statistical simulation to illustrate my point. Those two numbers “4%” need error bars of about +/- 1%. It is tricky because I have to take account of the correlation within the pairs. We can only guess how large it is. So: several simulations with different guesses.
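A minimal sketch of such a simulation, in Python, might look as follows. Everything specific in it is an assumption made for illustration: the function names, the guessed correlations (0, 0.25, 0.5), the common placement rate of 4% in both groups, and the 4000 pairs. It draws correlated pairs of 0/1 outcomes and looks at how much the difference between the two observed percentages varies from one simulated study to the next.

```python
import math
import random
import statistics

def joint_probs(p, rho):
    """Cell probabilities for a pair (x, y) of 0/1 outcomes, each with
    P(outcome = 1) = p and with correlation rho between x and y."""
    p11 = p * p + rho * p * (1 - p)
    p10 = p01 = p - p11
    p00 = 1 - p11 - p10 - p01
    return {(0, 0): p00, (0, 1): p01, (1, 0): p10, (1, 1): p11}

def simulate_sd_of_difference(p=0.04, rho=0.0, n_pairs=4000, reps=300, seed=1):
    """Simulate `reps` studies of `n_pairs` matched pairs and return the
    standard deviation of (rate among x) - (rate among y) across studies."""
    rng = random.Random(seed)
    probs = joint_probs(p, rho)
    cells = list(probs)
    weights = [probs[c] for c in cells]
    diffs = []
    for _ in range(reps):
        sample = rng.choices(cells, weights=weights, k=n_pairs)
        diffs.append(sum(x - y for x, y in sample) / n_pairs)
    return statistics.stdev(diffs)

def analytic_sd(p, rho, n_pairs):
    """Exact standard deviation of the difference, for comparison."""
    return math.sqrt(2 * p * (1 - p) * (1 - rho) / n_pairs)

if __name__ == "__main__":
    for rho in (0.0, 0.25, 0.5):   # guessed within-pair correlations
        sim = simulate_sd_of_difference(rho=rho)
        exact = analytic_sd(0.04, rho, 4000)
        print(f"rho={rho:.2f}  simulated sd={100 * sim:.2f}%  exact sd={100 * exact:.2f}%")
```

Under these assumptions, two standard deviations of the difference come to a little under one percentage point when the correlation is zero, and they shrink as the guessed correlation grows. That is why the within-pair correlation matters so much for the error bars.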
Richard Gill is emeritus professor of mathematical statistics at Leiden University. He is a member of the KNAW and former chairman of the Netherlands Statistical Society (VVS-OR)
=========================================
Mr. Pieter Omtzigt has asked me to give my expert opinion on the CBS report that examines whether the number of child care placements of children by Dutch child protection authorities increased because their families had fallen victim to the child benefit scandal in the Netherlands.
The current note is preliminary and I intend to refine it further. My purpose is to stimulate discussion among relevant professionals of the methodology used by the CBS in this particular case. Feedback, please!
The report gives a clear (and short) account of creative statistical analysis of much complexity. The sophisticated nature of the analysis techniques, the urgency of the question, and the need to communicate the results to a general audience probably led to important “fine print” about the reliability of the results being omitted. The authors seem to me to be too confident in their findings.
Numerous choices had to be made by the CBS team to answer the research questions. Many preferable options are excluded due to data availability and confidentiality. Changing one of the many steps in the analysis through changes in criteria or methodology could lead to wildly different answers. The actual finding of two nearly equal percentages (both close to 4%) in the two groups of families is, in my opinion, “too good to be true”. It’s a fluke. Its striking character may have encouraged the authors to formulate their conclusions much more strongly than they are entitled to.
In this regard, I found it significant that the authors note that the datasets are so large that statistical uncertainty is unimportant. But this is simply not true. After constructing an artificial control group, they have two groups of size (in round numbers) 4000, and 4% of cases in each group, i.e. about 160. According to a rule of thumb calculation (Poisson variation), the statistical variation in each of those two numbers has a standard deviation of about the square root of 160, so about 12.6. That means that either of those numbers (160) could easily be off by twice the standard deviation, which is about 25. The conclusion that the benefits scandal did not lead to more children being removed from home than would have been the case without it certainly cannot be drawn. Taking into account the statistical sampling error, it is quite possible that the number for the control group (those not afflicted by the benefits scandal) would have been 50 lower. In that case, the study group experienced 50 more removals than they would have done, had they not been victims of the benefits scandal.
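The rule-of-thumb arithmetic can be checked with a few lines of Python. This is only a sketch of the back-of-envelope calculation in the text; the function name is made up, and the inputs are the round numbers quoted above (4000 per group, 4%).

```python
import math

def poisson_rule_of_thumb(n_per_group=4000, rate=0.04):
    """Back-of-envelope uncertainty of a count, treating it as Poisson
    (standard deviation = square root of the expected count)."""
    count = n_per_group * rate            # about 160 placements
    sd_count = math.sqrt(count)           # Poisson standard deviation
    two_sd = 2 * sd_count
    return {
        "count": count,
        "sd_count": sd_count,
        "two_sd": two_sd,
        "two_sd_as_percent": 100 * two_sd / n_per_group,
    }

if __name__ == "__main__":
    for name, value in poisson_rule_of_thumb().items():
        print(f"{name}: {value:.2f}")
```

Two standard deviations of a single count of about 160 correspond to roughly 0.6 percentage points in a group of 4000, which is not negligible next to a rate of 4%.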
To make the numbers easier still, suppose there was an error of 40 cases too few in the light blue bar standing for 4%. 40 out of 4000 is 1 out of 100, 1%. Change the light blue bar from height 4% to height 3% and they don’t look the same at all!
But this is already without taking into account possible systematic errors. The statistical techniques used are advanced and model-based. This means that they depend on the validity of many particular assumptions about the form and nature of the relationships between the variables included in the analysis (using “logistic regression”). The methodology uses these assumptions for its convenience and power (more assumptions mean stronger conclusions, but threaten “garbage in, garbage out”). Logistic regression is such a popular tool in so many applied fields because the model is so simple: the results are so easy to interpret, the calculation can often be left to the computer without user intervention. But there’s no reason why the model should be exactly true; one can only hope that it is a useful approximation. Whether it is useful depends on the task for which it is used. The current analysis uses logistic regression for purposes for which it was not designed.
The assumptions of the standard model of logistic regression are certainly not exactly met. It is not clear whether the researchers tested for failure of the assumptions (for example, by looking for interaction effects – violation of additivity). The danger is that the failure of the assumptions can lead to systematic bias in the results, bias that affects the synthetic (“matched”) control group. The central assumption in logistic regression is the additivity of effects of various factors on the log-odds scale (“odds” means probability divided by complementary probability; log means logarithm). This could be true to a first rough approximation, but it is certainly not exactly true. “All models are wrong, but some are useful”.
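Written out, the additivity assumption says the following. The symbols are introduced here purely for illustration and are not taken from the CBS report: $p$ is the probability of a child protection measure for a family with observed characteristics $x_1, \dots, x_k$, and the $\beta$'s are the regression coefficients.

$$
\log\frac{p}{1-p} \;=\; \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k
$$

An interaction between, say, $x_1$ and $x_2$ would add a term $\gamma\, x_1 x_2$ to the right-hand side. Fitting the model with such terms and checking whether $\gamma$ is clearly different from zero is one simple way to look for a violation of additivity.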
A good practice is to build models by analyzing a first data set and then evaluating the final chosen model on an independently collected second data set. In this study, not one but numerous models were tested. The researchers seem to have chosen from countless possibilities through subjective assessment of plausibility and effectiveness. This is fine in an exploratory analysis. But the findings of such an exploration must be tested against new data (and there is no new data).
The end result was a procedure to choose “nearest neighbour matches” with respect to a number of observed characteristics of the cases examined. Errors in the logistic regression used to choose matched controls can systematically bias the control group.
Further big questions concern the actual selection of cases and controls at the beginning of the analysis. Not all families affected by the benefits scandal had to pay back a huge amount of subsidy. Mixing the hard-hit and the lightly hit dilutes the effect of the scandal, both in magnitude and in accuracy, the latter because smaller samples lead to relatively less accurate determination of effect size.
Another problem is that the pre-selection control population (families in general from which a child was removed) also contains victims of the benefit scandal (the study population). That brings the two groups closer together, even more so after the familywise one-on-one matching process, which of course selectively finds matches among the subpopulation most likely to be affected by the benefits scandal.
Richard Gill is emeritus hoogleraar wiskundige statistiek aan de Universiteit Leiden. Hij is lid van de KNAW en voormalig voorzitter van het Nederlands Statistisch Genootschap (VVS-OR)
===========================================
De heer Pieter Omtzigt heeft mij gevraagd om mijn deskundige mening te geven over het CBS-rapport waarin wordt onderzocht of het aantal uithuisplaatsingen van kinderen door de Nederlandse kinderbescherming is toegenomen doordat hun families het slachtoffer zijn geworden van het kinderbijslagschandaal in Nederland. De huidige nota is voorlopig en ik ben van plan deze verder te verfijnen. Commentaar, kritiek, is welkom.
Het rapport geeft een duidelijk (en kort) verslag van creatieve statistische analyses van enige complexiteit. Het geavanceerde karakter van de analysetechnieken, de urgentie van de vraag en de noodzaak om de resultaten aan een algemeen publiek te communiceren, hebben er waarschijnlijk toe geleid dat belangrijke “kleine lettertjes” over de betrouwbaarheid van de resultaten werden weggelaten. De auteurs lijken mij te veel vertrouwen te hebben in hun bevindingen.
Om de onderzoeksvragen te beantwoorden moesten er door het CBS-team tal van keuzes worden gemaakt. Veel voorkeursopties zijn uitgesloten vanwege beschikbaarheid van gegevens en vertrouwelijkheid. Het wijzigen van een van de vele stappen in de analyse door wijzigingen in criteria of methodologie kan tot enorm verschillende antwoorden leiden. De daadwerkelijke bevinding van twee bijna gelijke percentages (beide dicht bij de 4%) in de twee groepen gezinnen is naar mijn mening “te mooi om waar te zijn”. Het is een toevalstreffer. Het opvallende karakter ervan heeft de auteurs misschien aangemoedigd om hun conclusies veel sterker te formuleren dan waar ze recht op hebben.
In dit verband vond ik het veelzeggend dat de auteurs opmerken dat de datasets zo groot zijn dat statistische onzekerheid onbelangrijk is. Maar dit is gewoon niet waar. Na constructie van een kunstmatige controlegroep hebben ze twee groepen van omvang (in ronde getallen) 4000, en 4% van de gevallen in elke groep, dat wil zeggen ongeveer 160. Volgens een vuistregelberekening (Poisson-variatie) heeft de statistische variatie in die twee getallen een standaarddeviatie van ongeveer de vierkantswortel van 160, dus ongeveer 12,5. Dat betekent dat elk van die getallen (160) toevallig gemakkelijk twee keer de standaarddeviatie kan hebben, namelijk ongeveer 25.
Rekening houdend met de statistische steekproeffout, is het heel goed mogelijk dat de controlegroep (degenen die niet getroffen zijn door het uitkeringsschandaal) 50 minder zou zijn geweest. In dat geval maakte de onderzoeksgroep er 50 meer mee dan ze zouden hebben gedaan als ze geen slachtoffer waren geweest van het uitkeringsschandaal.
Om de cijfers nog makkelijker te maken, stel dat er een fout was van 40 gevallen te weinig in de lichtblauwe balk, wat staat voor 4%. 40 van de 4000 is 1 van de 100, 1%. Verander de lichtblauwe balk van hoogte 4% naar hoogte 3% en ze zien er helemaal niet hetzelfde uit!
Maar dit is al zonder rekening te houden met mogelijke systematische fouten. De gebruikte statistische technieken zijn geavanceerd en modelmatig. Dit betekent dat ze afhankelijk zijn van de validiteit van tal van bijzondere aannames over vorm en aard van de relaties tussen de variabelen die in de analyse zijn opgenomen (met behulp van “logistische regressie”). De methodologie gebruikt deze aannames vanwege zijn gemak (“convenience”) en kracht (meer aannames betekent sterkere conclusies, maar dan dreigt “garbage in, garbage out”). Logistische regressie is zo’n populair hulpmiddel in zoveel toegepaste gebieden omdat het model zo eenvoudig is: de resultaten zijn zo gemakkelijk te interpreteren, de berekening kan vaak zonder tussenkomst van de gebruiker aan de computer worden overgelaten. Maar er is geen enkele reden waarom het model precies waar zou moeten zijn; men kan alleen maar hopen dat het een bruikbare benadering is. Of het nuttig is, hangt af van de taak waarvoor men het gebruikt. De huidige analyse gebruikt logistische regressie voor doeleinden waarvoor het niet is ontworpen.
The assumptions of the standard model are certainly not met exactly. It is not clear whether the researchers tested for failure of the assumptions (for example, by looking for interaction effects, that is, violations of additivity). The danger is that failure of the assumptions can lead to systematic bias in the results, bias which affects the synthetic (“matched”) control group. The central assumption of logistic regression is the additivity of the effects of different factors on the log-odds scale (“odds” means probability divided by complementary probability; “log” means logarithm). This might be true as a first rough approximation, but it is certainly not exactly true. “All models are wrong, but some are useful.”
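To make “additivity on the log-odds scale” concrete, here is a small illustration. The coefficients and the two factors A and B are entirely hypothetical (they do not come from the study). Additivity means that the effect of factor A on the log-odds is the same whether or not factor B is present; an interaction term breaks that.

```python
import math

def logit(p):
    """Log-odds: log of probability divided by complementary probability."""
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical coefficients, chosen only for illustration.
INTERCEPT, EFFECT_A, EFFECT_B = -3.0, 0.8, 0.5

def prob(a, b, interaction=0.0):
    """Probability of the outcome given two binary factors a and b."""
    return inv_logit(INTERCEPT + EFFECT_A * a + EFFECT_B * b + interaction * a * b)

def additivity_gap(interaction):
    """Effect of A on the log-odds with B present, minus the same effect with B absent."""
    effect_without_b = logit(prob(1, 0, interaction)) - logit(prob(0, 0, interaction))
    effect_with_b = logit(prob(1, 1, interaction)) - logit(prob(0, 1, interaction))
    return effect_with_b - effect_without_b

print(additivity_gap(0.0))  # additive model: the gap is zero up to rounding
print(additivity_gap(0.7))  # with an interaction, the gap equals the interaction term
```

A model fitted without the interaction term, to data which in truth has one, will give systematically wrong predicted probabilities for some combinations of factors.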
Good practice is to build models by analysing a first dataset, and then to evaluate the finally chosen model on an independently collected second dataset. In this study, not one but numerous models were tried out. The researchers appear to have chosen among countless possibilities by subjective judgement of plausibility and effectiveness. This is fine in an exploratory analysis. But the findings of such an exploration must be tested against new data (and there are no new data).
The result was a procedure for choosing “nearest neighbour matches” with respect to a number of observed characteristics of the cases under study. Errors in the logistic regression used to choose the matched controls can systematically bias the control group.
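The following sketch shows, in general terms, how nearest neighbour matching on a model-based score works, and why errors in the score carry over to the control group. It is not the researchers' actual procedure: the greedy one-to-one matching rule, the function name `match_nearest` and all the scores are assumptions made up for illustration.

```python
def match_nearest(case_scores, control_scores):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns a dict mapping each case index to the index of its matched control.
    """
    available = dict(enumerate(control_scores))
    matches = {}
    for case_id, score in enumerate(case_scores):
        if not available:
            break
        best = min(available, key=lambda cid: abs(available[cid] - score))
        matches[case_id] = best
        del available[best]  # each control is used at most once
    return matches

# Hypothetical scores, e.g. fitted probabilities from a logistic regression.
controls = [0.10, 0.28, 0.52, 0.65, 0.90]
cases = [0.30, 0.55, 0.80]

print(match_nearest(cases, controls))

# Suppose the model systematically overstates the scores of the cases by 0.15.
biased_cases = [s + 0.15 for s in cases]
print(match_nearest(biased_cases, controls))
```

With the distorted scores, two of the three cases are matched to different controls. The matched control group is only as good as the model that produced the scores.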
Further questions concern the actual selection of cases and controls at the start of the analysis. Not all families affected by the benefits scandal had to pay back an enormous amount of subsidy. Mixing the hard-hit with the lightly hit dilutes the effect of the scandal, both in magnitude and in precision.
Another problem is that the pre-selection control population (families in general from which a child was removed) also contains victims of the benefits scandal (the study population). That brings the two groups closer together, and even more so after the matching process, which of course selectively finds matches among the subpopulation most likely to have been affected by the benefits scandal.