This is a first attempt to summarise my claims in 500 words and simple language. It didn’t succeed.
Does CBS have direct access to the truth?
Many were shaken up by cabaret artist Peter Pannekoek’s words “1115 state kidnappings”. But they may have been lulled back to sleep by the CBS report “Youth protection and the benefits affair – Quantitative research into child protection measures in children of victims of the benefits affair”. One of the main conclusions (summary, first page) reads:
“Being a victim of the benefits scandal does not increase the likelihood of child protection measures“.
That’s a powerful statement. No qualification whatsoever, no “small print”. No mention that it is a statement that can only be made under a slew of assumptions. Alas, many of those assumptions are patently untrue.
My answer: maybe not 1115, but it could well have been 115.
Now CBS excels at descriptive statistics, which is also their legal assignment. They should neutrally disclose and present the facts that politicians, the administration and citizens need. Where CBS has less in-house expertise, because it is certainly not part of their task, is in disentangling cause and effect. This is what we call “Causality” today, and it is an extremely topical, important, subtle and complex subject of scientific inquiry; the field has exploded since Judea Pearl’s 2000 book “Causality”. Can you infer causality by observing correlation or association?
Example. Lucia de B experienced an awful lot of incidents during her shifts, many more than one would have expected, and that led to a sentence of life imprisonment for serial murder. Only later did it become clear that her presence was precisely the reason why medical examiners characterized certain events as incidents!
But can causality also be present when there is *no* association? Yes! Statistics can be misleading. An appealing visual representation of statistics all the more so. My eye was drawn to Figure 6.1.2 in the CBS report, in which we see three brightly colored bars, which should represent the percentages 1%, 4% and 4%. See! The percentage of custodial placements among the victims is exactly what you would have expected if all those families had not been victimized at all!
I’d say that can’t be a coincidence. After studying the research protocol, including the many algorithms used by the team, it becomes clear that it is indeed no coincidence. Due to the research choices that the research team felt compelled to make, the difference in out-of-home placements between “comparable” victims and non-victims has been systematically reduced. So the difference is greater than it appears (it appears to be zero, but it is definitely not). The correct conclusion of the investigation should have been, first, that there were certainly dozens of “extra” custodial placements because of the affair, and possibly a hundred (or even a few hundred). A second conclusion should have been that this bold pilot study has proven that a completely different research design is needed to answer an old question. Possibly, something along the lines proposed by Prof. dr. Bart Tromp of the University of Groningen. Incidentally, it is never necessary to go through *all* files of the entire history of all victims. By smartly taking a random sample in a sensibly chosen sub-population, one can limit oneself to properly sorting out relatively few cases.
Good “Data Science” is impossible without combining great expertise from three areas at the same time: 1) algorithms and computing capabilities; 2) probability theory and inferential statistics (i.e. quantifying the uncertainty in the results found); 3) (last but not least!) subject-specific knowledge of the intended application area; in this case psychology, law, administration.
I am currently writing out the justification for my claims in my blog, https://gill1109.com/2023/01/18/de-statistiek-van-slachtoffers-van-toeslagsschandaal/; it still needs to be expanded a lot with further substantiation, references, and so on.
I’m thinking of a statistical simulation to illustrate my point. Those two numbers “4%” need error bars of about +/- 1%. Tricky because I must take account of the correlation within the pairs. We can only guess how big it is. So: several simulations with different guesses.
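A minimal sketch of what such a simulation might look like, in Python. Everything in it is an assumption made for illustration, not something taken from the CBS report: the number of pairs (1500, chosen only because it gives a single 4% an error bar of roughly +/- 1% at two standard errors), the true rate of 4% in both groups, the guessed correlations, and the simple way of generating correlated yes/no outcomes (with probability rho both members of a pair share one draw, otherwise they are drawn independently). The function names are hypothetical.

```python
import math
import random
import statistics


def simulate_difference(n_pairs, p, rho, rng):
    """One simulated study: placement rate of victims minus that of matched controls.

    Assumed model: with probability rho both members of a pair share a single
    yes/no draw, otherwise they are drawn independently. This gives each member
    the same rate p and a within-pair correlation of rho (for rho >= 0).
    """
    victims = 0
    controls = 0
    for _ in range(n_pairs):
        if rng.random() < rho:
            shared = rng.random() < p
            victims += shared
            controls += shared
        else:
            victims += rng.random() < p
            controls += rng.random() < p
    return (victims - controls) / n_pairs


def spread_of_difference(n_pairs, p, rho, n_sims=400, seed=1):
    """Standard deviation of the simulated difference over many repeated studies."""
    rng = random.Random(seed)
    diffs = [simulate_difference(n_pairs, p, rho, rng) for _ in range(n_sims)]
    return statistics.stdev(diffs)


def theoretical_se(n_pairs, p, rho):
    """Textbook standard error of a paired difference of two proportions."""
    return math.sqrt(2 * p * (1 - p) * (1 - rho) / n_pairs)


if __name__ == "__main__":
    n_pairs, p = 1500, 0.04  # hypothetical values, see the text above
    for rho in (0.0, 0.25, 0.5, 0.75):  # several guesses for the correlation
        sd = spread_of_difference(n_pairs, p, rho)
        se = theoretical_se(n_pairs, p, rho)
        print(f"rho={rho:.2f}  simulated SD={100 * sd:.2f}%  formula={100 * se:.2f}%")
```

Under these assumptions, the larger the guessed correlation within pairs, the narrower the error bar on the *difference* between the two percentages, even though the error bar on each separate 4% stays the same. That is why several runs with different guesses are needed.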