Bell’s theorem is an exercise in the statistical theory of causality

Abstract. In this short note, I derive the Bell-CHSH inequalities as an elementary result in the present-day theory of statistical causality based on graphical models or Bayes’ nets, defined in terms of DAGs (Directed Acyclic Graphs) representing direct statistical causal influences between a number of observed and unobserved random variables. I show how spatio-temporal constraints in loophole-free Bell experiments, and natural classical statistical causality considerations, lead to Bell’s notion of local hidden variables, and thence to the CHSH inequalities. The word “local” applies to the way that the chosen settings influence the observed outcomes. The case of contextual setting-dependent hidden variables (thought of as being located in the measurement devices and dependent on the measurement settings) is automatically covered, despite recent claims that Bell’s conclusions can be circumvented in this way.

Richard D. Gill

Mathematical Institute, Leiden University
Version 2: 20 March, 2023. Several typos were corrected. Preprint:

In this short note, I will derive the Bell-CHSH inequalities as an exercise in the modern theory of causality based on Bayes’ nets: causal graphs described by DAGs (directed acyclic graphs). The note is written in response to a series of papers by M. Kupczynski (see the “References” at the end of this post) in which that author claims that Bell-CHSH inequalities cannot be derived (the author in fact curiously writes “may not be derived”) when one allows contextual setting-dependent hidden variables, thought of as being located in the measurement devices and with probability distributions dependent on the local setting. The result has of course been known for a long time, but it seems worth writing out in full for the benefit of “the probabilistic opposition”, as a vociferous group of critics of Bell’s theorem like to call themselves.

Figure 1 gives us the physical background and motivation for the causal model described in the DAG of Figure 2. How the layout of Figure 1 is arranged in practice (and it can be arranged in different ways) depends on Alice and Bob’s assistant, Charlie, at the intermediate location in Figure 1. There is no need to discuss his or her role in this short note. Very different arrangements can lead to quite different kinds of experiments, from the point of view of their realization in terms of quantum mechanics.

Figure 1. Spatio-temporal disposition of one trial of a Bell experiment. (Figure 7 from J.S. Bell (1981), “Bertlmann’s socks and the nature of reality”)

Figure 1 is meant to describe the spatio-temporal layout of one trial in a long run of such trials of a fairly standard loophole-free Bell experiment. At two distant locations, Alice and Bob each insert a setting into an apparatus, and a short moment later, they get to observe an outcome. Settings and outcomes are all binary. One may imagine two large machines, each with a switch on it that can be set to position “up” or “down”; one may imagine that it starts in some neutral position. A short moment after Alice and Bob set their switches, a light starts flashing on each apparatus: it could be red or green. Alice and Bob each write down their setting (up or down) and their outcome (red or green). This is repeated many times. The whole thing is synchronized (with the help of Charlie at the central location). The trials are numbered, say from 1 to N, and occupy short time-slots of fixed length. The arrangement is such that Alice’s outcome has been written down before a signal carrying Bob’s setting could possibly reach Alice’s apparatus, and vice versa.

As explained, each trial has two binary inputs or settings, and two binary outputs or outcomes. I will denote them using the language of classical probability theory by random variables A, B, X, Y where A, B take values in the set {1, 2} and X, Y in {–1, +1}. A complete experiment corresponds to a stack of N copies of this graphical model, ordered by time. We will not make any assumptions whatsoever (for the time being) about independence or identical distributions. The experiment does generate an N × 4 spreadsheet of 4-tuples (a, b, x, y). The settings A, B should be thought of merely as labels (categorical variables); the outcomes X, Y will be thought of as numerical. In fact, we will derive inequalities for the four “correlations” E(XY | A = a, B = b) for one trial.
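To make the “spreadsheet” picture concrete, here is a minimal sketch of how the four correlations might be estimated from an N × 4 table of rows (a, b, x, y). The function name `empirical_correlations` and the toy rows are hypothetical, chosen only for illustration. The sample averages below are the natural empirical counterparts of E(XY | A = a, B = b); relating them to those expectations needs assumptions that this note deliberately does not make at this point.

```python
from collections import defaultdict


def empirical_correlations(rows):
    """Average the product x*y separately for each setting pair (a, b).

    rows: iterable of 4-tuples (a, b, x, y), with a, b in {1, 2}
    and x, y in {-1, +1}.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for a, b, x, y in rows:
        sums[(a, b)] += x * y
        counts[(a, b)] += 1
    return {ab: sums[ab] / counts[ab] for ab in counts}


# Hypothetical toy table, not data from any real experiment.
rows = [
    (1, 1, +1, +1),
    (1, 1, -1, +1),
    (1, 2, +1, -1),
    (2, 1, -1, -1),
    (2, 2, +1, +1),
    (2, 2, +1, +1),
]
corr = empirical_correlations(rows)
```

The settings are used only as dictionary keys, in line with treating A and B as labels, while the outcomes enter numerically through the product x*y.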

Figure 2. Graphical model of one trial of a Bell experiment

In Figure 2, the nodes labelled A, B, X, and Y correspond to the four observed binary variables. The other two nodes, annotated Experimenter and (Hidden), correspond to two kinds of factors that determine the statistical dependence structure of the four-tuple (A, B, X, Y). On the one hand, the experimenter has external control over the choice of the settings. In some experiments, they are intended to be the results of external, fair coin tosses. Thus, the experimenter might try to achieve that A and B are statistically independent and completely random. The important thing is the aim to have the mechanism leading to the selection of the two settings statistically independent of the physics of what is going on inside the long horizontal box of Figure 1. On the other hand, that physical mechanism is unknown and unspecified. In the physics literature, one uses the phrase “hidden variables”; they should be thought of as those aspects of the initial state of all the stuff inside the long box which lead in a quasi-deterministic fashion to the actually observed measurement outcomes. The model, therefore, represents a classical physical model, classical in the sense of pre-quantum theory, and one in which experimental settings can be chosen in a statistically independent manner from the parameters of the essentially deterministic physical processes which lead to the actually observed measurement outcomes at the two ends of the long box.

Thus, we are making the following assumptions. There are two statistically independent random variables (not necessarily real-valued – they may take values in any measure spaces whatsoever), which I will denote by ΛE and ΛH, such that the probability distribution of (A, B, X, Y) can be simulated as follows. First of all, draw outcomes λE and λH, independently, from any two probability distributions over any measure spaces whatsoever. Next, given λE, draw outcomes a, b from any two probability distributions on {1, 2}, depending on λE. Next, given a and λH, draw x from the set {–1, +1} according to some probability distribution depending only on those two parameters, and similarly, independently, draw y from the set {–1, +1} according to some probability distribution depending on b and λH only.
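The simulation recipe just described can be sketched in a few lines of code. Everything specific here is an assumption made for illustration only: the uniform distributions for λE and λH, the way the setting probabilities depend on λE, the angle tables `ANGLES_A` and `ANGLES_B`, and the cosine-shaped outcome probabilities. Only the dependence structure (a, b depend on λE alone; x on a and λH; y on b and λH) is taken from the text.

```python
import math
import random

# Hypothetical "detector angles" attached to the setting labels 1 and 2.
ANGLES_A = {1: 0.0, 2: math.pi / 2}
ANGLES_B = {1: math.pi / 4, 2: 3 * math.pi / 4}


def draw_sign(prob_plus, rng):
    """Draw +1 with probability prob_plus, otherwise -1."""
    return +1 if rng.random() < prob_plus else -1


def simulate_trial(rng):
    # Step 1: draw lambda_E and lambda_H independently (assumed uniform here).
    lam_e = rng.random()
    lam_h = rng.uniform(0.0, 2.0 * math.pi)
    # Step 2: settings a, b are drawn from distributions depending on lambda_E only.
    p = 0.25 + 0.5 * lam_e
    a = 1 if rng.random() < p else 2
    b = 1 if rng.random() < 1.0 - p else 2
    # Step 3: x depends on (a, lambda_H) only; y on (b, lambda_H) only.
    x = draw_sign(0.5 * (1.0 + math.cos(lam_h - ANGLES_A[a])), rng)
    y = draw_sign(0.5 * (1.0 - math.cos(lam_h - ANGLES_B[b])), rng)
    return a, b, x, y


def simulate(n_trials, seed=0):
    rng = random.Random(seed)
    return [simulate_trial(rng) for _ in range(n_trials)]


def correlation(rows, a, b):
    """Sample average of x*y over the trials with settings (a, b)."""
    products = [x * y for (a_, b_, x, y) in rows if (a_, b_) == (a, b)]
    return sum(products) / len(products)


rows = simulate(20000)
chsh = (correlation(rows, 1, 1) - correlation(rows, 1, 2)
        - correlation(rows, 2, 1) - correlation(rows, 2, 2))
```

In this sketch the settings are dependent on one another through λE but independent of λH, which is all that the model requires. The simulated combination `chsh` should stay within [–2, +2] up to sampling error.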

[Footnote: In this Kolmogorovian mathematical framework, there is a “hidden” technical assumption of measurability. It can be avoided, see the author’s 2014 paper “Statistics, Causality and Bell’s Theorem”, published in the journal Statistical Science and also available on The assumption of N independent and identically distributed copies of this picture can be avoided too.]

Thus, ΛH is the hidden variable responsible for possible statistical dependence between X and Y, given A and B.

In the theory of graphical models, one knows that such models can be thought of as deterministic models, where the random variable connected to any node in the DAG is a deterministic function of the variables associated with nodes with direct links to that node, together with some independent random variable associated with that node. In particular, therefore, in obvious notation,
X = f(A, ΛH, ΛX),
Y = g(B, ΛH, ΛY),
where Λ := (ΛH, ΛX, ΛY) is statistically independent of (A, B), the three components of Λ are mutually independent of one another, and f and g are some functions. We can now redefine the functions f and g and rewrite the last two displayed equations as
X = f(A, Λ),
Y = g(B, Λ),
where f, g are some functions and (A, B) is statistically independent of Λ. This is what Bell called a local hidden variables model. It is absolutely clear that Kupczynski’s notion of a probabilistic contextual local causal model is of this form. It is a special case of the non-local contextual model
X = f(A, B, Λ),
Y = g(A, B, Λ),
in which Alice’s outcome can also depend directly on Bob’s setting or vice versa.
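The redefinition of f and g used to pass from the three-argument to the two-argument form can be spelled out as follows. The tilde notation is introduced here only for illustration; the text above simply reuses the names f and g.

```latex
\begin{align*}
\tilde f\bigl(a, (\lambda_H, \lambda_X, \lambda_Y)\bigr) &:= f(a, \lambda_H, \lambda_X), \\
\tilde g\bigl(b, (\lambda_H, \lambda_X, \lambda_Y)\bigr) &:= g(b, \lambda_H, \lambda_Y),
\end{align*}
so that, with $\Lambda = (\Lambda_H, \Lambda_X, \Lambda_Y)$,
\begin{align*}
X &= \tilde f(A, \Lambda), & Y &= \tilde g(B, \Lambda).
\end{align*}
```

Each new function simply ignores the component of Λ that belongs to the other party's apparatus, so locality is preserved: X still depends on A but not on B, and Y on B but not on A.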

Kupczynski claims that Bell inequalities cannot (or may not?) be derived from his model. But that is easy. Thanks to the assumption that (A, B) is statistically independent of Λ, one can define four random variables X1, X2, Y1, Y2 as
Xa = f(a, Λ),
Yb = g(b, Λ).
These four have a joint probability distribution by construction, and take values in {–1, +1}. By the usual simple algebra, all Bell-CHSH inequalities hold for the four correlations E(XaYb). But each of these four correlations is identically equal to the “experimentally accessible” correlation E(XY | A = a, B = b); i.e., for all a, b,
E(Xa Yb) = E(XY | A = a, B = b).
Hence the Bell-CHSH inequalities also hold for the four experimentally accessible correlations; for instance,
–2 ≤ E(X1Y1) – E(X1Y2) – E(X2Y1) – E(X2Y2) ≤ +2,
and similarly for the comparison of each of the other three correlations with the sum of the others.
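For completeness, here is a sketch of the two steps behind this argument, written out under the assumptions already stated (X = f(A, Λ), Y = g(B, Λ), Λ independent of (A, B)) and the additional tacit assumption that each setting pair (a, b) occurs with positive probability.

```latex
% Step 1: conditional correlations equal the correlations of the counterfactual variables.
\begin{align*}
E(XY \mid A = a, B = b)
  &= E\bigl(f(A, \Lambda)\, g(B, \Lambda) \mid A = a, B = b\bigr) \\
  &= E\bigl(f(a, \Lambda)\, g(b, \Lambda) \mid A = a, B = b\bigr) \\
  &= E\bigl(f(a, \Lambda)\, g(b, \Lambda)\bigr)
   = E(X_a Y_b),
\end{align*}
% the third equality uses the independence of \Lambda and (A, B).

% Step 2: the "usual simple algebra".
\begin{align*}
X_1 Y_1 - X_1 Y_2 - X_2 Y_1 - X_2 Y_2
  = X_1 (Y_1 - Y_2) - X_2 (Y_1 + Y_2) \in \{-2, +2\},
\end{align*}
% because Y_1, Y_2 \in \{-1, +1\} implies that one of Y_1 - Y_2 and Y_1 + Y_2
% equals 0 while the other equals \pm 2, and X_1, X_2 \in \{-1, +1\}.
% Taking expectations of a quantity that is always \pm 2 gives
\begin{align*}
-2 \le E(X_1 Y_1) - E(X_1 Y_2) - E(X_2 Y_1) - E(X_2 Y_2) \le +2 .
\end{align*}
```

The other three inequalities follow in the same way, by moving the single plus sign to each of the other three terms in turn.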

The whole argument also applies (with a little more work) to the case when the outcomes lie in the set {–1, 0, +1}, or even in the interval [–1, 1]. An easy way to see this is to interpret values in [–1, 1] taken by X and Y not as the actual measurement outcomes, but as their expectation values given relevant settings and hidden variables. One simply needs to add to the already hypothesized hidden variables further independent uniform [0, 1] random variables to realize a random variable with a given conditional expectation in [–1, 1] as a function of the auxiliary uniform variable. The function depends on the values of the conditioning variables. Everything stays exactly as local and contextual as it already was.
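The construction with an auxiliary uniform variable can be sketched as follows. The function name `sign_with_mean` and the numerical target are hypothetical, chosen only for illustration: given a desired conditional expectation m in [–1, 1], the outcome is +1 when the uniform variable falls below (1 + m)/2 and –1 otherwise, which has expectation exactly m.

```python
import random


def sign_with_mean(mean, u):
    """Map a Uniform[0, 1] draw u to an outcome in {-1, +1}.

    P(+1) = (1 + mean) / 2, so the expected outcome is
    (+1) * (1 + mean) / 2 + (-1) * (1 - mean) / 2 = mean.
    """
    return +1 if u < (1.0 + mean) / 2.0 else -1


# Illustration with a hypothetical target expectation of 0.3.
rng = random.Random(1)
target = 0.3
draws = [sign_with_mean(target, rng.random()) for _ in range(100000)]
estimate = sum(draws) / len(draws)
```

In the setting of the note, `mean` would be the value in [–1, 1] produced by f(a, λ) or g(b, λ), and the uniform variable would be one more independent component of the hidden variable, so the resulting ±1 outcome is still a function of the local setting and the (enlarged) hidden variable only.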


References

M. Kupczynski (2017a) Can we close the Bohr–Einstein quantum debate? Phil. Trans. R. Soc. A 375 20160392.

M. Kupczynski (2017b) Is Einsteinian no-signalling violated in Bell tests? Open Phys. 2017 5 739–753.

M. Kupczynski (2018) Quantum mechanics and modelling of physical reality. Physica Scripta 93 123001.

M. Kupczynski (2020) Is the moon there if nobody looks: Bell inequalities and physical reality. Frontiers in Physics 8 (13 pp.).
