A Causal Diagram on COVID-19 Infection

“I want to know why my friend, 69, was home with 104 fever, reported to his dr., test negative for flu, pneumonia, finally on day 5 was tested for Covid-19, 3 days later positive, told to stay home, and now is near death on ventilator in hospital? Why so long to test????”

(One of the many tweets that express that we are desperately looking for causes. From: Twitter 25-03-2020)

The SARS-CoV-2 virus is rapidly spreading over the world. Every hour the media come with news about the Corona pandemic: the numbers of deaths, of people tested positive, of people admitted to intensive care units. Every day we can follow debates between politicians, experts and the public about how to handle the various problems caused by the virus. In the Netherlands we have to stay at home as much as possible, wash our hands regularly, not shake hands, and keep a “social distance” of at least 1.5 meters from one another. Schools, restaurants and many shops are closed. Elderly people in care homes or living alone are among the most vulnerable. More and more countries decide on a complete lockdown.

Scientists, in particular virologists, epidemiologists and physicians, explain to their audiences the mechanisms behind the spread of the virus, to make clear the reasons for the measures taken by their governments (or to comment on them). Statisticians try to make sense of the wealth of data collected by national and international health institutions. What questions can they answer based on their statistics?

People ask what the chances are they will be exposed to the virus when they go to the supermarket. Others want to know how long it takes before the pandemic is under control so that they can go to work and the children to their schools. But a lot is still unknown. The virus differs from those of known influenza epidemics.

A model on the individual level

National health institutions (in the Netherlands the RIVM) as well as international health organisations (in particular the WHO) gather and publish data about the numbers of infected people and about how many people died because of the virus. Researchers fit mathematical growth models to these data in order to predict the spread of the virus, its peak, when it will decline, and how all this depends on policy. These models work at the level of the population of a whole nation, a particular region (e.g. Lombardy) or even a city (Wuhan). Sometimes they distinguish age groups or genders, for example to predict fatality rates for different groups.

But these models do not look at the individual level. The causal model that I propose here differs from the epidemiological models. It models the factors that play a role in the effects of the virus on the individual level.

Monitoring Reproduction Number R0

An important quantity to measure during an epidemic is the basic reproduction number of the virus, R0 (R-naught). Politicians and public health experts keep a close eye on this number. R0 represents the number of new infections estimated to stem from a single case. If, for example, R0 is 2.5, then one person with the disease is expected to infect, on average, 2.5 others. An R0 below 1 suggests that the number of cases is shrinking. An R0 above 1 indicates that the number of cases is growing. Of course the numbers counted are statistical measures over a population.
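To make the meaning of R0 concrete, here is a minimal Python sketch. It assumes a strongly idealized situation that is not part of the text above: R0 stays constant and every case infects exactly R0 others in the next "generation". The function name expected_new_cases is made up for this illustration.

```python
def expected_new_cases(r0: float, generations: int) -> list[float]:
    """Expected number of new infections per generation, starting from one case.

    Idealized assumption: R0 is constant and there is no immunity or policy effect.
    """
    return [r0 ** g for g in range(generations + 1)]


growing = expected_new_cases(2.5, 3)    # R0 above 1: each generation is larger
shrinking = expected_new_cases(0.8, 3)  # R0 below 1: each generation is smaller

print(growing)
print(shrinking)
```

With R0 = 2.5 the single first case leads to 2.5, then 6.25, then about 15.6 expected new cases; with R0 = 0.8 the numbers shrink from generation to generation.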

R0 depends on a number of factors: the number of people infected, the number of people vulnerable, the chance that a contact between two people transfers the virus from one to the other, and how long infected people are able to infect others. Some of these values are typical for a virus. Others can only be estimated from large populations. A definition of R0 refers to a scientific model of the complex mechanisms underlying the spreading of the virus. See this paper for the confusion about R0. All in all, it is hard to know the “real” value of R0. We can only estimate it.

Since the spread of the virus depends on the way people behave, policies try to steer people’s behavior and, in that way, to control R0. In a similar way a doctor tries to influence a patient’s situation by applying some medical treatment. Since the doctor doesn’t know how the patient will react, he will keep a close eye on the patient’s situation and adjust the treatment if required. A complicating factor is the time delay: a treatment will only have effect after some time. So in order to prevent critical situations we need to predict how the system that we try to control will behave in the future, so that we can adjust the treatment in time. If you want to shoot a flying duck you need to estimate where it will be at the moment the bullet arrives. All in all there are many uncertainties, not least about how the public will respond in the long run to the measures taken by governments (e.g., to stay at home), and how this depends on expectations presented in the media. Maybe other values become more important after some time (e.g., visiting family, going out for pleasure) so that people change their behavior.

Towards a causal Corona model

Many people get sick from the virus; some of them have only mild symptoms, and a small percentage does not survive the attack. Most of the ones that die are older and already have health problems. But there are exceptions. It seems that not all people get the disease. This raises our first question.

(1) What are the factors that determine if a person will get COVID-19?

For a person to get a disease caused by a virus infection two things are necessary and sufficient: (a) the person is vulnerable for the virus and (b) the person is actually infected by the virus after he or she was exposed to the virus.

This is the almost trivial logic of the classical doctrine of potency and act. Likewise: for a glass to break, the glass has to be breakable (vulnerable to breaking) and there must be some actor that actually breaks it.

Our first question raises two follow-up questions:

(2) What are the factors that determine a person’s vulnerability for being infected?

(3) What are the factors that determine that the person gets exposed to and infected with SARS-CoV-2?

Some people die after they got the disease, others survive.

(4) Which factors determine how serious the disease will become and what are the factors that determine the chances of survival?

If we have an answer to these questions we can predict which people run most risk of getting ill and what should be the best policy to prevent or control outbreak of the virus. To find answers we have to collect data. The more data, the better. Suppose we want to know what the effect is of age on chances to die because of corona, among those that are affected by the virus. We need statistics: for every patient we need age group, and effect, where effect is either survived or died. We can now estimate the probability for each age group. We might expect that the older the people are the smaller the chances to survive. Suppose we see an unexpected dip in the curve after the age group above 70. How can we explain this? Our data doesn’t tell. Was there a policy to not treat patients of this age group? Data alone is not enough to answer these questions and to make good predictions. We need to know the mechanisms behind it.
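The estimate described above (a probability of dying for each age group) can be sketched in a few lines of Python. The records below are invented purely for illustration; they are not real data, and the function name death_rate_by_group is hypothetical.

```python
from collections import defaultdict

# Hypothetical patient records (age group, effect); NOT real data.
records = [
    ("50-59", "survived"), ("50-59", "survived"), ("50-59", "survived"), ("50-59", "died"),
    ("60-69", "survived"), ("60-69", "survived"), ("60-69", "died"), ("60-69", "died"),
    ("70+", "survived"), ("70+", "died"), ("70+", "died"), ("70+", "died"),
]


def death_rate_by_group(records):
    """Estimate P(died | age group) as a relative frequency."""
    counts = defaultdict(lambda: [0, 0])  # age group -> [deaths, patients]
    for group, effect in records:
        counts[group][1] += 1
        if effect == "died":
            counts[group][0] += 1
    return {group: deaths / total for group, (deaths, total) in counts.items()}


rates = death_rate_by_group(records)
for group, rate in sorted(rates.items()):
    print(group, rate)
```

Such a table only describes what happened. It cannot tell us why a group deviates from the expected trend, which is the point made above.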

A plea for causal graphs

In The Book of Why (2018), computer scientist and philosopher Judea Pearl argues for the use of causal diagrams as a computational instrument for making causal inferences from data. Pearl, who invented Bayesian networks, points out that we cannot draw causal conclusions from data alone. We need causal intuition from experts about the case at hand. With such a causal model we can simulate experiments by setting values of variables of the model and seeing what the effects are on the outcome variables that we are interested in. We can also use it to answer counterfactual questions like “What if Mrs. S., aged 71, who we know became seriously ill after being exposed to the virus and was not admitted to hospital, had been admitted? Would she have died?”

What is a causal diagram?

A causal diagram is a graphical structure with nodes and arrows between pairs of nodes. The arrows represent direct causal relations. The source of the arrow is the cause, the target node represents the effect of the causal relation. Figure 1 shows a causal diagram that could be a part of a causal graph for the analysis and prediction of the effect of age, medical condition and treatment on chances to survive.

Figure 1. A causal diagram

The diagram shows that Treatment depends on Age as well as on Condition. Treatment has effect on Lethality (chance to survive), but this is also dependent on Age and on Condition. Age influences Lethality through four different paths.
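The paths from Age to Lethality can be listed mechanically. The sketch below encodes the arrows of Figure 1 as described in the text. One arrow, Age -> Condition, is an assumption: it is not mentioned explicitly above, but it is needed to arrive at four paths. The function name directed_paths is made up for this illustration.

```python
# Arrows of Figure 1: cause -> list of direct effects.
# ASSUMPTION: the arrow Age -> Condition is inferred, not stated in the text.
edges = {
    "Age": ["Condition", "Treatment", "Lethality"],
    "Condition": ["Treatment", "Lethality"],
    "Treatment": ["Lethality"],
    "Lethality": [],
}


def directed_paths(graph, source, target, prefix=()):
    """Return all directed paths from source to target in an acyclic graph."""
    path = prefix + (source,)
    if source == target:
        return [path]
    paths = []
    for child in graph[source]:
        paths.extend(directed_paths(graph, child, target, path))
    return paths


paths = directed_paths(edges, "Age", "Lethality")
for path in paths:
    print(" -> ".join(path))
```

Under this assumption the four paths are the direct arrow, the path via Treatment, the path via Condition, and the path via Condition and then Treatment.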

Causal diagrams can answer questions like “How does the lethality in the population change if we decide to give all infected people the same treatment independent of age?” or “How does the recovery rate for the corona disease change if we enforce a complete lockdown instead of the actual soft ‘intelligent’ lockdown?”.

Such questions can only be answered correctly by means of a causal intervention in the diagram. In technical terms: by applying the do-operator on the intervention variable. (We give the variable a fixed value and compute the effect). This simulates more or less what we would do in a randomized controlled experiment. We learn how Nature works by bringing about some controlled change and observe how Nature responds to that. The validity of the conclusions that we draw from such simulations requires that the model is correct, no arrow between two nodes if there is no causal relation, and complete in the sense that all relevant factors and relations are in the model.
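The difference between observing and intervening can be shown with a small exact computation. The sketch below uses a reduced version of the diagram (Age -> Treatment, Age -> Lethality, Treatment -> Lethality) and invented probabilities; none of the numbers come from data. Conditioning on Treatment keeps the arrow Age -> Treatment; the do-operator removes it, because under the intervention everybody is treated regardless of age.

```python
# All probabilities below are hypothetical, chosen only for illustration.
p_age = {"old": 0.3, "young": 0.7}
p_treated_given_age = {"old": 0.2, "young": 0.8}
p_death = {  # P(death | age, treated)
    ("old", True): 0.30, ("old", False): 0.50,
    ("young", True): 0.02, ("young", False): 0.05,
}


def p_death_given_treated():
    """Observational: P(death | Treatment = yes), by conditioning."""
    joint = sum(p_age[a] * p_treated_given_age[a] * p_death[(a, True)] for a in p_age)
    marginal = sum(p_age[a] * p_treated_given_age[a] for a in p_age)
    return joint / marginal


def p_death_do_treated():
    """Interventional: P(death | do(Treatment = yes)).

    The arrow Age -> Treatment is cut, so Age keeps its own distribution.
    """
    return sum(p_age[a] * p_death[(a, True)] for a in p_age)


print(round(p_death_given_treated(), 3))
print(round(p_death_do_treated(), 3))
```

With these numbers the observed death rate among treated patients is about 0.047, while treating everybody would give about 0.104. The observed figure looks better only because, in this made-up population, mostly young patients were treated.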

How do we know if the model is complete and correct? I am not aware of any convincing publication about validation of causal diagrams. In fact, a causal diagram is a theory that can at best be falsified, not proven to hold the truth. Pearl’s Book of Why describes some interesting episodes that show that the truth consists in the historical process of scientific research. Not outside this process. Although it helps that now and then someone stubbornly believes that she has seen the light and fights against the community’s doctrine. Those are the ones that make science progress. 

Causal Diagrams and Bayesian Networks

Nodes X and Y of a causal diagram represent factors that play a role in the model, with an arrow drawn from X towards Y if we conceive X as a “direct cause” of Y. Mathematically the factors X and Y are stochastic variables of various types: continuous or discrete, and if discrete, ordered or categorical.

A causal network is a special type of Bayesian network. In a Bayesian network the arrows do not need to refer to direct causes. They stand for probabilistic “influence” without implying a causal direction. For example, if we know that someone lives in an elderly home we can infer that he or she is most likely over 70 years of age. There might not be a direct causal relation in a strict sense between age and living in an elderly home, but if we get the information that Mrs. S. lives in an elderly home, chances rise that she is over 70 years of age and not 15. In a Bayesian network a (conditional) probability table for P(Y|z) is attached to each node Y, where z is the set of all variables that point at Y (the parent nodes of Y). Thus when the network is X -> Y, node Y has a conditional probability table P(Y|X) and X has a table for the probability distribution P(X).

Where do the probabilities come from? Bayesian networks are quite tolerant about the source of the probability values: they are either computed from data (probabilities as relative frequencies) or based on expert opinions (probabilities as confidence measures of “belief states’’). For a comprehensive review about the use of Bayesian Networks in health applications refer to Evangelia Kyrimi et al. (forthcoming). One observation is that many researchers do not publish their network, nor do they comment on the design process.

Bayesian networks are used to compute (infer) the probabilities of some outcome variables given some known values of input variables (the observed ones). The computation is based on the classical Theory of Probability (Pascal, Fermat) and uses Bayes’ Rule for computing the unknown probabilities of variables.

Let me give an example. Let T be the variable that stands for the outcome of a corona test (its value can be pos or neg) and let D stand for the statement “the patient tested has the corona disease” (D=pos if the patient has the disease, D=neg otherwise). We know that there is some causal relation D -> T. In diagnostics we reason from test to disease, not in the causal direction. Suppose the test outcome is positive. Does the patient have the disease? Tests are not 100% reliable and sensitive. We want to compute P(D=pos|T=pos), the chance that the patient has the disease, given that the test outcome is positive.

Bayes’ rule says:

P ( D=pos | T=pos ) = [ P ( D=pos ) / P ( T=pos ) ] * P ( T=pos | D=pos )

The second factor on the right hand side, P ( T=pos | D=pos ), is the probability that the test is positive if we know that the patient has the disease. Its value is known: it is based on expert knowledge about the test mechanism, which explains how the disease causes the test outcome. The first factor is a fraction: the numerator P ( D=pos ) is the prior probability, i.e., the chance that the patient has the disease before knowing the test outcome. The denominator P ( T=pos ) is the probability that the test is positive whatever the situation of the patient. There is a small chance that the test shows a positive outcome when the patient does not have the disease (a false positive).

Bayes’ rule is consistent with intuitive logic. As you can see from the right hand side of the formula, the chance P ( D=pos | T=pos ) grows when the prior P ( D=pos ) grows. When almost everyone has the disease, chances that you have it when you have been tested positive are also high. On the other hand, the higher P ( T=pos ), i.e. the higher the chances that the test is positive (also for people that do not have the disease), the smaller the probability that a positive test outcome indicates the presence of the disease.
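A small worked example may help. The numbers below are hypothetical (they do not describe any real corona test): a test that detects 90% of the diseased and gives a false positive for 5% of the healthy. The denominator P ( T=pos ) is computed with the law of total probability. The function name posterior_disease is made up for this illustration.

```python
def posterior_disease(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(D=pos | T=pos) by Bayes' rule.

    prior               = P(D=pos)
    sensitivity         = P(T=pos | D=pos)
    false_positive_rate = P(T=pos | D=neg)
    """
    # Law of total probability: P(T=pos) over diseased and healthy patients.
    p_test_pos = sensitivity * prior + false_positive_rate * (1 - prior)
    return (prior / p_test_pos) * sensitivity


# Hypothetical test characteristics, for illustration only.
rare = posterior_disease(prior=0.01, sensitivity=0.9, false_positive_rate=0.05)
common = posterior_disease(prior=0.50, sensitivity=0.9, false_positive_rate=0.05)

print(round(rare, 3))
print(round(common, 3))
```

When only 1% of the tested people have the disease, a positive outcome means a chance of about 15% of actually having it; when half of them have it, the same test outcome means about 95%.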

When P ( X | Y ) is not equal to P ( X ), we call X and Y probabilistically dependent. This dependency should not be confused with causal dependency. P ( D=pos | T=pos ) differs from P ( D=pos ), but having a disease is not caused by a test. Also correlation (a measure of co-occurrence) should not be confused with causation.

What counts as a causal relation?

As we said before, what makes a causal diagram a special (Bayesian) network is that the arrows in the network represent direct causal relations. But what is a causal relation? Where do they come from?

Almost every day we express our beliefs in the existence of a causal relation between phenomena, either implicitly, e.g., “If you don’t hurry now, you will be late.”, or explicitly, e.g., on a package of cigarettes: “Smoking causes lung cancer.”, or “Mrs. Smith died because of corona.”. But if we come to think about it, it is not easy to tell what a causal relation is, and how we can distinguish it from other types of influence between things that happen. The history of the idea of cause started with Aristotle’s theory of the four causes of being (see his Metaphysics). Only two of them have survived: the notion of efficient cause and the notion of final cause. In our modern mechanical scientific world view we see a cause as something that brings about some effect: a billiard ball that causes another ball to move, and so on. We see chains of causes and effects and many causes that influence other things or processes.

On a medical internet site I found “Smoking does not cause lung cancer”. The motivation is: (1) not everybody who smokes gets the disease (smoking is not a sufficient condition to bring about the effect) and (2) some people get lung cancer and never smoked (it is not a necessary condition to bring about the effect). Yet after many years of debate and research (see Pearl’s account of this episode in the history of science in The Book of Why), the causal mechanisms of cancer development under the influence of smoking have been largely unraveled. We know that the primary effect of smoking is that the tar paralyzes and can eventually kill the tiny hair-like structures in the lungs. Lung cancer happens when cells in the lung mutate and grow uncontrollably, forming a tumor. Lung cells change when they are exposed to dangerous chemicals that we breathe. The tar makes the lungs more vulnerable to these chemicals. In that sense smoking is indeed a cause of lung cancer. The search for the “real” cause will only end when we have found a plausible theory that describes the chemical mechanism of the change of the lung cells.

In general, it seems that as long as cause and effect are two separate objects or processes our wish to know how nature works is not really satisfied. A successful and satisfying search for the real cause of a given effect eventually reveals that cause and effect are actually two sides of the same mechanism that we tend to objectify. In reality cause and effect are materially one and the same process. But this only holds at the end of the search for causes. In making a causal diagram we must take for a cause everything that has a direct influence on (the status of) the effect.  

To make a long story short: there is no agreement about what counts as a “cause’’, so we have to rely on our “causal intuition’’ and expertise in the relevant domains.    

Need for expert knowledge

I am not an expert in any of the relevant areas, but I take up the challenge of making a causal diagram based on the information that I gathered from the literature and the internet. I have some experience in the design of such models, but in quite different areas. In the past I gave courses in Artificial Intelligence devoted to Reasoning under Uncertainty. We used (Dynamic) Bayesian Networks for Natural Language Processing and for recognition of participants’ behavior in conversations. The Dutch newspapers and the Dutch RIVM provide statistics and background articles that I use to make the diagram. In making it I have learned how challenging it is to make such a diagram.

My causal corona diagram is just the basic structure of a Bayesian network: I do not attach (conditional) probability tables to the nodes. Computing probabilities is not my concern. I am interested in the causal structure. The main function is to organize and visualize the connections between the most important factors that play a role in the process of getting caught by the corona virus.

The Corona Diagram

The core of the diagram is based on the simple idea that a subject will get sick if and only if two things hold: 1) the subject is vulnerable and therefore sensitive to being infected and 2) he or she is exposed to and actually infected by the virus. This is as trivial as can be and almost tautological. But that does not make it less true.

Here is the kernel of the causal diagram in which Seriousness is the label for the variable that indicates how serious the disease of this person is.   

Now that we are in the middle of the pandemic the public policies focus on minimizing the chances of exposure and infection. Several factors influence a person’s vulnerability to viruses and infections in general. We currently do not have a complete picture of which factors specifically influence sensitivity to this new SARS virus.

Figure 2. The core of the Corona diagram

Seriousness: indicates how seriously ill the patient gets. Values range from Asymptomatic and Mild to Serious and VerySerious.

It is directly influenced by two factors:

Vulnerability: indicates how sensitive the patient is to the virus. It refers to the subject’s physical condition. Other relevant conditions belong to the second cluster of nodes.

Infection: Infection is when the virus has entered the body. It ranges from nil to a few to many.

When there is no Infection, Seriousness is nil. Seriousness is determined by the degree of Infection and by Vulnerability. The factors that determine Vulnerability should answer the question why this young man became seriously ill and needs hospital care whereas this woman shows only mild symptoms. Recent research indicates that Infection is a gradual thing (a person can be exposed to a few or many virus particles) and that Seriousness depends in a more complex way on the Infection grade (or type? It might be that different variants of the Corona virus make a difference here).

Lethality indicates the chances to die. It depends on the Seriousness and the MedicalCare, i.e. the medical treatment, offered. There is no cure at the moment; only care can help the patient to survive.

Note: It is often difficult to say what the exact cause of death is. Statistics about populations may reveal how many deaths are the effect of the corona virus.

Infection is caused by Exposure (is the subject exposed to virus) and influenced by SelfHealthCare (did subject take hygienic precautions; wash hands, etc.).  SelfHealthCare is important to prevent Infection. That is why it is urged to wash hands with soap or alcohol regularly and not to touch the face.

Vulnerability and Infection are the two main factors. Both are necessary for becoming ill. As you can see there is no direct relation between the two, i.e. Vulnerability does not cause Infection nor the other way around. However, collected data may well show a high correlation between the two, suggesting a causal relation. This will show, for example, when we collect data from medical staff taken into hospital care after they have been tested positive. We cannot conclude from this that in the general public there is a causal relation between the two. This “spurious” relation only holds within the selected group, e.g., people that are actually infected. (For an explanation of this phenomenon, see Pearl’s discussion of Berkson’s paradox in The Book of Why.)
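The selection effect can be reproduced with a small exact computation. In the sketch below Vulnerability and Infection are independent in the general population, and a third variable, being hospitalised, depends on both. All probabilities are invented for illustration, and the variable "hospitalised" is a stand-in for whatever mechanism selects the people that end up in the data.

```python
# Hypothetical probabilities, for illustration only.
p_vulnerable = 0.3   # P(Vulnerability = 1), independent of Infection
p_infected = 0.2     # P(Infection = 1)
p_hospitalised = {   # P(hospitalised | vulnerable, infected)
    (0, 0): 0.01, (1, 0): 0.02, (0, 1): 0.05, (1, 1): 0.90,
}


def p_vulnerable_given(infected: int, hospitalised_only: bool) -> float:
    """P(Vulnerability = 1 | Infection = infected), optionally among hospitalised people only."""
    numerator = denominator = 0.0
    for v in (0, 1):
        weight = (p_vulnerable if v else 1 - p_vulnerable)
        weight *= (p_infected if infected else 1 - p_infected)
        if hospitalised_only:
            weight *= p_hospitalised[(v, infected)]
        denominator += weight
        if v:
            numerator += weight
    return numerator / denominator


# General population: knowing Infection tells us nothing about Vulnerability.
print(p_vulnerable_given(1, False), p_vulnerable_given(0, False))
# Hospitalised people only: the two become strongly associated.
print(p_vulnerable_given(1, True), p_vulnerable_given(0, True))
```

In the whole population the chance of being vulnerable is 0.3 whether or not a person is infected. Among the hospitalised it is much higher for the infected than for the non-infected, although no arrow connects the two variables.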

Figure 3. The Corona causal diagram

What are the factors that influence these two main factors?


Vulnerability is influenced by:

HealthCondition: the physical condition. Some people say that the people that die after being infected would have died anyway within a year or so because of their bad health condition.

ChronicCondition: chronic diseases: lung diseases, coronary and heart diseases, diabetes. There are indications that a high percentage of the patients admitted to hospital have too high a BMI (body mass index).

Gender: 2/3 of the patients that passed away after getting seriously ill are male.

GeneticFactors: other genetic factors besides gender-specific ones may play a role. The COVID-19 Host Genetics Initiative (2021) reports that “While established host factors correlate with disease severity (e.g., increasing age, being a man, and higher body mass index) these risk factors alone do not explain all variability in disease severity observed across individuals. The genetic makeup of an individual contributes to susceptibility and response to viral infection.”

This international initiative describes the results of three genome-wide association meta-analyses comprising up to 49,562 COVID-19 patients from 46 studies across 19 countries. They report 13 genome-wide significant loci that are associated with SARS-CoV-2 infection or severe manifestations of COVID-19.

ImmunizationStatus: it is currently not known how the immune system responds to the virus. Several vaccines have been developed and have been applied since the end of 2020. It is to be expected that people that recovered have a smaller chance of becoming ill again. Current research seems to indicate that people that had only mild symptoms are not completely immune to the virus.

Age: is directly related to a number of other factors. Age is also a factor in the Exposure cluster. Age is more a property that can be used as an indicator for certain inferences. Ageing as a process influences many aspects of people’s lives: work, social contacts, etc. (See below for a discussion about ageing as a multi-factor confounder.)


A subject can be exposed to the virus to several degrees, dependent on the concentration of the virus in the locality where exposure occurred and on the duration of the contact. Exposure is determined by several factors, mainly by social contacts and activities (type of work, hobbies). Another factor is

VirusSpread: how much is the virus spread over the subject’s social environment?

If VirusSpread is high and SocialContacts is also high then chances for Exposure are high as well.[1]
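How the two factors combine can be illustrated with a toy formula. The sketch below is not part of the diagram; it assumes, purely for illustration, that VirusSpread is the fraction of infectious people in the subject's environment, that SocialContacts is a number of independent contacts, and that Exposure means meeting at least one infectious person. The function name p_exposure is hypothetical.

```python
def p_exposure(virus_spread: float, social_contacts: int) -> float:
    """Chance of at least one contact with an infectious person.

    Toy assumption: each contact is independently infectious with
    probability virus_spread.
    """
    return 1 - (1 - virus_spread) ** social_contacts


low = p_exposure(virus_spread=0.01, social_contacts=5)
high = p_exposure(virus_spread=0.05, social_contacts=50)

print(round(low, 3))
print(round(high, 3))
```

Under these assumptions the chance of Exposure grows with either factor and is zero when one of them is zero.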

It is difficult to get a good picture of the spread of the virus. Only a limited number of people are tested for the virus. Conclusions drawn from the data collected are most likely biased. Also, different countries have different policies, which makes merging data a real challenge. Even within the Netherlands different regions have different policies dependent on the test capacity available. See below for a recent publication about a simulation study of the spreading mechanisms of the virus.

Personality: influences SocialContacts. (See also the comment about compliance in the notes below.)

Household: apart from social contacts, the direct contact with family members influences the chance to be exposed to the virus.

Activities: it makes a difference whether the person mostly stays at home or is an active participant in all kinds of social events and organizations.

How can a Corona diagram be used?

The causal diagram presented is the result of an exercise in making a causal diagram for a realistic case. There is need for expert knowledge to come up with a better model. It can be completed with probabilities based on available data.  

Eventually, a causal diagram like this for the Corona pandemic can be used to answer several kinds of questions. Not just probabilistic questions (what are the effects of certain observations on the occurrence of other events?) but also causal questions (what is the effect of changing values on outcome variables?). It can also be used to answer counterfactual questions on an “individual” level[2].

The diagram can also be used for qualitative analyses. How is Age related to Lethality? The diagram shows that the influence of Age goes through Vulnerability. Ageing makes people more sensitive to diseases. So if they are infected they run a risk of ending up in a serious condition. If there is not enough medical care for the subject, chances rise that he or she will die.

What is the impact of gender on lethality? Data about the incidence of COVID-19 indicate that there is a high correlation between Gender and Seriousness. This suggests a direct effect of Gender on Vulnerability, i.e., that male subjects are more sensitive than female subjects. Statistics show that two thirds of all patients are male. But other genes of the human genome may play a role as well.

Health care workers, doctors and nurses in hospitals are more at risk than, e.g., computer programmers: the former run a higher risk of becoming exposed and infected than the latter. This is reflected in the Activities node in the diagram.

What is the impact of the ecological environment on chances to get ill? The influence of the environment is indirect, via HealthCondition. Polluted air causes chronic lung diseases such as COPD, which make people more vulnerable. The risks of getting ill are higher in large urban areas (e.g., Wuhan, Madrid, Northern Italy, New York) than in traditional agricultural areas. People living in urban areas run more risk of getting exposed because of the heavy use of public transport.

Finally, such a diagram might also help in deciding on, or explaining, policies that aim at minimizing exposure for specific subgroups of citizens. Isolation of elderly people helps because it affects Exposure, a necessary condition for getting infected.

A note on confounding factors: age

Our diagram has a simple tree-like structure. It has two main branches that are not connected higher up: Vulnerability and Infection are not directly connected. Each of the two branches has a tree-like structure itself.

This suggests that Vulnerability and Infection are not related. They would be related, however, if the two branches had a confounding cause. A confounder C of X and Y is, simply put, a common cause of X and Y. Pearl gives in The Book of Why (2018) a better, mathematical definition in terms of his do-operator. Since Age in fact influences not only Vulnerability but also Exposure via SocialContacts (not indicated in the diagram), Age is a confounder.
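Under the simple "common cause" reading of confounding, candidate confounders can be found mechanically as the common ancestors of two nodes. The sketch below encodes the arrows of the Corona diagram as I read them from the text (the figure may contain more), and then adds the arrow Age -> SocialContacts that the diagram leaves out. The function names ancestors and common_causes are made up for this illustration, and the check is weaker than Pearl's definition in terms of the do-operator.

```python
# node -> list of direct causes (parents), as read from the text.
parents = {
    "Lethality": ["Seriousness", "MedicalCare"],
    "Seriousness": ["Vulnerability", "Infection"],
    "Infection": ["Exposure", "SelfHealthCare"],
    "Vulnerability": ["HealthCondition", "ChronicCondition", "Gender",
                      "GeneticFactors", "ImmunizationStatus", "Age"],
    "Exposure": ["VirusSpread", "SocialContacts", "Household", "Activities"],
    "SocialContacts": ["Personality"],
}


def ancestors(graph, node):
    """All direct and indirect causes of a node."""
    found = set()
    stack = list(graph.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(graph.get(parent, []))
    return found


def common_causes(graph, x, y):
    """Nodes that are ancestors of both x and y."""
    return ancestors(graph, x) & ancestors(graph, y)


before = common_causes(parents, "Vulnerability", "Infection")

# Add the arrow Age -> SocialContacts that is not drawn in the diagram.
with_age = {**parents, "SocialContacts": parents["SocialContacts"] + ["Age"]}
after = common_causes(with_age, "Vulnerability", "Infection")

print(before)
print(after)
```

In the diagram as drawn the two branches share no cause; once Age is allowed to influence SocialContacts, Age shows up as a common cause of Vulnerability and Infection.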

Pearl (2018) discusses a study among retired men that revealed an association between regular walking and reduced death rates. Is there a causal relation between the two? According to Pearl, the experimenters did not prescribe who would be a casual walker and who would be an intense walker. So we have to take into account the possibility of confounders. “An obvious confounder might be age”. The causal diagram is shown below.

Figure 4. A causal diagram from The Book of Why (J. Pearl, 2018)

Confounding factors cause spurious correlations between other variables. If we want to know the effect of Walking on Mortality we should do that for fixed age groups. The situation would be different when Walking had a causal influence on Age, which is not the case (well, maybe it is).

Some other studies: testing for the virus and mortality

A recent analysis of the collected records of over 17 million adult NHS patients, among whom a bit fewer than 5700 deaths were attributed to COVID-19, confirms the relevance of most of the factors in our model that determine the chance of death caused by the Corona virus. The study reports (version of 7 May 2020) numerical values for the impact of these factors on lethality: age, sex, medical conditions (chronic diseases), deprivation. Race is a factor: “Compared to people with ethnicity recorded as white, black people were at higher risk of death”. Behavioral factors, such as the ones in our model, have not been studied. A nice causal diagram is missing from the report.

Some epidemiologists are active on Twitter and discuss causal models. One of them is Ellie Murray. She likes to illustrate her models and ideas with nice handmade infographics. See the figure below.

A nice illustration of a causal analysis by Ellie Murray on Twitter (May 8, 2020)

Besides the personal question “What are the chances that this particular person will be infected by the virus and survive?”, there are questions concerning the effect of policies as well as compliance on the spread of the virus and on mortality. Other researchers focus on these and on the global effect of the pandemic on mortality in the general public. How much does the virus shorten life expectancy?

David Spiegelhalter (2020) discusses the issue of whether many deaths from COVID-19 would have occurred anyway as part of the ‘normal’ risks faced by people, particularly the elderly and those with chronic health problems, who are the main victims of COVID. The current (25-03-2020) estimated infection fatality rate (the percentage of deaths among infected subjects) is anywhere between 0.5% and 1%.

Norman Fenton et al. (2020) point out the need for random testing to prevent the bias caused by the selection methods used now. They advocate the use of a causal model for the analysis of the results. They present an example of a causal model for a given country and its population in order to show that the COVID-19 death rate is “as much a function of sampling methods, testing and reporting, as it is determined by the underlying rate of infection in a vulnerable population.”

Wilder et al. (2020) develop an agent-based model for studying the spread of the SARS-CoV-2 virus. “A key feature of the model is the inclusion of population-specific demographic structure, such as the distributions of age, household structure, contact across age groups, and comorbidities.” The aim of this study is “to evaluate the impact of age distribution and familial household contacts on transmission using existing data from Hubei, China, and Lombardy, Italy – two regions that have been characterized as epicenters for SARS-CoV2 infection – and describe how the implications of these findings may affect the utility of potential non-pharmaceutical interventions at a country-level.”


Motivated by Judea Pearl’s The Book of Why, in which he advocates the use of causal diagrams, and interested in the mechanisms at play in the corona disease, but unhindered by any expert knowledge of virology, epidemiology or any other relevant domain, I made a causal diagram. I hope I have explained what it is, what it can be used for, and why it would be good to work on a better one.

Rieks op den Akker, Lonneker, March 2020


COVID-19 Host Genetics Initiative (2021). Mapping the human genetic architecture of COVID-19. Nature.

David Spiegelhalter (2020), How much ‘normal’ risk does Covid represent? Blog posted March 21, 2020. https://medium.com/wintoncentre/how-much-normal-risk-does-covid-represent-4539118e1196

Judea Pearl & Dana Mackenzie (2018). The Book of Why : the new science of cause and effect. New York: Basic Books.

Judea Pearl (2001), Causality: models, reasoning and inference. Cambridge University Press, Revised edition, 2001.

Norman Fenton, Magda Osman, Martin Neil, Scott McLachlan (2020). Improving the statistics and analysis of coronavirus by avoiding bias in testing and incorporating causal explanations for the data. http://www.eecs.qmul.ac.uk/~norman/papers/Coronavirus_death_rates_causal_model.pdf

Wilder, Bryan and Charpignon, Marie and Killian, Jackson and Ou, Han-Ching and Mate, Aditya and Jabbari, Shahin and Perrault, Andrew and Desai, Angel and Tambe, Milind and Majumder, Maimuna, The Role of Age Distribution and Family Structure on COVID-19 Dynamics: A Preliminary Modeling Assessment for Hubei and Lombardy (March 31, 2020). Available at SSRN: https://ssrn.com/abstract=3564800 or http://dx.doi.org/10.2139/ssrn.3564800

Kyrimi, E., McLachlan, S., Dube, K., Neves, M.R., Fahmi, A., & Fenton, N.E. (2020). A Comprehensive Scoping Review of Bayesian Networks in Healthcare: Past, Present and Future. ArXiv, abs/2002.08627.

Rieks did some mathematics and computer science. He has a PhD in computer science. He moved to dialogue systems research and natural language processing. He was Assistant Professor of Artificial Intelligence and Human Computer Interaction at the University of Twente. He designed and used Bayesian Networks for modeling and prediction of conversational behaviors. He lectured on logic and statistical inference in AI courses focusing on reasoning with uncertainty.


[2] Real individuals do not occur in scientific models. Science is about categories. Individuals are abstract entities represented by a list of attribute/value pairs. In a counterfactual question to the model we change certain values of some individual to see what the effect would be had she had this property instead of the one that she has in the real world.
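The idea of an individual as a list of attribute/value pairs, and of a counterfactual question as a change to some of those values, can be sketched in a few lines of Python. The attribute names follow the variables used in this post; the function `exposure_risk` and all of its numbers are hypothetical and serve only as a stand-in for a real model. Because this toy model is deterministic, the counterfactual reduces to re-evaluating the model on the changed attributes.

```python
def exposure_risk(individual):
    """Toy structural equation with made-up numbers (not from any study)."""
    base = {"low": 0.01, "high": 0.10}[individual["VirusSpread"]]
    factor = 0.5 if individual["SelfHealthCare"] == "good" else 1.0
    return base * factor


def counterfactual(individual, **changes):
    """Return a copy of the individual with some attribute values replaced."""
    return {**individual, **changes}


# An "individual" is nothing more than a set of attribute/value pairs.
person = {"Age": 69, "SelfHealthCare": "poor", "VirusSpread": "high"}

# What would the risk have been, had she taken good care of herself?
what_if = counterfactual(person, SelfHealthCare="good")

factual_risk = exposure_risk(person)
counterfactual_risk = exposure_risk(what_if)
print(factual_risk, counterfactual_risk)
```

The original record is left untouched; only the copy carries the changed value, which mirrors the distinction between the real world and the imagined one.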

[1] Comment by Miriam Cabrita. Related to Exposure, I miss how the community (excluding the individual self) complies with the measures proposed to stop the virus. For example, one way or another everybody needs to do shopping. Let’s assume that an individual goes to the supermarket and takes all the precautions given by the government and even more (e.g. washes hands carefully when arriving home, removes shoes). If the people in the community are sloppy (not to say stupid) and keep going to the supermarket while sick, the chances that the individual is Exposed to the disease are much higher than in a community where everyone respects the rules. What if the people you live with do not comply with the rules of SelfHealthCare? Then you are also much more exposed to the virus.

Response by Rieks: Compliance is indeed a factor that affects VirusSpread. It may be affected by Personality, i.e., some people are more compliant with regulations from the authorities than others (youngsters, elderly, for different reasons). You need to know the compliance with a treatment to know the effects of the treatment. You need to know whether people were free to choose a treatment or were assigned to it, since that affects compliance. These are relevant issues in the current discussion in the Netherlands about the introduction of a corona app to trace, detect and warn people about corona. What can we do with the data collected?

Published by


Rieks op den Akker was a researcher and lecturer in artificial intelligence, mathematics and computer science at the University of Twente. He is retired.
