The main idea of the EU Horizon 2020 project Council of Coaches is to have a number of coaches that you can gather and meet. I think it is a clever idea (you never know what playing video games is good for…). Why?
Here I present the challenges I see in this project regarding research into the use of virtual conversational characters for serious applications (other than demonstrators, gaming or art). There is an extensive project website containing a lot of information.
The goal of the project is to help elderly people like me (for practical purposes: 55+ and you are old nowadays; and young as well) to reflect on their health issues. Specific targets are the chronic diseases: lung diseases (asthma, COPD), heart diseases, chronic back pain, obesity and diabetes. If you happen to suffer from one of these, the others will most of the time follow soon. How to cope with this situation?
To be honest (and why shouldn’t I be?) I do not know how many people in the target group reflect on their health issues to the point that they search for help. And if they do, what is the trigger, what are their needs, and where do they go for help? I myself call the doctor if I think there is something wrong with my body, maybe after I have searched the internet to see what information I can find. We had a chat once with an internet doctor when we were abroad and needed some medical advice. When we have lasting back pain we go to a doctor. The doctor says: it’s the age. Ageing is slowly dying. You have to live with that. Or die, if you want. We are getting older and I think we are not special.
Digital coaching apps all over the place
There are digital personal diabetes coaches in the form of an app that runs on your mobile phone. Some of them have an embodied character that pops up when you want. They try to motivate you to measure your glucose regularly. There are physical activity coach apps. There are food coaches, sleep coaches, depression coaches and budget coaches. Their most important function is to help you monitor your physical or financial condition in terms of some vital physiological or financial parameters and some of your daily health-related activities. And sometimes they can give you personal advice. Your values are obtained from sensors connected to the app (step counter, glucose meter), from information you provide yourself by typing values into form fields, or from other applications you use, and sometimes from a short information dialog with your personal coach. But this is still a real challenge: to have a free, open, natural interaction in your own language with an artificial conversational agent that is really of help to you.
I am rather sceptical about the potential of interaction between humans and virtual humans. The value of apps is that they store relevant data and can help you reflect on a specific health-related issue: your weight, your diabetes. People who use such an app for a longer time are already motivated to keep an eye on their health condition. Your data can easily be shared with your doctor. Older people forget things; the app helps you remember. Ease of use and functionality are what count. Should it be fun to work with an app? I think that is a nice added value. But it should not undermine the primary functions. For fun there are games.
Serious coaching games
Children like games. In one of the Human Media Interaction projects at Twente University we built a gamification platform to support young diabetes patients in dealing with their disease. In a journal paper we discuss what barriers we encounter on the path from design to the final implementation and inclusion in the current health ecostructure. Some elderly people like games as well. So why not design serious games that help people with their personal issues in a challenging way?
The very idea of the Council of Coaches project comes from the world of video games. Different coaches, covering expertise in various domains of life, chat about issues relevant for and brought in by the user. This way the user need not actively contribute to the conversation all the time. She can jump in whenever she likes.
One of the big challenges of the project is to get content for the health dialogs. How to feed the virtual coaches so that they are able to contribute in a sensible way to a conversation about the personal issues raised by a user? Maybe interaction and cooperation with real public coaching sessions can be of help.
Health Insight: Council of Coaches as Interactive TV format
On the list of Most Important Things For a Good Life a good health condition seems to be number one. But football is definitely second. For many years the most popular and most awarded TV production in the Netherlands was Voetbal Inside. A council of four football (soccer) coaches discusses the most important issues of the week. TV watchers can drop a line via social media and ask questions, often addressed to one of the council members. The issue selected by the moderator is shown on screen and discussed in a lively manner by the different characters of the council (see Figure X). It is a mixture of spontaneous live and scripted interactive TV. Sometimes direct video communication with a guest or watcher is broadcast.
Why not exploit Council of Coaches as an Interactive TV format? The Dutch Omroep MAX would be the first to be interested. They target the older segment of the Dutch TV watchers. They have close contacts with care institutes for elderly people. The most popular program of Omroep MAX is the cooking/bakery contest “Heel Holland Bakt” in which “normal” people take part. Omroep MAX might be interested in cooperating in a TV series where normal people can discuss their health-related problems (focusing on a specific disease or general health-related issue: diabetes or obesity) with the Council of Coaches on TV. The Council consists of well-known medical experts and other “well-known Dutch” who suffer from the disease of the week. The recordings, together with the feedback about engagement (e.g., audience ratings), can be used as training material for home-made artificial coaches. We have had TV doctors for quite some time already, but a council of coaches that discusses a statement like “Doctor, I have diabetes; could it be because of stress in my work?” (see for an answer: https://www.dokterdokter.nl/gezondheid/tv-dokter/page/2/ ) could be a valuable addition. For society it would be a welcome counterweight against all those media productions and commercials that promote the food industry.
Figure 2 shows the Council of Coaches on TV. The picture on top shows a scene from the popular Dutch OmroepMAX TV production “Hendrik Groen” about a group of elderly people in a home for the elderly. The character on the right is Hendrik. The character on the left is his best friend Evert, a diabetic. Evert likes to drink, maybe a bit too much. He just had his leg cut off in hospital because of gangrene (necrosis caused by diabetes). At the bottom you see the question under discussion. The council members are known Dutch TV personalities and experts in food and diabetes care. They discuss the problem and conclude that it would be good for diabetics to find out for themselves how their blood glucose values are affected by alcohol consumption. TV watchers can download the COACH app, which can be connected to their glucose meter. The app allows them to chat with their personal diabetes coach, a virtual replica of one of the council members (they lent their voice and style to the character). The coach gives them instructions on how to perform the test and motivates them to adhere to the protocol for the duration of the test and to keep track of their alcohol consumption. Outcomes are sent to the COUCH server and shared with the audience in a following episode of Health Insight.
Spoken dialog with artificial characters: a real challenge
The problem of real-time automatic speech recognition is, and will remain, “close to being solved”, thanks to Big Data and DNN technology. Real-time processing is a necessary requirement for spoken dialog that doesn’t suffer from processing delay and that allows realistic turn-taking and interrupting behaviour. One big problem is the recognition of special utterances, named entities, and newspeak. Data used for training machines is typically historical and outdated. Hence the need for continuous updates. As Hugo Brandt-Corstius, one of the founding fathers of Dutch research in formalisation of natural languages, used to say, “Wat je ook doet, de semantiek gooit roet.” (“Whatever you do, semantics bothers you”).
The generation of natural lifelike speech is ready to be exploited by virtual characters (see: https://deepmind.com/blog/wavenet-generative-model-raw-audio/ ), so it is possible to have a personal coach with the voice of, for example, André van Duin (Evert in the Hendrik Groen TV series) or Trump, to name a trustworthy figure.
An unsolvable paradox
But the core problem of an artificial dialog is the logic of the conversation. There is none. And if there is some logic, it is the participants themselves who decide what it is. Not the designer. Trying to design a system for natural open dialog is trying to solve a paradox. Being a conversational partner you do not want to control the other party’s response. Of course when you ask a question you more or less force the other to respond in some way or another. But not in a deterministic way. That’s the whole idea of a question, isn’t it? Getting to know someone is different from asking for all kinds of information about or from someone. The best realisable technical system offers the users a number of options. It also has a number of options the system can choose from to respond to a user’s action. These systems assume by design that the world of conversations is closed; that there is something like a mathematical space of all possible conversations. That language is a system. I believe it is not. Autonomous agents ignore the users’ freedom, their identity and autonomy. The very concept of user already challenges these human values.
Modern projects deliver demonstrators. Project reviewers don’t like to read reports or scientific publications. Show me.
The Council of Coaches project built a functional demonstrator. The world of possible conversations is designed using a dialog editing system called WOOL, also developed in the project. So every conversation that a user and the coaches in the council have is a realisation of one of a huge set of possible paths of pre-scripted dialog continuations.
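To make the notion of "paths of pre-scripted dialog continuations" concrete, here is a minimal Python sketch. It is not the WOOL format or its API; the node names, coach lines and reply options are all invented for illustration. It only shows how a scripted dialog can be viewed as a graph whose paths are all the conversations the system can ever have.

```python
# Hypothetical scripted dialog (invented for illustration, NOT the WOOL format).
# Each node holds a coach utterance and a list of (user reply, next node) options.
DIALOG = {
    "start": ("Coach A: How did you sleep last night?",
              [("Fine.", "activity"), ("Badly.", "sleep_tips")]),
    "sleep_tips": ("Coach B: Shall we look at your evening routine?",
                   [("Yes.", "end"), ("Not now.", "activity")]),
    "activity": ("Coach C: Did you go for a walk today?",
                 [("Yes.", "end"), ("No.", "end")]),
    "end": ("Coach A: Thanks, talk to you tomorrow.", []),
}


def all_conversations(node, graph):
    """Return every possible sequence of user replies from `node` to an end node.

    Assumes the script contains no loops; a looping script would have
    infinitely many paths.
    """
    _utterance, options = graph[node]
    if not options:
        return [[]]  # end of the script: one conversation with no further replies
    conversations = []
    for reply, next_node in options:
        for rest in all_conversations(next_node, graph):
            conversations.append([reply] + rest)
    return conversations


conversations = all_conversations("start", DIALOG)
for replies in conversations:
    print(" -> ".join(replies))
```

Even this four-node script allows several distinct conversations, and the number grows quickly with every node added. Yet the set remains closed: the user can only ever choose among the replies the designer wrote down.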
In another technical demonstration system embodied virtual 3D characters simulate realistic multi-party conversations. It demonstrates progress made and state-of-the-art in the development of turn-taking, addressing and argumentative dialog behavior generation for artificial conversational embodied 3D characters.
In his “Computer Power and Human Reason” Joseph Weizenbaum tries to understand how it is possible that people interact with his computer program ELIZA as if they are talking to a real human. The psychological value of interaction with artificial characters is also the theme of Sherry Turkle’s “Alone together”. As long as people recognize themselves in the answers given, and as long as there is sufficient room for interpretation, the user’s experience of being recognized by the system is strengthened.
Council of Coaches is a project that builds bridges between new media, art and design, and technology.
How will people stand towards virtual health coaches? Will they see them as personal coaches that they are willing to share their personal experiences and part of their life with? Do members of the council have to say only what we (or the medical people) believe is correct, or do we allow bad or sceptical characters? Is the system seen as a medical instrument? Or is it a system that tries to make users aware of the way they stand towards their own life, in whatever way we believe works?
Moreover, do users want to have private discussions with one of the virtual coaches? May a coach deceive the patient or withhold information because he believes it is not good for the client’s health to know? Old issues in medical ethics get a new dimension when coaching in the health domain becomes virtual. What is new is that many users are inclined to uncritically believe what the computer says: “The computer told me!”. These are some of the societal issues that have to be considered before the Council of Coaches will find its place in the social organization of patient-centered health care.
Future dream and worst scenarios
Some people prefer to talk about their personal health issues with a virtual character instead of talking to a human expert. Others are more or less forced by the health care system to first chat with an e-health coach before they see a real human. I am not sure if this is a healthy thing. Maybe the Council of Coaches can help the users to identify their personal issues and break the barriers to talking to real humans.
A worst-case scenario would be that society decides to replace human coaches and experts by artificial agents because of the economic burden of a good-quality human health care system for elderly people. Many people have the impression that western society is moving in the direction of this worst-case scenario, ruled by policies that place unlimited trust in autonomous, artificially intelligent agencies.
The Council of Coaches project has delivered a proof of concept and a software platform and tools that can be applied for building end user applications in other domains, for example for social skill training in professional organisations.
The real danger of autonomous agents
Regarding the discussion about “autonomous technology’’ (social robots, killer robots, autonomous cars, virtual coaches that would take the place of real humans): some people see a danger in the growing number and autonomy of intelligent machines that would take over the world. I believe the following makes more sense.
“The real danger, then, is not machines that are more intelligent than we are usurping our role as captains of our destinies. The real danger is basically clueless machines being ceded authority far beyond their competence.” (Daniel C. Dennett, in: The Singularity—an Urban Legend? 2015).
Harm op den Akker et al. (2018). Council of Coaches – A Novel Holistic Behavior Change Coaching Approach. Proceedings of the 4th International Conference on Information and Communication Technologies for Ageing Well and e-Health.
Sherry Turkle (2011). Alone Together: Why We Expect More from Technology and Less from Each Other. Basic Books, New York.
Joseph Weizenbaum (1976). Computer Power and Human Reason: From Judgment to Calculation. W. H. Freeman & Co., New York.
“I want to know why my friend, 69, was home with 104 fever, reported to his dr., test negative for flu, pneumonia, finally on day 5 was tested for Covid-19, 3 days later positive, told to stay home, and now is near death on ventilator in hospital? Why so long to test????”
(One of the many tweets that express that we are desperately looking for causes. From: Twitter 25-03-2020)
The SARS-CoV-2 virus is rapidly spreading over the world. Every hour the media come with news about the Corona pandemic: the numbers of deaths, of people tested positive, of people admitted to Intensive Care units. Every day we can follow debates between politicians, experts and the public about how to handle the various problems caused by the virus. In the Netherlands we have to stay at home as much as possible, wash our hands regularly, not shake hands, and keep a “social distance” from one another of at least 1.5 meters. Schools, restaurants and many shops are closed. Elderly people in community houses or living alone are among the most vulnerable. More and more countries decide on a complete lockdown.
Scientists, in particular virologists, epidemiologists and physicians, explain to their audiences the mechanisms behind the spreading of the virus, to make clear the reasons for the measures taken by their governments (or to comment on them). Statisticians try to make sense of the wealth of data collected by national and international health institutions. What questions can they answer based on their statistics?
People ask what the chances are they will be exposed to the virus when they go to the supermarket. Others want to know how long it takes before the pandemic is under control so that they can go to work and the children to their schools. But a lot is still unknown. The virus differs from those of known influenza epidemics.
A model on the individual level
National health institutions (in the Netherlands the RIVM) as well as international health organisations (in particular the WHO) gather and publish data about the numbers of infected people, as well as how many people died because of the virus. Researchers use this data to explore mathematical growth models, to see if they can fit them to the data so that they can be used to predict the spread of the virus, the peak, when it will decline, and how this depends on the policy. These models look at the level of the population of a whole nation, a particular region (e.g. Lombardy) or even a city (Wuhan). Sometimes they do look at different age groups or gender differences, for example to predict fatality rates for different groups.
But these models do not look at the individual level. The causal model that I propose here differs from the epidemiological models. It models the factors that play a role in the effects of the virus on the individual level.
Monitoring Reproduction Number R0
An important societal quantity to measure in case of an epidemic is the basic reproduction number of the virus, R0 (R-naught). Politicians and public health experts keep a close eye on this number. R0 represents the number of new infections estimated to stem from a single case. If, for example, R0 is 2.5, then one person with the disease is expected to infect, on average, 2.5 others. An R0 below 1 suggests that the number of cases is shrinking. An R0 above 1 indicates that the number of cases is growing. Of course the numbers counted are statistical measures over a population.
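A small Python sketch may help to see why the threshold value 1 matters so much. It rests on strong simplifying assumptions that are mine, not those of any epidemiological model: R0 stays constant, everybody remains susceptible, and infections proceed in neat "generations". The function name is invented for this illustration.

```python
def expected_cases_per_generation(r0, generations, initial_cases=1.0):
    """Expected number of new cases in each successive generation of infections.

    Simplifying assumptions: constant R0 and an unlimited supply of
    susceptible people, so each generation is R0 times the previous one.
    """
    cases = [initial_cases]
    for _ in range(generations):
        cases.append(cases[-1] * r0)
    return cases


growing = expected_cases_per_generation(2.5, 3)    # R0 above 1
shrinking = expected_cases_per_generation(0.5, 3)  # R0 below 1
print(growing)    # [1.0, 2.5, 6.25, 15.625]
print(shrinking)  # [1.0, 0.5, 0.25, 0.125]
```

With R0 = 2.5 each generation of cases is two and a half times larger than the previous one; with R0 = 0.5 each generation is half the size of the previous one and the outbreak dies out.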
R0 depends on a number of factors: the number of people infected, the number of people vulnerable, the chance that a contact between people transfers the virus from one person to another, and how long infected people are able to infect other people. Some of these values are typical for a virus. Others can only be estimated from large populations. A definition of R0 refers to a scientific model of the complex mechanisms underlying the spreading of the virus. See this paper for the confusion about R0. All in all, it is hard to know the “real” value of R0. We can only estimate it.
Since spreading of the virus depends on the way people behave, policies try to steer people’s behavior. In this way they hope to control R0. In a similar way the doctor tries to influence a patient’s situation by application of some medical treatment. Since the doctor doesn’t know how the patient will react, he will keep a close eye on the patient’s situation and adjust his treatment if required. A complicating factor is the time delay. A treatment will only have effect after some time. So in order to prevent critical situations we need to predict how the system that we try to control will behave in the future, so that we can adjust treatment in time. If you want to shoot a flying duck you need to estimate where it will be at the time the bullet reaches that point. All in all there are many uncertainties, not least about how the public will respond in the long run to the measures taken by governments (e.g., to stay at home), and how this depends on expectations presented in the media. Maybe other values become more important after some time (e.g., visiting family, going out for pleasure) so that people change their behavior.
Towards a causal Corona model
Many people get sick from the virus, some of them have only mild symptoms, a small percentage does not survive the attack. Most of the ones that die are older and already have health problems. But there are exceptions. It seems that not all people get the disease. This raises our first question.
(1) What are the factors that determine if a person will get COVID-19?
For a person to get a disease caused by a virus infection two things are necessary and sufficient: (a) the person is vulnerable to the virus and (b) the person is actually infected by the virus after he or she was exposed to it.
This is the almost trivial logic of the classical potency-act doctrine. Likewise: for a glass to break, the glass has to be breakable (vulnerable to breaking) and there must be some actor that actually breaks it.
Our first question raises two follow-up questions:
(2) What are the factors that determine a person’s vulnerability for being infected?
(3) What are the factors that determine that the person gets exposed to and infected with SARS-CoV-2?
Some people die after they got the disease, others survive.
(4) Which factors determine how serious the disease will become and what are the factors that determine the chances of survival?
If we have an answer to these questions we can predict which people run most risk of getting ill and what would be the best policy to prevent or control an outbreak of the virus. To find answers we have to collect data. The more data, the better. Suppose we want to know what the effect of age is on the chance of dying from corona, among those who are infected by the virus. We need statistics: for every patient we need the age group and the effect, where effect is either survived or died. We can now estimate the probability for each age group. We might expect that the older people are, the smaller their chances to survive. Suppose we see an unexpected dip in the curve for the age group above 70. How can we explain this? Our data doesn’t tell. Was there a policy to not treat patients of this age group? Data alone is not enough to answer these questions and to make good predictions. We need to know the mechanisms behind it.
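The estimation step described above can be sketched in a few lines of Python. The patient records below are invented purely for illustration and carry no epidemiological meaning; the function name is hypothetical as well.

```python
from collections import defaultdict

# Invented records (age group, outcome); NOT real data.
records = [
    ("50-59", "survived"), ("50-59", "survived"), ("50-59", "survived"), ("50-59", "died"),
    ("60-69", "survived"), ("60-69", "died"), ("60-69", "survived"), ("60-69", "died"),
    ("70+", "survived"), ("70+", "survived"), ("70+", "died"), ("70+", "survived"),
]


def death_rate_by_age_group(records):
    """Estimate P(died | age group) as a relative frequency per group."""
    counts = defaultdict(lambda: [0, 0])  # age group -> [deaths, patients]
    for age_group, outcome in records:
        counts[age_group][1] += 1
        if outcome == "died":
            counts[age_group][0] += 1
    return {group: deaths / total for group, (deaths, total) in counts.items()}


rates = death_rate_by_age_group(records)
for group in sorted(rates):
    print(group, rates[group])
```

In this invented data set the estimated death rate for the oldest group is lower than for the group below it. The table of frequencies shows the anomaly but cannot explain it; for that we need knowledge that is not in the records.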
A plea for causal graphs
In The Book of Why (2018) computer scientist and philosopher Judea Pearl argues for the use of causal diagrams as a computational instrument to make causal inferences from data. Pearl, who invented Bayesian Networks, points at the fact that we can’t draw causal conclusions from data alone. We need causal intuition from experts about the case at hand. With such a causal model we can simulate experiments by setting values of variables of the causal model and see what the effects are on the outcome variables that we are interested in. We also need such a model if we want to answer counterfactual questions like “What if Mrs. S., aged 71, of whom we know that she became seriously ill after being exposed to the virus and that she was not taken into hospital, had been taken into hospital care? Would she have died?”
What is a causal diagram?
A causal diagram is a graphical structure with nodes and arrows between pairs of nodes. The arrows represent direct causal relations. The source of the arrow is the cause, the target node represents the effect of the causal relation. Figure 1 shows a causal diagram that could be a part of a causal graph for the analysis and prediction of the effect of age, medical condition and treatment on chances to survive.
The diagram shows that Treatment depends on Age as well as on Condition. Treatment has effect on Lethality (chance to survive), but this is also dependent on Age and on Condition. Age influences Lethality through four different paths.
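The path count can be checked mechanically. The sketch below encodes the diagram as a Python dictionary and enumerates all directed paths from Age to Lethality. One arrow is an assumption on my part: the text does not mention Age -> Condition, but with that arrow included the diagram yields exactly the four paths mentioned.

```python
# Causal diagram as adjacency lists: node -> list of direct effects.
# The arrow Age -> Condition is an ASSUMPTION; the other arrows follow the text.
DIAGRAM = {
    "Age": ["Condition", "Treatment", "Lethality"],
    "Condition": ["Treatment", "Lethality"],
    "Treatment": ["Lethality"],
    "Lethality": [],
}


def directed_paths(graph, source, target):
    """Return all directed paths from source to target in an acyclic graph."""
    if source == target:
        return [[target]]
    return [
        [source] + rest
        for child in graph[source]
        for rest in directed_paths(graph, child, target)
    ]


age_paths = directed_paths(DIAGRAM, "Age", "Lethality")
for path in age_paths:
    print(" -> ".join(path))
```

One path is the direct arrow; the other three run through Condition, through Treatment, or through both.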
Causal diagrams can answer questions like “How does the lethality in the population change if we decide to give all infected people the same treatment independent of age?” or “How does the recovery rate for the corona disease change if we enforce a complete lockdown instead of the actual soft ‘intelligent’ lockdown?”.
Such questions can only be answered correctly by means of a causal intervention in the diagram. In technical terms: by applying the do-operator to the intervention variable. (We give the variable a fixed value and compute the effect.) This simulates more or less what we would do in a randomized controlled experiment. We learn how Nature works by bringing about some controlled change and observing how Nature responds to that. The validity of the conclusions that we draw from such simulations requires that the model is correct (no arrow between two nodes if there is no causal relation) and complete, in the sense that all relevant factors and relations are in the model.
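The difference between observing and intervening can be illustrated with a toy model of three variables: Age -> Treatment, Age -> Death and Treatment -> Death. All probabilities below are invented for the illustration, and the sketch assumes that Age is the only common cause of Treatment and Death. Under that assumption, applying the do-operator to Treatment amounts to cutting the arrow Age -> Treatment and averaging over the age distribution.

```python
# Toy model with invented probabilities (NOT estimates from real data).
P_AGE = {"young": 0.5, "old": 0.5}                # P(Age)
P_TREATED_GIVEN_AGE = {"young": 0.8, "old": 0.2}  # P(Treatment=yes | Age)
P_DEATH = {                                       # P(Death | Age, Treatment)
    ("young", True): 0.01, ("young", False): 0.02,
    ("old", True): 0.10, ("old", False): 0.20,
}


def p_death_given_treated():
    """Observational P(Death | Treatment=yes): we only look at who got treated."""
    joint = sum(P_AGE[a] * P_TREATED_GIVEN_AGE[a] * P_DEATH[(a, True)] for a in P_AGE)
    treated = sum(P_AGE[a] * P_TREATED_GIVEN_AGE[a] for a in P_AGE)
    return joint / treated


def p_death_do_treated():
    """Interventional P(Death | do(Treatment=yes)): everybody is treated,
    so the arrow Age -> Treatment no longer plays a role."""
    return sum(P_AGE[a] * P_DEATH[(a, True)] for a in P_AGE)


print(p_death_given_treated())  # about 0.028
print(p_death_do_treated())     # about 0.055
```

In this toy model mostly young people get treated, so the treated group looks healthier than it would be if everybody were treated. Reading the observational number as the effect of a treat-everyone policy would be too optimistic; the intervention gives the answer to the policy question.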
How do we know if the model is complete and correct? I am not aware of any convincing publication about validation of causal diagrams. In fact, a causal diagram is a theory that can at best be falsified, not proven to hold the truth. Pearl’s Book of Why describes some interesting episodes that show that the truth consists in the historical process of scientific research. Not outside this process. Although it helps that now and then someone stubbornly believes that she has seen the light and fights against the community’s doctrine. Those are the ones that make science progress.
Causal Diagrams and Bayesian Networks
Nodes X and Y of a causal diagram represent factors that play a role in the model, with an arrow drawn from X towards Y if we conceive X as a “direct cause” of Y. Mathematically the factors X and Y are stochastic variables of various types: continuous or discrete (ordered or categorical).
A causal network is a special type of Bayesian network. In a Bayesian network the arrows do not need to refer to direct causes. They stand for probabilistic “influence” without implying a causal direction. For example, if we know that someone lives in an elderly home we can infer that he or she is most likely over 70 years of age. There might not be a direct causal relation in a strict sense between age and living in an elderly home, but if we get the information that Mrs. S. lives in an elderly home, the chances rise that she is over 70 years of age and not 15. In a Bayesian network a (conditional) probability table for P(Y|z) is attached to each node Y, where z is the set of all variables that point at Y (the parent nodes of Y). Thus when the network is X -> Y, node Y has a conditional probability table P(Y|X) and X has a table for the probability distribution P(X).
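For the smallest possible network X -> Y, the two tables can be written down directly. In the Python sketch below X is the age class and Y is the place where someone lives; the numbers are invented for the illustration. The two tables together determine the joint distribution P(X, Y) = P(X) * P(Y|X), from which other probabilities follow.

```python
# Tables for the network X -> Y with invented probabilities (NOT real statistics).
P_X = {"over70": 0.2, "under70": 0.8}  # table attached to node X: P(X)
P_Y_GIVEN_X = {                        # table attached to node Y: P(Y | X)
    "over70": {"elderly_home": 0.25, "elsewhere": 0.75},
    "under70": {"elderly_home": 0.0125, "elsewhere": 0.9875},
}


def joint(x, y):
    """P(X=x, Y=y) = P(X=x) * P(Y=y | X=x)."""
    return P_X[x] * P_Y_GIVEN_X[x][y]


def marginal_y(y):
    """P(Y=y), obtained by summing the joint distribution over all values of X."""
    return sum(joint(x, y) for x in P_X)


print(marginal_y("elderly_home"))  # about 0.06
```

Each row of the conditional table sums to 1, and so does the joint distribution as a whole. A network with more nodes works the same way: the joint distribution is the product of one table per node.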
Where do the probabilities come from? Bayesian networks are quite tolerant about the source of the probability values: they are either computed from data (probabilities as relative frequencies) or based on expert opinions (probabilities as confidence measures of “belief states”). For a comprehensive review about the use of Bayesian Networks in health applications refer to Evangelia Kyrimi et al. (forthcoming). One observation is that many researchers do not publish their network, nor do they comment on the design process.
Bayesian networks are used to compute (infer) the probabilities of some outcome variables given some known values of input variables (the observed ones). The computation is based on the classical Theory of Probability (Pascal, Fermat) and uses Bayes’ Rule for computing the unknown probabilities of variables.
Let me give an example. Let T be the variable that stands for the outcome of a corona test (its value can be pos or neg) and let D stand for the statement “patient tested has corona disease” (D=pos if the statement is true, D=neg if it is false). We know that there is some causal relation D -> T. In diagnostics we reason from test to disease, not in the causal direction. Suppose the test outcome is positive. Does the patient have the disease? Tests are not 100 % reliable and sensitive. We want to compute P(D=pos|T=pos), the chance that the patient has the disease, given that the test outcome is positive.
Bayes’ rule says:
P ( D=pos | T=pos ) = [ P ( D=pos ) / P ( T=pos ) ] * P ( T=pos | D=pos )
The second factor on the right hand side of the equation, P ( T=pos | D=pos ), is the probability that the test is positive if we know that the patient has the disease. Its value is known. It is based on expert knowledge about the test mechanism, explaining how the disease causes the test outcome. The first factor is a fraction: the numerator P (D=pos) is the prior probability, i.e., the chance that the patient has the disease before knowing the test outcome. The denominator P(T=pos) is the probability that the test is positive whatever the situation of the patient. There is a small chance that the test shows a positive outcome in case the patient does not have the disease (a false positive).
Bayes’ rule is consistent with intuitive logic. As you can see from the right hand side of Bayes’ rule, the chance P ( D=pos | T=pos ) grows when the prior P (D=pos) grows. When almost everyone has the disease, the chance that you have it when you have been tested positive is also high. On the other hand, the higher P ( T=pos ), i.e. the higher the chance that the test is positive (also for people that do not have the disease), the smaller the probability that a positive test outcome witnesses the occurrence of the disease.
When P( X | Y ) is not equal to P (X), we call X and Y probabilistically dependent. This dependency should not be confused with causal dependency. P(D=pos|T=pos) differs from P(D=pos) but having a disease is not caused by a test. Also correlation (a measure of co-occurrence) should not be confused with causation.
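A short Python sketch makes this computation concrete. The denominator P(T=pos) is not given directly but follows from the law of total probability: P(T=pos) = P(T=pos|D=pos) * P(D=pos) + P(T=pos|D=neg) * P(D=neg). The function name and the numbers are invented for the illustration; they are not properties of any actual corona test.

```python
def posterior_disease(prior, sensitivity, false_positive_rate):
    """P(D=pos | T=pos) by Bayes' rule.

    prior               = P(D=pos)
    sensitivity         = P(T=pos | D=pos)
    false_positive_rate = P(T=pos | D=neg)
    """
    # Law of total probability: the test can be positive with or without disease.
    p_test_pos = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return prior * sensitivity / p_test_pos


# Invented numbers: 1% of the tested people have the disease, the test detects
# 90% of the diseased and wrongly flags 5% of the healthy.
rare = posterior_disease(prior=0.01, sensitivity=0.9, false_positive_rate=0.05)
common = posterior_disease(prior=0.5, sensitivity=0.9, false_positive_rate=0.05)
print(round(rare, 3), round(common, 3))
```

With a prior of 1% a positive test raises the probability of disease to only about 15%, because most positive outcomes then come from the large group of healthy people. With a prior of 50% the same test gives a posterior of about 95%.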
What counts as a causal relation?
As we said before, what makes a causal diagram a special (Bayesian) network is that the arrows in the network represent direct causal relations. But what is a causal relation? Where do they come from?
Almost every day we express our beliefs in the existence of a causal relation between phenomena. Either in an implicit way, e.g., “If you don’t hurry now, you will be late.”, or explicitly, e.g., on a package of cigarettes: “Smoking causes lung cancer.”, or “Mrs. Smith died because of corona.”. But if we come to think about it, it is not easy to tell what a causal relation is, and how we can distinguish it from other types of influence between things that happen. The modern history of the idea of cause started with Aristotle’s theory of the four causes of being (see his Metaphysics). Only two of them have survived: the notion of efficient cause and the notion of final cause. In our modern mechanical scientific world view we see a cause as something that brings about some effect: a billiard ball that causes another ball to move, and so on. We see chains of causes and effects and many causes that influence other things or processes. On a medical internet site I found “Smoking does not cause lung cancer”. The motivation is: (1) not everybody who smokes gets the disease (smoking is not a sufficient condition to bring about the effect) and (2) some people get lung cancer and never smoked (it is not a necessary condition to bring about the effect). Indeed, after many years of debates and research (see Pearl’s account of this episode in the history of science in The Book of Why), the causal mechanisms of cancer development under the influence of smoking have been largely unraveled. We know that the primary effect of smoking is that the tar paralyzes and can eventually kill the tiny hair-like structures in the lungs. Lung cancer happens when cells in the lung mutate and grow uncontrollably, forming a tumor. Lung cells change when they are exposed to dangerous chemicals that we breathe. The tar makes the lungs more vulnerable to these chemicals. In that sense smoking is indeed a cause of lung cancer.
The search for the “real” cause will only end when we have found a plausible theory that describes the chemical mechanism of the change of the lung cells.
In general, it seems that as long as cause and effect are two separate objects or processes our wish to know how nature works is not really satisfied. A successful and satisfying search for the real cause of a given effect eventually reveals that cause and effect are actually two sides of the same mechanism that we tend to objectify. In reality cause and effect are materially one and the same process. But this only holds at the end of the search for causes. In making a causal diagram we must take for a cause everything that has a direct influence on (the status of) the effect.
To make a long story short: there is no agreement about what counts as a “cause’’, so we have to rely on our “causal intuition’’ and expertise in the relevant domains.
Need for expert knowledge
I am not an expert in any of the relevant areas, but I take up the challenge of making a causal diagram based on the information that I gathered from the literature and the internet. I have some experience in the design of such models, but in quite different areas. In the past I gave courses in Artificial Intelligence devoted to Reasoning under Uncertainty. We used (Dynamic) Bayesian Networks for Natural Language Processing and for recognition of participants’ behavior in conversations. In making this diagram I have learned how challenging it is. The Dutch newspapers and the Dutch RIVM provide statistics and background articles that I use to make the diagram.
My causal corona diagram is just the basic structure of a Bayesian network: I do not attach (conditional) probability tables to the nodes. Computing probabilities is not my concern. I am interested in the causal structure. The main function is to organize and visualize the connections between the most important factors that play a role in the process of getting caught by the corona virus.
The Corona Diagram
The core of the diagram is based on the simple idea that a subject will get sick if and only if two things hold: 1) the subject is vulnerable and therefore susceptible to infection and 2) he or she is exposed to and actually infected by the virus. This is as trivial as cheese and almost tautological. But it is no less true for that.
Here is the kernel of the causal diagram in which Seriousness is the label for the variable that indicates how serious the disease of this person is.
Now that we are in the middle of the pandemic the public policies focus on minimizing the chances of exposure and infection. Several factors influence a person’s vulnerability to viruses and infections in general. We currently do not have a complete picture of which factors specifically influence sensitivity to this new SARS virus.
Seriousness: indicates how seriously ill the patient gets. Its values range from Asymptomatic and Mild to Serious and VerySerious.
It is directly influenced by two factors:
Vulnerability: indicates how sensitive the patient is to the virus. It refers to the subject’s physical condition. Other relevant conditions belong to the second cluster of nodes.
Infection: Infection is when the virus has entered the body. It ranges from nil to a few to many.
When there is no Infection, Seriousness is nil. Seriousness is determined by the degree of Infection and by Vulnerability. The factors that determine Vulnerability should answer the question why this young man became seriously ill and needs hospital care whereas this woman shows only mild symptoms. Recent research indicates that Infection is a gradual thing (a person can be exposed to a few or many virus particles) and that Seriousness depends in a more complex way on the Infection grade (or type? It might be that different variants of the Corona virus make a difference here).
Lethality indicates the chance of dying. It depends on the Seriousness and on MedicalCare, i.e. the medical treatment offered. There is no cure at the moment; only care can help the patient to survive.
Note: It is often difficult to say what the exact cause of death is. Statistics about populations may reveal how many deaths are the effect of the corona virus.
Infection is caused by Exposure (is the subject exposed to the virus?) and influenced by SelfHealthCare (did the subject take hygienic precautions: wash hands, etc.?). SelfHealthCare is important to prevent Infection. That is why people are urged to wash their hands regularly with soap or alcohol and not to touch their face.
Vulnerability and Infection are the two main factors. Both are necessary for becoming ill. As you can see there is no direct relation between the two, i.e. Vulnerability does not cause Infection nor the other way around. However, the data that are collected may well show a high correlation between the two, suggesting a causal relation. This will show, for example, when we collect data from medical staff admitted to hospital after they have tested positive. We cannot conclude from this that in the general public there is a causal relation between the two. This “spurious’’ causal relation only holds within the selected group of people who are infected and ill enough to be admitted. (For an explanation of this phenomenon see Pearl’s discussion of Berkson’s paradox in The Book of Why.)
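The selection effect can be made tangible with a small simulation. What follows is only a sketch under invented assumptions: Vulnerability and Infection are drawn independently as numbers between 0 and 1, Seriousness is taken to be their product, and people above an arbitrary threshold are admitted to hospital. None of these choices comes from real data.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(2020)  # fixed seed so the run is repeatable

# Hypothetical population: (Vulnerability, Infection), drawn independently.
people = [(rng.random(), rng.random()) for _ in range(20000)]

# Assumed toy rule: Seriousness = Vulnerability * Infection, and only people
# above an arbitrary threshold end up in hospital.
THRESHOLD = 0.3
in_hospital = [(v, i) for v, i in people if v * i > THRESHOLD]

corr_all = pearson(*zip(*people))
corr_hospital = pearson(*zip(*in_hospital))
print(f"whole population: {corr_all:+.2f}")
print(f"hospital only:    {corr_hospital:+.2f}")
```

In the whole population the correlation stays close to zero, while among the admitted patients a clear association appears. With this particular toy rule the association is negative; its sign and size depend entirely on how the selection works, which is exactly why the hospital data say nothing about the general public.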
What are the factors that influence these two main factors?
Vulnerability is influenced by:
HealthCondition: the physical condition. Some people say that the people that die after being infected would have died anyway within a year or so because of their bad health condition.
ChronicCondition: chronic diseases: lung diseases, coronary and heart diseases, diabetes. There are indications that a high percentage of the patients admitted to hospital have too high a BMI (body mass index).
Gender: 2/3 of the patients that passed away after getting seriously ill are male.
GeneticFactors: other genetic factors besides gender-specific ones may play a role. The COVID-19 Host Genetics Initiative (2021) reports that “While established host factors correlate with disease severity (e.g., increasing age, being a man, and higher body mass index) these risk factors alone do not explain all variability in disease severity observed across individuals. The genetic makeup of an individual contributes to susceptibility and response to viral infection.’’
This international initiative describes the results of three genome-wide association meta-analyses comprised of up to 49,562 COVID-19 patients from 46 studies across 19 countries. They report 13 genome-wide significant loci that are associated with SARS-CoV-2 infection or severe manifestations of COVID-19.
ImmunizationStatus: it is currently not known how the immune system responds to the virus. Several vaccines have been developed and have been administered since the end of 2020. It is to be expected that people who have recovered have a smaller chance of becoming ill again. Current research seems to indicate that people who had only mild symptoms are not completely immune to the virus.
Age: is directly related to a number of other factors. Age is also a factor in the Exposure cluster. Age is more a property that can be used as an indicator for certain inferences. Ageing as a process influences many aspects of people’s lives: work, social contacts, etc. (See below for a discussion of ageing as a multi-factor confounder.)
A subject can be exposed to the virus to several degrees, depending on the concentration of the virus in the locality where the exposure occurred and on the duration of the contact. Exposure is determined by several factors: mainly by social contacts and activities (type of work, hobbies). Another factor is
VirusSpread: how much is the virus spread over the subject’s social environment?
If VirusSpread is high and SocialContacts is also high then chances for Exposure are high as well.
It is difficult to get a good picture of the spread of the virus. Only a limited number of people are tested for the virus. Conclusions drawn from the data collected are most likely biased. Also, different countries have different policies, which makes merging data a real challenge. Even within the Netherlands different regions have different policies, depending on the test capacity available. See below for a recent publication about a simulation study of the spreading mechanisms of the virus.
Personality: influences SocialContacts. (See also the comment about compliance in the notes below.)
Household: apart from social contacts, the direct contact with family members influences the chance of being exposed to the virus.
Activities: it makes a difference if the person lives on his/her windowsill or is an active participant in all kinds of social events and organizations.
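The structure described so far can be written down as a small data structure. The sketch below is one possible encoding, not a standard format: each node is mapped to the list of its direct causes, using only the arrows mentioned in the text, and the helper name `ancestors` is made up for this illustration.

```python
# Each node maps to its direct causes (parents), as described in the text.
PARENTS = {
    "Lethality": ["Seriousness", "MedicalCare"],
    "Seriousness": ["Vulnerability", "Infection"],
    "Infection": ["Exposure", "SelfHealthCare"],
    "Vulnerability": ["HealthCondition", "ChronicCondition", "Gender",
                      "GeneticFactors", "ImmunizationStatus", "Age"],
    "Exposure": ["VirusSpread", "SocialContacts", "Household", "Activities"],
    "SocialContacts": ["Personality"],
}

def ancestors(node, parents=PARENTS):
    """Return every direct or indirect cause of `node`."""
    found = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found

# Everything that can influence Infection, directly or indirectly.
print(sorted(ancestors("Infection")))
# Age reaches Lethality (via Vulnerability and Seriousness) ...
print("Age" in ancestors("Lethality"))   # True
# ... but in this version of the diagram it does not reach Infection.
print("Age" in ancestors("Infection"))   # False
```

Such a listing carries no probabilities at all; it only records who influences whom, which is all the diagram is meant to do at this stage.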
How can a Corona diagram be used?
The causal diagram presented is the result of an exercise in making a causal diagram for a realistic case. There is need for expert knowledge to come up with a better model. It can be completed with probabilities based on available data.
Eventually, a causal diagram like this for the Corona pandemic can be used to answer several questions, not just probabilistic questions: what are the effects of certain observations on the occurrence of other events? But also causal questions: what is the effect of changing values on outcome variables? It can also be used to answer counterfactual questions on an “individual’’ level.
The diagram can also be used for qualitative analyses. How is Age related to Lethality? The diagram shows that the influence of Age goes through Vulnerability. Ageing makes people more sensitive to diseases. So if they are infected they run the risk of getting into a serious condition. If there is not enough medical care for the subject, the chances rise that he or she will die.
What is the impact of gender on lethality? Data about the incidence of COVID-19 indicate that there is a high correlation between Gender and Seriousness. This suggests a direct effect of Gender on Vulnerability, i.e., that male subjects are more sensitive than female subjects. Statistics show that 2/3 of all patients are male. But other genes of the human genome may play a role as well.
Health care workers, doctors and nurses in hospitals are more at risk than, e.g., computer programmers. The former run a higher risk of becoming exposed and infected than the latter group. This is reflected in the Activities node in the diagram.
What is the impact of the ecological environment on the chances of getting ill? The influence of the environment is indirect, via HealthCondition. Polluted air causes chronic lung diseases such as COPD, which make people more vulnerable. The risks of getting ill are higher in large urban areas (e.g., Wuhan, Madrid, Northern Italy, New York) than in traditional agricultural areas. People living in urban areas also run more risk of being exposed because of the heavy use of public transport.
Finally, such a diagram might also help in deciding on, or explaining, policies that aim at minimizing exposure for specific subgroups of citizens. Isolation of elderly people helps because it affects Exposure, a necessary condition for getting infected.
A note on confounding factors: age
Our diagram has a simple tree-like structure. It has two main branches that are not connected upwards: Vulnerability and Infection are not directly connected. Each of the two branches has a tree-like structure itself.
This implies that Vulnerability and Infection are not related. They would be related if the two branches had a confounding cause. Simply said, a confounder C of X and Y is a common cause of X and Y. In The Book of Why (2018) Pearl gives a better, mathematical, definition in terms of his do-operator. Since Age in fact influences not only Vulnerability but also Exposure via SocialContacts (not indicated in the diagram), Age is a confounder.
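With the simple "common cause" reading of a confounder (not Pearl's do-operator definition), checking for confounders in a diagram amounts to intersecting two sets of ancestors. The sketch below uses a reduced version of the diagram; the function names are invented for this illustration.

```python
def ancestors(node, parents):
    """Return every direct or indirect cause of `node` in the diagram."""
    found = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found

def common_causes(x, y, parents):
    """Nodes that are a (possibly indirect) cause of both x and y."""
    return ancestors(x, parents) & ancestors(y, parents)

# Reduced diagram as drawn: node -> list of direct causes.
diagram = {
    "Seriousness": ["Vulnerability", "Infection"],
    "Vulnerability": ["Age", "HealthCondition"],
    "Infection": ["Exposure"],
    "Exposure": ["SocialContacts", "VirusSpread"],
    "SocialContacts": ["Personality"],
}
print(common_causes("Vulnerability", "Infection", diagram))   # set()

# Same diagram with the extra arrow Age -> SocialContacts added.
extended = dict(diagram, SocialContacts=["Personality", "Age"])
print(common_causes("Vulnerability", "Infection", extended))  # {'Age'}
```

As long as the arrow from Age to SocialContacts is left out, the two branches share no cause; adding it turns Age into a common cause of both.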
Pearl (2018) discusses a study among retired men that revealed an association between regular walking and reduced death rates. Is there a causal relation between the two? According to Pearl, the experimenters did not prescribe who would be a casual walker and who would be an intense walker. So we have to take into account the possibility of confounders. “An obvious confounder might be age”. The causal diagram is shown below.
Confounding factors cause spurious correlations between other variables. If we want to know the effect of Walking on Mortality we should do that for fixed age groups. The situation would be different when Walking had a causal influence on Age, which is not the case (well, maybe it is).
Some other studies: testing for the virus and mortality
A recent analysis using collected datasets of over 17 million adult NHS patients, of whom a bit less than 5700 died of causes attributed to COVID-19, confirms the relevance of most of the factors in our model that determine the chance of death caused by the Corona virus. The study reports (version 7th May 2020) numerical values for the impact of these factors on lethality: age, sex, medical conditions (chronic diseases), deprivation. Race is a factor: “Compared to people with ethnicity recorded as white, black people were at higher risk of death”. Behavioral factors, such as the ones in our model, have not been studied. A nice causal diagram is missing in the report.
Some epidemiologists are active on Twitter and discuss causal models. One of them is Ellie Murray. She likes to illustrate her model and ideas with nice handmade infographics. See figure below.
Besides the personal question “What are the chances that this particular person will be infected by the virus and survive?’’, there are questions concerning the effect of policies as well as compliance on the spread of the virus and on mortality. Other researchers focus on these and on the global effect of the pandemic on mortality in the general public. How much does the virus shorten life expectancy?
David Spiegelhalter (2020) discusses the issue of whether many deaths from COVID-19 would have occurred anyway as part of the `normal’ risks faced by people, particularly the elderly and those with chronic health problems who are the main victims of COVID. The current – 25-03-2020 – estimated infection fatality rate (percentage death among infected subjects) is anywhere between 0.5% and 1%.
Norman Fenton et al.(2020) illuminate the need for random testing to prevent bias due to the selection methods used now. They advocate the use of a causal model for the analysis of the results. They present an example of a causal model for a given country and its population in order to show that the COVID-19 death rate is “as much a function of sampling methods, testing and reporting, as it is determined by the underlying rate of infection in a vulnerable population.’’
Wilder et al.(2020) develop an agent based model for studying the spread of the SARS-CoV2 virus. “A key feature of the model is the inclusion of population-specific demographic structure, such as the distributions of age, household structure, contact across age groups, and comorbidities.’’ The aim of this study is “to evaluate the impact of age distribution and familial household contacts on transmission using existing data from Hubei, China, and Lombardy, Italy –two regions that have been characterized as epicenters for SARS-CoV2 infection –and describe how the implications of these findings may affect the utility of potential non-pharmaceutical interventions at a country-level.’’
Motivated by Judea Pearl’s The Book of Why in which he advocates the use of causal diagrams, interested in the mechanisms that play in the Corona disease, not disturbed by any expert knowledge in virology, epidemiology, or whatever relevant domain, I made a causal diagram. I hope I have explained what it is, what it can be used for and why it would be good to work on a better one.
Rieks op den Akker, Lonneker, March 2020
COVID-19 Host Genetics Initiative (2021). Mapping the human genetic architecture of COVID-19. Nature (2021).
Wilder, Bryan and Charpignon, Marie and Killian, Jackson and Ou, Han-Ching and Mate, Aditya and Jabbari, Shahin and Perrault, Andrew and Desai, Angel and Tambe, Milind and Majumder, Maimuna, The Role of Age Distribution and Family Structure on COVID-19 Dynamics: A Preliminary Modeling Assessment for Hubei and Lombardy (March 31, 2020). Available at SSRN: https://ssrn.com/abstract=3564800 or http://dx.doi.org/10.2139/ssrn.3564800
Kyrimi, E., McLachlan, S., Dube, K., Neves, M.R., Fahmi, A., & Fenton, N.E. (2020). A Comprehensive Scoping Review of Bayesian Networks in Healthcare: Past, Present and Future. ArXiv, abs/2002.08627.
Rieks did some mathematics and computer science. He has a PhD in computer science. He moved to dialogue systems research and natural language processing. He was Assistant Professor Artificial Intelligence and Human Computer Interaction at the University of Twente. He designed and used Bayesian Networks for modeling and prediction of conversational behaviors. He lectured logic and statistical inferencing in AI courses focusing on reasoning with uncertainty.
 Real individuals do not occur in scientific models. Science is about categories. Individuals are abstract entities represented by a list of attribute/value pairs. In a counterfactual question to the model we change certain values of some individual to see what the effect would be had she had this property instead of the one that she has in the real world.
 Comment by Miriam Cabrita. Related to Exposure, I miss how the community (excluding the individual self) comply with the measures proposed to stop the virus. For example, one way or another everybody needs to do shopping. Let’s assume that an individual goes to the supermarket and takes all precautions given by the government and even more (e.g. washes hands carefully when arrives home, removes shoes). If the people in the community are sloppy (not to say stupid), and keep going to the supermarket while sick, the chances that the individual is Exposed to the disease are much higher than in a community where everyone respects the rules. What if the people you live with do not comply with the rules of SelfHealthCare? Then you are also much more exposed to the virus.
Response by Rieks: Compliance is indeed a factor that affects VirusSpread. It may be affected by Personality, i.e., some people are more compliant with regulations from the authorities than others (youngsters, elderly, for different reasons). You need to know compliance with a treatment to know the effects of the treatment. You need to know if people were free to choose a treatment or were assigned to it, since that affects compliance. These are relevant issues in the current discussion in the Netherlands about the introduction of a corona app to trace, detect and warn people about corona. What can we do with the data collected?
Most people involved in car accidents have a driver’s license
Has Simpson’s paradox anything to do with causality, as Judea Pearl claims in The Book of Why? In this book the computer scientist and philosopher of science describes the historical development of a mathematical theory of causation. This new theory licenses the scientist to talk about causes again after a period in which she could only report in terms of correlations. Will the Causal Revolution, in which Pearl plays a prominent role, eventually lead to a conversational machine that passes the Turing test?
The strange case of the school exam
A school offers courses in statistics. Two Professors are responsible for the courses and the exams. The contingency tables below show statistics about the students’ exam results in terms of passed (Positive) or not passed (Negative) for each of the two Professors.
The school awards the Professor with the best exam results. Professor B claims the award pointing at the first table. This table shows indeed that the relative frequencies of passing are higher for Professor B (2% negative result) than for Professor A (3% negative result).
Professor A objects against B’s claim. It was recorded which students were well prepared for the exam, and which were not. He compiled a table for the segregated results. Indeed, this second table shows that for both student categories the results of Professor A are better than for those of Professor B.
Which Professor wins the award?
The statistics in the aggregated table show clearly that for the whole group of students prof B has better results than prof A, but for both subgroups of students it is reversed: prof A is better than prof B.
How is this possible?
This surprising outcome of the statistics exams is my favourite instance of Simpson’s paradox. The paradox is well known among scholars and among most students that followed a course in statistics. I presented it to my students in a lecture to warn them about hidden variables. I dug up my slides again when I was reading Judea Pearl’s discussion of the paradox in The Book of Why.
Beyond statistics: causal diagrams
After he introduced Bayesian Networks in the field of Artificial Intelligence, Pearl invented causal diagrams and developed algorithms to perform causal inferences on these diagrams. In The Book of Why Pearl presents several instances of Simpson’s paradox to clarify that we cannot draw causal conclusions from data alone. We need causal information in order to do that. In other words: we need to know the mechanism that generated the data.
Causal diagrams are mathematical structures, directed acyclic graphs (DAGs) in which the arrows connecting two nodes represent a causal relation, not just a probabilistic dependency.
Figure 1 shows two possible causal diagrams for the case of the school exams.
Both networks can be extended to a Bayesian network with probabilities that are consistent with the statistics in the tables. In both models the Professor and the Student, represented by the node labeled Prepared, are direct causes of the exam result, represented by the node labeled Passed. The diagrams differ in the direction of the arrow between the Prof node and the Prepared node. In the diagram on the left the causal direction is towards the Prof node; in the diagram on the right the causal direction is towards the Prepared node: the Professor determines how well students are prepared for the exam.
If the latter model fits the real situation the school should award Professor B. The decision should be based on the table with the combined results. The better exam results are the Professor’s credit.
The diagram on the left models the situation in which the preparedness of the students somehow determines the Professor. In this case the school could award Professor A based on the results in the lower, segregated, table.
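How the choice of diagram changes the verdict can be computed directly. The sketch below uses the failure counts that are quoted further on in this post; which group is the "prepared" one is my assumption (the group with the lower failure rate). For the right-hand diagram the pooled failure rate per Professor is the relevant quantity. For the left-hand diagram Prepared is a common cause, and one standard way to correct for it is to average the per-group rates over the overall share of each student group (the adjustment formula from Pearl's work).

```python
from fractions import Fraction

# (failed, total) per Professor and student group. The group labels are an
# assumption; the counts are the ones used later in this post.
counts = {
    "A": {"prepared": (6, 600), "unprepared": (57, 1500)},
    "B": {"prepared": (8, 600), "unprepared": (8, 200)},
}

def naive_fail_rate(prof):
    """Pooled failure rate: appropriate if Prof -> Prepared -> Passed."""
    failed = sum(f for f, _ in counts[prof].values())
    total = sum(t for _, t in counts[prof].values())
    return Fraction(failed, total)

def adjusted_fail_rate(prof):
    """Failure rate adjusted for Prepared: appropriate if Prepared -> Prof."""
    all_students = sum(t for groups in counts.values() for _, t in groups.values())
    rate = Fraction(0)
    for group in ("prepared", "unprepared"):
        group_size = sum(counts[p][group][1] for p in counts)
        failed, total = counts[prof][group]
        rate += Fraction(failed, total) * Fraction(group_size, all_students)
    return rate

for prof in ("A", "B"):
    print(prof, float(naive_fail_rate(prof)), round(float(adjusted_fail_rate(prof)), 4))
```

The pooled rates are 3% for A and 2% for B, so B looks better. After adjustment A comes out at about 2.6% and B at about 2.9%, so A looks better. The data are the same in both cases; only the assumed causal direction differs.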
What has Simpson’s paradox to do with causality?
What makes Simpson’s paradox a paradox? There has been some discussion about this in the statistical literature. Simpson himself gives two examples of the phenomenon. One is about the chances of survival after a medical treatment where the contingency tables show that the treatment is good for males as well as for females but valueless for the race. Of course, such a treatment cannot exist. But what should we conclude from the tables? Again, the answer depends on the underlying mechanism, which can be represented by a causal diagram. Simpson suggests that the “sensible interpretation” is that we use the segregated results for the genders. It is a bit strange, indeed, to assume that the treatment affects the patient’s gender.
Pearl distinguishes between Simpson’s reversal and Simpson’s paradox. He claims that Simpson’s paradox is a paradox because it “entails a conflict between two deeply held convictions”. Notice that also in case there was no reversal different causal diagrams are possible.
What does Simpson’s paradox reveal?
In Causality (2003) Pearl introduces the paradox in terms of conditional probabilities.
“Simpson’s paradox refers to the phenomenon whereby an event C increases the probability of E in a given population p and, at the same time, decreases the probability of E in every subpopulation of p. In other words, if F and ~F are two complementary properties describing two subpopulations, we might well encounter the inequalities
P(E | C ) > P(E | ~C)
P(E | C,F) < P( E | ~C,F)
P(E | C,~F) < P(E | ~C,~F)
“Although such order reversal might not surprise students of probability, it is paradoxical when given causal interpretation.’’ (Causality, p.174; italics are mine)
From the first inequality we may not conclude that C has a positive effect on E. The effect of C on E might be due to a spurious confounder, e.g., a common cause of E and C.
In our example of Simpson’s paradox we could estimate conditional probabilities P(Passed|Prof) from the contingency tables.
From the inequality
P(Passed=True | Prof=B) > P(Passed=True | Prof=A)
derived from the combined table we could conclude that the Professor has a causal influence on Passed, i.e. on the exam results. If we do this we give the inequality a causal interpretation. And this is clearly wrong! There could be other mechanisms (confounders) that make Passed dependent on Professor.
Why is Simpson’s reversal surprising?
Consider the following statement.
If a certain property holds for all members of a group of entities then that same property also holds for all members of all subgroups of the group and vice versa.
This seems to me logically sound. It holds for whatever property. The statement differs from the following.
If a certain property holds for a group of entities then that same property also holds for all subgroups of the group and vice versa.
The second one is about properties of aggregates. This is not a sound logical rule. It depends on the property if it holds truth.
If a student sees the contingency tables of the school exams and notices the reversal he might perceive this as surprising and see it as contradicting the first statement. On second thought, he might notice that it is not applicable: there is no property that holds for all students. The student might think then that it is contradicting the second statement. But then he realizes that this is not sound logic. Simpson’s paradox makes him aware that the second rule, the one about aggregates, does not apply here. The reason is that the property is not “stable’’. The property changes when we consider subgroups instead of the whole group. The property is a comparison of relative frequencies of events. In our example:
6/600 < 8/600 and 57/1500 < 8/200
and for the merged group it holds that:
(6+57)/(600+ 1500) > (8+8)/(600+200)
The abstract property hides, in a sense, the differences that occur in the underlying relative frequencies. The situation is like winning a tennis match: a player can win the match although her opponent wins most of the games. The outcomes of the games are hidden by counting the number of sets that each of the players wins. With set scores 6-5, 0-6 and 6-5 player A wins 2 sets to 1, but player B wins with 16 games to 12.
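Both the exam numbers and the tennis score are easy to check by machine. The following few lines are only a verification of the arithmetic above, with exact fractions so that no rounding is involved; the variable names are mine.

```python
from fractions import Fraction

# (failed, total) per student group, taken from the inequalities above.
prof_a = [(6, 600), (57, 1500)]
prof_b = [(8, 600), (8, 200)]

def merged_rate(groups):
    """Failure rate after pooling all groups of one Professor."""
    return Fraction(sum(f for f, _ in groups), sum(t for _, t in groups))

per_group = [Fraction(*a) < Fraction(*b) for a, b in zip(prof_a, prof_b)]
print(per_group)                                   # [True, True]
print(merged_rate(prof_a) > merged_rate(prof_b))   # True

# Tennis: games won per set as (player A, player B).
sets = [(6, 5), (0, 6), (6, 5)]
sets_won = (sum(a > b for a, b in sets), sum(b > a for a, b in sets))
games_won = (sum(a for a, _ in sets), sum(b for _, b in sets))
print(sets_won, games_won)                         # (2, 1) (12, 16)
```

In both cases the comparison made on the parts points one way and the comparison made on the totals points the other way, without any causal story being involved.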
Indeed, “Simpson’s reversal is a purely numerical fact”.
What has Simpson’s paradox to do with causality?
Pearl claims that for those who give a physical causal interpretation of the statistical data, there is a paradox. “Causal paradoxes shine a spotlight onto patterns of intuitive causal reasoning that clash with the logic of probability and statistics” (p.190).
In The Book of Why he writes that it took him “almost twenty years to convince the scientific community that the confusion over Simpson’s paradox is a result of incorrect application of causal principles to statistical proportions.”
It looks like it depends not only on the rhetorical way an argument is brought but also on the receiver if an argument or construct is perceived as a paradox.
The heading “Most people involved in car accidents have a driver’s license’’ is perceived as funny by the reader in as far as it suggests a causal relation, i.e. that having a driver’s license causes car accidents.
How would a student of the school of Jeffreys and Jaynes, i.e. someone who has an epistemological concept of probability, perceive Simpson’s paradox?
When I saw Simpson’s paradox for the first time I was surprised. Why? Because of the suggestion the tables offer, namely that they tell something about general categories. Subconsciously we generalize from the finite set of data in the tables to general categories. If we compute (estimate) probabilities based on relative frequencies we in fact infer general conclusions from the finite data counts. The probabilities hide the numbers. In my view the paradox could very well be caused by this inductive step. We need not interpret probabilistic relations as causal to perceive the paradoxical character.
What are probabilities about?
At the time I was a student, probability theory and statistics was not my favourite topic. On the contrary! My interest in the topic was awakened when I read E.T. Jaynes’ Probability Theory. Jaynes is an out and out Bayesian with a logical interpretation of the concept of probability. According to this view probability theory is an extension of classical logic. Probabilities are measures of the plausibility of a statement expressing a state of mind. P(H|D) denotes the plausibility of our belief in H given that we know D. I use H for Hypotheses and D for Data. P(H|D) can stand for how plausible we find H after having observed D. Bayes’ rule tells us how we should update our beliefs after we have obtained new information. Bayes’ rule is a mathematical theorem within probability theory. It allows us to compute P(H|D) from P(D|H), the probability of D given some hypothesis, and P(H), the prior probability of H.
Jaynes warns his readers to distinguish between the concept of physical (or causal) dependency and the concept of probabilistic dependency. Jaynes’ theory concerns the latter, epistemological (in)dependencies, not causal dependencies.
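For a single hypothesis H and its negation, Bayes’ rule fits in a few lines. This is a minimal sketch; the function name and the numbers in the example are invented for illustration and do not refer to any real test or disease.

```python
def posterior(prior_h, p_d_given_h, p_d_given_not_h):
    """Bayes' rule for a hypothesis H and its negation.

    P(H|D) = P(D|H) P(H) / [P(D|H) P(H) + P(D|not H) P(not H)]
    """
    evidence = p_d_given_h * prior_h + p_d_given_not_h * (1 - prior_h)
    return p_d_given_h * prior_h / evidence

# Invented numbers: H is plausible to degree 0.01 beforehand, the data D
# are expected with probability 0.9 if H holds and 0.05 if it does not.
after_first = posterior(0.01, 0.9, 0.05)
print(round(after_first, 3))   # 0.154

# Observing independent data of the same kind again: the old posterior
# becomes the new prior.
after_second = posterior(after_first, 0.9, 0.05)
print(round(after_second, 3))  # 0.766
```

The example shows the updating character of the rule: nothing is said about what causes what, only about how much more plausible H has become given what we now know.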
Neither involves the other. “Two events may be in fact causally dependent (i.e. one influences the other); but for a scientist who has not yet discovered this, the probabilities representing his state of knowledge – which determines the only inferences he is able to make – might be independent. On the other hand, two events may be causally independent in the sense that neither exerts any causal influence on the other (for example, the apple crop and the peach crop); yet we perceive a logical connection between them, so that new information about one changes our state of knowledge about the other. Then for us their probabilities are not independent.’’ (Jaynes, Probability Theory, p. 92).
Jaynes’ Mind Projection Fallacy is the confusion between reality and a state of knowledge about reality. The causal interpretation of probabilistic relations is an instance of this fallacy. Logical inferences can be applied in many cases where there is no assumption of physical causes.
According to Pearl the inequalities of Simpson’s paradox are paradoxical for someone who gives them a causal interpretation. I guess Jaynes would say: the fact that these inequalities hold shows that we cannot give them a causal interpretation; they express different states of knowledge. You cannot be in a knowledge state in which they all hold true.
But how would Jaynes resolve the puzzle of the school exam? Which of the two Professors should win the award? Jaynes was certainly interested in paradoxes, but he didn’t write about Simpson’s paradox, as far as I am aware. I think he would not consider it a well-posed problem. Jaynes considered the following puzzle of Bertrand’s not well-posed:
Consider an equilateral triangle inscribed in a circle. Suppose a chord of the circle is chosen at random. What is the probability that the chord is longer than a side of the triangle?
Bertrand’s problem can only be solved when we know the physical process that selects the chord. The Monty Hall paradox discussed by Pearl is also not well-posed, and hence unsolvable, if we don’t have information about the way the quiz master decides which door he will open. The outcome depends on the mechanism. Jaynes and Pearl very much agree on this. Jaynes relies on his Principle of Maximum Entropy to “solve” Bertrand’s paradox. I don’t see how this could solve the puzzle of the school exam. Somehow Jaynes must put causal information in the priors.
How can Jaynes’ theory help the scientist in finding out if two events are “in fact causally dependent’’ when probabilities are about the scientist’s “state of knowledge’’ and not about reality? After all, scientists aim at knowledge about the real causes. We are not forbidden, Jaynes says, to introduce the notion of physical causation. We can test any well-defined hypothesis. “Indeed, one of the most common and important applications of probability theory is to decide whether there is evidence for a causal influence: is a new medicine more effective, or a new engineering design more reliable?’’ (Jaynes, p.62).
The only thing we can do is compare hypotheses given some data and compute which of the hypotheses best fits the data. Where do the hypotheses come from? We create them using our imagination and the knowledge we have already gained about the subject.
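That the answer to Bertrand’s question depends on the selection mechanism can be seen in a simulation. The sketch below implements three commonly discussed ways of picking a "random" chord of a unit circle; the function names are mine, and the three mechanisms are the textbook ones, not something taken from Jaynes or Pearl.

```python
import math
import random

SIDE = math.sqrt(3)  # side of the equilateral triangle inscribed in a unit circle

def chord_from_endpoints(rng):
    """Pick two random points on the circle and connect them."""
    a, b = rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi)
    return 2 * abs(math.sin((a - b) / 2))

def chord_from_radius(rng):
    """Pick a random distance from the centre along a radius."""
    d = rng.random()
    return 2 * math.sqrt(1 - d * d)

def chord_from_midpoint(rng):
    """Pick the chord's midpoint uniformly inside the disk."""
    d = math.sqrt(rng.random())  # distance of a uniform point in the disk
    return 2 * math.sqrt(1 - d * d)

def estimate(mechanism, n=100_000, seed=1):
    """Fraction of chords that are longer than the triangle's side."""
    rng = random.Random(seed)
    return sum(mechanism(rng) > SIDE for _ in range(n)) / n

for mechanism in (chord_from_endpoints, chord_from_radius, chord_from_midpoint):
    print(mechanism.__name__, round(estimate(mechanism), 2))
```

The three mechanisms give estimates close to 1/3, 1/2 and 1/4 respectively. The question "what is the probability?" has no answer until the mechanism is specified.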
The validation of causal models
Causal diagrams are hypothetical constructs designed by the scientist based on his state of knowledge. Which of the two causal diagrams of the school exam case fits the data best? We have learned that we cannot tell based on the data in the contingency tables: both hypothetical models fit the data. Gathering more data will not help us decide which of the two represents reality. We can only decide when we have extra-statistical information, i.e. information about the processes that produced the data. Jaynes advocates the use of his principle of maximum entropy when we have to choose the best prior. But the causal direction is not testable by data. So I do not see how this can solve the school’s problem.
But how does Pearl justify the causal knowledge presented in a causal model? How can we decide that this model is better than that one? The hypothetical causal models are in fact theories about how reality works. We cannot evaluate and compare them by hypothesis testing. Data cannot decide about causation issues. How do we validate such a theory then? It seems that we can at best falsify them.
Pearl doesn’t give an explicit answer to this critical question in The Book of Why. The answer is implicit in the historical episodes of scientific inquiry that he writes about: the quests and quarrels of researchers searching for causes. If there is something like the truth, it is in these historical dialectical processes, not outside them. It helps, though, that now and then someone stubbornly believes that she has seen the light and fights against the establishment’s doctrine. Those are the ones who make science progress. The Book of Why contains a few examples of such stubborn characters. To quote Jaynes: “In any field, the Establishment is seldom in pursuit of the truth, because it is composed of those who sincerely believe that they are already in possession of it.” (Jaynes, p.613). Eventually, it is history that decides about the truth.
The Big Questions: Can machines think?
In the final chapter of The Book of Why Pearl shares some thoughts about what the Causal Revolution might bring to the making of Artificial Intelligence. “Are we getting any closer to the day when computers or robots will understand causal conversations?” Although he is of the opinion that machines are not yet able to think, he believes that it is possible to make them think and that we can have causal conversations with machines in the future.
Can we ever build a machine that passes the Turing test, a machine that we can have an intelligent conversation with as we have with other humans? To see what it means to build such a machine and what this has to do with the ability to understand causality, consider the following two sentences (from Terry Winograd, cited in Dennett (2004)).
“The committee denied the group a parade because they advocated violence.’’
“The committee denied the group a parade because they feared violence.’’
If sentences like these occur in a conversation with a machine, the machine must figure out the intended referent of the (ambiguous) pronoun “they” if it is to respond intelligently.
It will be clear that in order to do this, the machine must have causal world knowledge, not just about a few sentences, or about some “part or aspect of the world’’ (which part or aspect then?). Such a machine might also be able to see the pun in “Most drivers that are involved in a car accident have a driver’s license.’’.
I worked for quite some time in the field of Natural Language Processing, building dialogue systems and artificial conversational agents. We haven’t succeeded up to now in making such a machine, although results are sometimes impressive. Will we ever be able to build such a machine? It is an academic issue that often leads to quarreling about semantics, something that Turing tried to prevent with his imitation game.
What about responsibility?
What is not an academic issue, but a real practical one, is the responsibility that we have when using machines: computers and robots that we call intelligent and to which we assign more and more autonomy and even moral intelligence.
I end my note about Simpson’s paradox, which became a sort of review of Pearl’s The Book of Why, by emphatically citing another giant in the philosophy of science, Daniel C. Dennett.
“It is of more than academic importance that we learn to think clearly about the actual cognitive powers of computers, for they are now being introduced into a variety of sensitive social roles, where their powers will be put to the ultimate test: In a wide variety of areas, we are on the verge of making ourselves dependent upon their cognitive powers. The cost of overestimating them could be enormous.’’ (D.C. Dennett in: Can Machines Think?).
“The real danger is basically clueless machines being ceded authority far beyond their competence.” (D.C.Dennett in: The Singularity—an Urban Legend? 2015)
Great books are books that make you critically reflect and revisit your ideas. The Book of Why is a great book and I would definitely recommend it to my students.
E.T. Jaynes (2003). Probability Theory: The Logic of Science. Cambridge University Press, UK.
Judea Pearl (2001). Causality: Models, Reasoning, and Inference. Cambridge University Press, UK, reprint 2001.
Judea Pearl and Dana Mackenzie (2019). The Book of Why: The New Science of Cause and Effect. First published by Basic Books, 2018. Published by Penguin Random House, UK, 2019.
Stuart Russell and Peter Norvig (2009). Artificial Intelligence: A Modern Approach, 3rd edition. Pearson.
E.H. Simpson (1951). The Interpretation of Interaction in Contingency Tables. Journal of the Royal Statistical Society, Series B (Methodological), Vol. 13, No. 2, pp. 238-241. Blackwell Publishing for the Royal Statistical Society. Stable URL: http://www.jstor.org/stable/2984065
D.C. Dennett is a scientific philosopher of mind. He is deeply concerned with questions about the human mind, consciousness, free will, and the status of man and machine in the world of creatures. The stored program computer is a key concept in his philosophy. (For a bibliography of D.C. Dennett see https://adamwessell.wixsite.com/bioml/daniel-dennett ).
“Thinking is hard.” We think a lot but we often follow paths that lead us away from the truth. In “Intuition Pumps” (2013) Dennett collected a large number of stories and thought experiments that he developed as aids for thinking properly about nasty questions. Reading the book is a good way to enter the rich world of one of the most important thinkers of today. I have followed Dennett since he published The Mind’s I (1981) together with D. Hofstadter. I got a copy from my students when I left the high school where I taught mathematics and physics. I returned to the university, where I graduated four years later on a mathematical theory about the implementation of programming languages.
As I said, the computer plays a key role in Dennett’s thinking. I always felt that there is something wrong with the way he explains how the computer works, but it was always hard for me to understand what it exactly was and, most importantly, how I could understand how this fits in his philosophy. In this essay I try to explain where I believe Dennett is missing an important point when he explains “where the power of the computer comes from”.
According to Dennett we do not need “wonder tissues” to explain the working of the human mind. When we understand what a computer can do and when we see how computers work we will eventually see that we do not need to rely on “magic” to understand the human mind. Electronic circuits can perform wonderful things.
He explains to his students how the computer works in order to unveil the secrets of the power of the machine. By showing the students where the power of the computer comes from he tries to make clear that the evolution of the machine may eventually lead to a computer that equals the power of the human mind.
Where does the power of the computer come from? Or how does a computer work?
Difficult questions. For me at least. From the time I was a student (I studied mathematics and computer science in the 70s at the University of Twente in the Netherlands) these questions kept me busy. How do we have to think properly to find an answer? I read many texts that describe the working of the computer. I taught students how to program computers in various types of programming languages. I taught them in “Compiler Construction” courses how to implement higher-level programming languages. I programmed computers in order to allow people to have a conversation with the computer in Dutch or English. I gave courses in formal language theory, mathematical logic, machine learning, and conversation analysis. I taught my students to program a Universal Turing Machine or Register Machine, the basic mathematical models of the stored program computer, precursors of all modern computers.
But I always felt that being able to program a computer, and being able to teach others how to program a Turing machine or a Register Machine does not mean that you can give a satisfying answer to the question: how does a computer work?
From Louk Fleischhacker, my master in Philosophy of Mathematics and Technology, I learned that a satisfying answer to the question how the computer works is hard to give without understanding mathematics, without understanding what it means to compute something. The computer would not be possible without a fundamental idea in metamathematics: that the language of arithmetic can itself be constructed as a mathematical structure and that the arithmetical and logical operations can be formalized as operations on a formal language. This language becomes the interface, a programming language, to the mathematical machine. It is not for nothing that many people answer the question what mathematics is by saying that it is a special language. When we make a computation we manipulate tokens according to rules that we have learned.
There are at least two types of answers to the question how a computer works.
There is the technical answer, of the type that Dennett gives. He explains in a very clear way how the register machine works by showing and teaching his students how to program the register machine. This machine is programmed using a very simple programming language: it has only three types of instructions. Step by step he explains what the machine does with the instructions. After he has explained how the machine can be programmed to add two numbers he asks his reader to be aware of the remarkable fact that the register machine can add two numbers without knowing what numbers are or what addition is. (I emphasize “without knowing” because it is a central idea in Dennett’s thinking: many creatures show intelligent behavior “without knowing”.) Animals show very intelligent behavior too, at least to me, but does that mean they ‘know’ what they do? A tough question.
Technical answers like this never satisfied me. They do not explain what we exactly mean by phrases like “what the machine does”. I had the feeling that something essential is missing in the technical explanation. Something that remains implicit, because it is so trivial.
As an answer to “how does a computer work?” I sometimes gave my students the following demonstration.
I hold a piece of paper in front of my mouth and I shout “Move!”. I then explain the moving of the paper by saying: “You see, the paper understands my command.” In a sense (Dennett would say it “sort of” understands!). In what sense? Well, the effect of uttering the word conforms to the meaning of the word: the paper moves as if it understands what I mean by uttering the word. This is an essential feature of the working of the computer. Note that the movement of the piece of paper is conditional on my uttering of the word. There is a one-to-one correspondence between the meaning of the word and the effect of uttering it. Of course computers ‘understand’ many words and sentences. But the correspondence between the physical process and the mental process that we implemented is the same as in this simple demonstration.
If we use a computer as a word processor, it is essential that we recognize the physical tokens on the screen as words of our language. This is so trivial that we simply forget this important assumption. Of course the machine is designed this way. If the machine says “please insert your card”, we recognize the spoken words and understand them as a request to do what it asks us to do. At least, when it is said in the proper context of use.
The computer is a “language machine”. You instruct it by means of a (programming) language. The hardware is constructed in such a way that the effect of feeding it with the tokens satisfies the meaning that the tokens have. Therefore the programmer has to learn the language that the machine “sort-of” understands. The program is the key, the machine is the lock that does the work when handled with the proper key.
What has this to do with mathematics? Well, what is typical for mathematics is that mathematical expressions have an exact and clear meaning: there is no vagueness. There is a one-to-one correspondence between the meaning of the word and the physical effect caused by uttering it, an effect that represents that meaning.
A demonstration I gave people in answer to the question “how does a computer compute the sum of two numbers?” runs as follows. By way of example I demonstrate how a computer computes 2 plus 3. First I put 2 matches in one basket. Then I put another 3 matches in a second basket. Then, one by one, I move the three matches from the second basket to the first one. And look: the result can be read off from the first basket: five matches.
Explanation: the two and three matches stand for the numbers 2 and 3 respectively: there is a clear, unambiguous relation between the tokens (the three matches) and their meaning, the mathematical object (the number 3). The moving of the 3 matches to the first basket stands for the addition operation: a repetition of adding one until there is no match left in the second basket. The equality of the 2 and the 3 as separate units (representing the numbers 2 and 3) on the one hand and the whole of 5 matches (representing the number 5) on the other is a mathematical equality.
You might say that I execute a conditional branching instruction when doing the demonstration: if there is a match in the second basket then take one match and put it in the first basket; else stop and read off the result.
It has the general format (pattern): IF <Condition> THEN <do A> ELSE <do B>.
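The demonstration with the matches can be written down as a small Python sketch. The function name and the use of lists as baskets are my own illustrative assumptions; the point is only that the procedure is a repeated conditional step of the IF/THEN/ELSE form.

```python
def add_with_matches(first_count, second_count):
    """Add two natural numbers by moving matches between two baskets."""
    first_basket = ["match"] * first_count
    second_basket = ["match"] * second_count

    # IF there is a match in the second basket ...
    while second_basket:
        # ... THEN take one match and put it in the first basket
        first_basket.append(second_basket.pop())

    # ELSE stop and read off the result from the first basket
    return len(first_basket)


if __name__ == "__main__":
    print(add_with_matches(2, 3))
```

The program never refers to numbers while it is moving matches. Only the first and last lines of the function translate between numbers and heaps of tokens, and that translation is ours.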
For an interesting historical overview of this type of pattern in the evolution of knowledge see (Rens Bod 2019).
But notice that my execution, too, is conditional on the procedure that I follow. In the stored program computer this procedure is represented by a part of the machine storage. There is no difference in status between the program parts, the statements, and the numbers, the data operated on. The difference between statements or operators and numbers or operands is only in the minds of the designer and the programmer and in the way the parts of the machine state function.
I think most people did not take my demonstration as a serious answer to the question how a computer works. But I believe it shows an essential feature of the computer, a feature that Dennett misses when he tries to explain the power of the computer.
The function add for adding two natural numbers can be specified in a functional programming language by means of a simple recursive function as follows.
ADD A B = IF (B = 0) THEN A ELSE ADD (A+1) (B-1)
The function shows two essential features that every programming language must have: repetition and a conditional branching instruction. The repetition is realized by means of the recursion in the definition: the function calls itself, so it is applied repeatedly until some stop condition holds true.
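For readers who want to run the definition, here is a direct transcription into Python. This is a sketch, assuming that A and B are natural numbers; the lower-case name add is my own choice.

```python
def add(a, b):
    """Recursive addition: ADD A B = IF (B = 0) THEN A ELSE ADD (A+1) (B-1)."""
    if b == 0:
        # Stop condition: nothing left to move, the answer is a.
        return a
    # Repetition: move one unit from b to a and apply the function again.
    return add(a + 1, b - 1)


if __name__ == "__main__":
    print(add(2, 3))
```

Note that Python limits the recursion depth, so this transcription only works for modest values of B. That is a property of the Python implementation, not of the definition.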
According to Dennett the power of the register machine is in the conditional branching instruction. This construction tells the machine to check if a certain register contains the number 0 and then take a next step based on the outcome of this check. What is so special about this instruction?
“As you can now see, Deb, Decrement-or-Branch, is the key to the power of the register machine. It is the only instruction that allows the computer to “notice” (sorta notice) anything in the world and use what it notices to guide its next step. And in fact, this conditional branching is the key to the power of all stored-program computers, (…)” (From: Intuition Pumps and other tools for thinking. The same text, without the bracketed sorta notice, can be found in Dennett’s lecture notes The secrets of computer power revealed, Fall 2008).
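To make the three instruction types concrete, here is a minimal register machine interpreter in Python. It is a sketch under my own assumptions: the tuple encoding of instructions, the numbering of steps from 1, and the names run and ADD_PROGRAM are illustrative and not Dennett’s notation. Inc adds one to a register and goes to a given step; Deb subtracts one and goes to a given step if the register is not empty, and otherwise branches to another step; End stops the machine.

```python
def run(program, registers):
    """Execute a register machine program and return the final registers.

    program:   dict mapping step number to an instruction tuple
               ("Inc", register, next_step)
               ("Deb", register, next_step, branch_step)
               ("End",)
    registers: dict mapping register number to a natural number
    """
    registers = dict(registers)  # work on a copy, leave the input untouched
    step = 1
    while True:
        instruction = program[step]
        name = instruction[0]
        if name == "End":
            return registers
        if name == "Inc":
            _, register, next_step = instruction
            registers[register] = registers.get(register, 0) + 1
            step = next_step
        elif name == "Deb":
            _, register, next_step, branch_step = instruction
            if registers.get(register, 0) > 0:
                registers[register] -= 1
                step = next_step
            else:
                step = branch_step  # the register is empty: branch
        else:
            raise ValueError(f"unknown instruction: {name}")


# Add the content of register 1 to register 2, emptying register 1.
ADD_PROGRAM = {
    1: ("Deb", 1, 2, 3),
    2: ("Inc", 2, 1),
    3: ("End",),
}

if __name__ == "__main__":
    print(run(ADD_PROGRAM, {1: 2, 2: 3}))
```

Starting with 2 in register 1 and 3 in register 2, the machine ends with register 1 empty and 5 in register 2. The program itself is just a dictionary, stored in the same way as the register contents.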
What Dennett misses, and what is quite essential, is that every instruction is a conditional instruction, not just the Deb instruction. The End instruction, for example, only does what it means when the machine is brought into a world state that makes it execute this instruction. Eventually this is the effect of our act of instructing the machine. When we instruct the computer by pressing a key or a series of keys, the computer “notices something in the world” and acts accordingly, for example by stopping when we press the stop button. This is precisely the feature I try to make clear with my first demonstration with the piece of paper. The set-up of the demonstration (the piece of paper held in front of the mouth) is such that the paper “notices” the meaning of the word “move”. How do we know? Because of the way it responds to it. We see that the computer responds in correspondence with the meaning and goal of our command and we say that it “understands” what we mean.
Every instruction is conditional in the sense that it is only executed when it is actually given. Indeed, the machine does not know what it means to execute a command. A falling stone doesn’t know Newton’s laws of mechanics. Does it? And yet, you might say that it computes the speed that it should have according to Newton’s laws when it touches the ground. Sort of.
However, Dennett is right in that the conditional instruction is special in the sense that it is the explicit form of the conditional working of the machine. But it assumes the implicit conditional working of the instructions we give to the computer, just as the application of the formal rule of modus ponens assumes the implicit use of this rule. (See the References and Notes for how Lewis Carroll tries to make this clear with the story “What the Tortoise Said to Achilles”.)
We call a logical circuit logical because the description of the relation between the values of its input and output equals that of the formal logical rule seen as a mathematical operator.
The “world” that is noticed by the computer, and whose value is tested in the branching instruction, is in the end the input provided by the programmer, who sets the initial state before he kicks off the machine to execute the instructions given.
Modern people don’t use computing machines like this. They use apps on their mobile phones or laptops. They click on an icon shown on their screen to start an application and some interaction starts using text fields or buttons. When you ask them where the power of their computer comes from they would probably say from the provider of their popular application, or maybe from the user-friendly functionality that the app offers them. Under the hood, hidden from the user, events or messages are sent to specific parts, objects or agents of a virtual machine. These events trigger specific actions executed by the agents or objects that receive them.
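What happens under the hood can be sketched in a few lines of Python. The class EventBus and the event names are hypothetical, invented for this illustration and not taken from any particular framework. Objects register for the events they are able to “notice”; an event only has an effect when some object is set up to respond to it.

```python
class EventBus:
    """A toy event-based virtual machine: events trigger registered actions."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, event_name, handler):
        """Register an action to be executed when the named event arrives."""
        self._handlers.setdefault(event_name, []).append(handler)

    def publish(self, event_name, payload=None):
        """Send an event to every object that registered for it."""
        return [handler(payload) for handler in self._handlers.get(event_name, [])]


log = []
bus = EventBus()
bus.subscribe("icon_clicked", lambda app: log.append(f"start {app}"))

bus.publish("icon_clicked", "mail")  # a receiver exists: the action is executed
bus.publish("key_pressed", "x")      # no receiver: nothing happens

if __name__ == "__main__":
    print(log)
```

The click on the icon plays the same role as the shouted word in the demonstration with the piece of paper: the action is conditional on the event actually being given.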
Programmers don’t write programs for a register machine in machine code. They program in a higher-level programming language like Java or Perl or some dedicated application language. Java is an object-oriented language that allows one to program applications that are essentially event-based virtual machines.
The first computing machines were constructed to automate the arithmetic operations on whole numbers. Programmers were mathematicians who built and used programs to do numerical computations. Later, in the fifties, people like Yngve at MIT wanted to use the computer to automatically translate texts written in one natural language into a second natural language. The objects to be stored and manipulated are not numbers but words and sentences, strings, sequences of characters. They defined a string processing language so that linguists could use it in their scientific research. This was the very start of machine translation.
We distinguish a sentence from the act of someone expressing the sentence and meaning what it says. Somewhere in the history of mankind this distinction was made. Now we can talk about sentences as grammatical constructs, objects that somehow exist in abstraction from a person who utters them in a concrete situation. Now we talk about “truth values” of sentences, we study “How to do things with words”; words and sentences have become instruments. Similarly, we analyse “conversational behaviors” (such “tiny behaviors” as head nods and eye gazes) as abstract gestures. And we synthesize gestures in “social robots” as simulations of “human conversational agents behavior”. Many people think that we can construct meaningful things and events from meaningless building blocks if the constructs we build are complex enough. Complexity is indeed the only measure that remains for people who have a structural world view, a view that structure is basically all there is. (In Our Mathematical Universe: My Quest for the Ultimate Nature of Reality, Max Tegmark posits that reality, including life!, is a mathematical structure.)
Many people, including Dennett, think about the computer as something that is what it is in abstraction from the human mind, from the user and the designer. As if the machine is what it is without the human mind for which it is what it is and does what it does. However, the real power of the computer is in the mind of the human who organises nature in such a way that it can be used as a representation of meaningful processes.
The Turing test does not test how intelligent a machine is. It tests if the human mind is already able to construct a machine that makes other humans believe that it is intelligent. This has consequences for the question who is ultimately responsible for what machines do. It has consequences for what we mean when we talk about “autonomous machines” or “artificial intelligence”.
Dennett sees the machine and the human mind as distinct realities that can exist separately. For Dennett there is no fundamental difference between the computer that “sort of” understands and the human mind that “really” understands. The difference between the two is only gradual: they are different stages in an evolutionary process.
Can robots become conscious? Dennett answers this question with a clear yes. In a conversation with David Chalmers about the question whether superintelligence is possible Dennett posits:
“(…) yes, I think that conscious AI is possible because, after all, what are we? We’re conscious. We’re robots made of robots made of robots. We’re actual. In principle, you could make us out of other materials. Some of your best friends in the future could be robots. Possible in principle, absolutely no secret ingredients, but we’re not going to see it. We’re not going to see it for various reasons. One is, if you want a conscious agent, we’ve got plenty of them around and they’re quite wonderful, whereas the ones that we would make would be not so wonderful.” (For the whole conversation (recorded 04-10-2019): https://www.edge.org/conversation/david_chalmers-daniel_c_dennett-is-superintelligence-impossible)
Can machines think? Dennett would answer this question with a clear yes, too. After all: people are machines, aren’t we? But he doesn’t consider this question as really important. I think Dennett confuses our technical reconstruction of natural intelligent behavior (for example social robots understanding natural language) with the real thing (people having a conversation).
The real challenge of artificial intelligence does not lie in this type of “philosophical” question.
According to Dennett the real challenge of AI is not a conceptual but a practical one.
“The issue of whether or not Watson can be properly said to think (or be conscious) is beside the point. If Watson turns out to be better than human experts at generating diagnoses from available data it will be morally obligatory to avail ourselves of its results. A doctor who defies it will be asking for a malpractice suit.”
The human expert will have to justify what he did with the knowledge stored in the computer. The final responsibility for the treatment chosen must always remain with the human expert. The computer may have statistical knowledge based on big data; the human expert has to relate this to the case at hand.
“The real danger, then, is not machines that are more intelligent than we are usurping our role as captains of our destinies. The real danger is basically clueless machines being ceded authority far beyond their competence.” (D.C.Dennett in: The Singularity—an Urban Legend? 2015)
I could not agree more with Dennett on this. As soon as machines are considered autonomous authorities they stop being seen as useful technical instruments. They are then considered gods, magical masters.
A.M. Turing, D.C. Dennett and many more intelligent minds are products of evolution. Machines are products of evolution as well. But there is a fundamental difference between natural intelligence as we recognize it in nature as a product of natural Darwinian evolution, and artificial intelligent machines that are invented by human intelligence.
As soon as we forget this important difference, for whatever reason or by whatever cause, it will disappear.
References and Notes
Rens Bod (2019/2022). Een wereld vol patronen: de geschiedenis van kennis. Prometheus, Amsterdam. Translated into English as World of Patterns: A Global History of Knowledge, 2022 (open access).
Provides a historical overview of the human quest for patterns and principles in the world that surrounds us from pre-history to 1800.
Lewis Carroll, “What the Tortoise Said to Achilles,” Mind 4, No. 14 (April 1895): 278-280.
This is a story about the Hypothetical Proposition (HP): if A and B then Z
Tortoise: I accept A and B as true, but not the HP. Convince me I have to accept Z by logic. The Tortoise proposes to call the HP: (C): if A and B then Z.
“If A and B and C are true, Z must be true,” the Tortoise thoughtfully repeated. “That’s another Hypothetical, isn’t it? And, if I failed to see its truth, I might accept A and B and C, and still not accept Z, mightn’t I?”
This amounts to: (D) If A and B and C are true, Z must be true.
And on the same reasoning the next Hypothetical Proposition is:
(E) If A and B and C and D are true, Z must be true.
Until I have granted that, the Tortoise claims, of course I needn’t grant Z.
“So it’s quite a necessary step, you see?”
And so on…ad infinitum.
What Lewis Carroll wants to make clear to the reader is that in applying the Hypothetical Proposition we use a rule implicitly. The Tortoise asks Achilles to write that implicit rule down, just like the HP. But this results in a new HP, which he wants to be treated on the same level as the one before.
Compare the sequence of utterances: “It rains”, “It rains is true”, “It rains is true is true.” Every next one makes explicit what is implicit in the previous statement.
D.C. Dennett, Intuition Pumps and other tools for thinking, W.W. Norton Publ., 2013. Translated into Dutch: Gereedschapskist voor het denken. Uitg. Atlas Contact, Amsterdam/Antwerpen, 2013.
L.E.Fleischhacker, Beyond Structure: the power and limitations of mathematical thought in common sense, science and philosophy. European University Studies 20(449), Peter Lang, Frankfurt am Main, 1995.
Goldstine, Herman H. (1972). The Computer – from Pascal to von Neumann. Princeton University Press, 1972.
An interesting history of the computer, written by an insider, showing the development of various calculating machines from the mechanical Pascaline to the programmed electronic computer.
Hertz, Heinrich (1894). Die Prinzipien der Mechanik in neuem Zusammenhange dargestellt. Mit einem Vorworte von H. von Helmholtz.