Chapter 5 Inference with Causal Maps

5.1 Introduction

How should we encode – capture, write down, picture – causal information?

Or:

How should we convert causal information – whether gathered from narratives or experiments or experts – into a causal map?

The reverse question:

How should we interpret a causal map?

If we can do the one, we can do the other.

As evaluators we live and breathe causal information.

Causal information is just as basic as, say, numerical information about the level of a variable.2

We learned in school how to write down and manipulate numerical information. But not causal information. We need to agree, and teach one another, how to write down and manipulate causal information too.

For the moment, we are not at all interested in which particular medium – for example, sentences as opposed to diagrams – we want to use to encode causal information, because we want to understand the rules which should govern the use of the symbols in any such encoding. (Spoiler: in fact we will be using diagrams containing arrows between boxes, basically, for this task.)

A social researcher or an evaluator might be set the task of encoding the causal information in some written or spoken information, say from an interview, and for that they will need symbols or conventions which make the causal information clearer.

Of course, people speak and write sentences which contain causal information all the time. But it isn’t easy to see exactly what information is contained in such sentences, because in natural language we have a very messy and heterogeneous patchwork of ways of talking about causation: dozens of half-complete systems for encoding different variants of causal information, some of them suffused with ideas like blame and responsibility. Language likewise contains overlapping, only partially consistent and mostly incomplete ways of talking about time, space and quantity; yet mathematicians and philosophers have managed to systematise those in ways which are mostly satisfactory.

Causal language is not one system. But it is possible to identify and describe a single formalised system which is a good approximation to the ones we find naturally occurring in language. That is the one I will present here. As far as possible, we will also try to identify symbols and conventions which fuzzily capture the fuzziness, incompleteness, ambiguity etc. of the way people talk about causal relationships, and of the supposed causal relationships themselves.

Using the causal mapping app, we are particularly interested in how to write down causal information from multiple sources. But that is a pretty specialised task; first we need to look at the basics. How do we write down causal information at all? What exactly is the meaning of a causal arrow?

5.2 Understanding the causal arrow

Causal arrows are not conduits for a mysterious force called “causation”. What flows down the arrows, so to speak, are actual forces – magnetism, peer pressure, persuasion, whatever. The arrows are the conduits for these actual forces: the form, not the content.

We will not get anywhere if we try to understand the meaning of a causal arrow, or how to use it, by using synonyms or pointing at some process.

Instead we will learn its “meaning” in precisely the way we learn the “meaning” of arithmetical symbols like + for addition: by learning the rules by which we may draw inferences between sentences containing such symbols.3

Here we could review existing bodies of thought on symbolic treatment of causal relationships:

  • Judea Pearl (Pearl 2000) and colleagues – mostly statistics-based
  • Philosophy of causation, Mackie (Mackie 1974), Lewis (Lewis 2000), etc. – mostly dealing only with binary variables

5.3 First steps

Evaluators encounter different kinds of causal maps:

  • Theories of Change for a project or programme, even (in a very restricted sense) Logical Frameworks
  • (Perhaps not quite the same thing:) Programme theories in theory-based evaluation
  • The kind of maps produced by QuIP
  • Fuzzy Cognitive Maps
  • Systems Diagrams
  • Structural Equation Models
  • INUS diagrams
  • (arguably) diagrams used in Realist Evaluation
  • (sometimes) diagrams used in Outcome Harvesting

A causal map is more than boxes connected by arrows. The boxes refer to things which could be different, which could be one way or the other; I like to call them “variables”. The arrows show that the boxes somehow influence one another, though it is not always clear how. We’d perhaps like to be able to use different symbols so we can specify these links more precisely, and add a corresponding legend to the map. But what symbols do we need? Some kinds of causal map, like SEMs, already have their own sets of symbols and conventions. Do all the different kinds of causal map, like those in the list, need to have their own legends? Can some be shared? The problem of what symbols to use is certainly a problem for the first three.

If we have to convert fragments of causal information expressed in other ways, like narrative testimonies from stakeholders about some domain of interest, into causal maps – a task we will refer to as “coding” – we need to know how to do that: what symbols and conventions to use to encode as much of the relevant causal information as we can.

So the challenge is to establish conventions for constructing causal maps which make their meaning transparent.

I claim that this problem about meaning can be solved by agreeing on how to reason with causal maps, that is, how to do causal inference. The causal map app needs the right rules for causal inference in order to bring the maps alive – so that we can ask them questions like “does variable B have more influence on E than variable C?” or give them instructions like “hide all the variables which only have a small influence on E”.

When we learn the meaning of + in school, words don’t help us much, we learn by making deductions; we learn that from

x = 2 + 3

we can deduce, for example,

x = 5

according to certain rules. We could call them “inference rules”.

Here we aren’t dealing with equations, we are dealing with causal maps. We need inference rules for causal maps.

5.4 So what are the inference rules for causal maps?

If we can agree on the inference rules, all the rest – what symbols to use, how to combine and aggregate maps, how to translate them back into English, how to do calculations with them – will follow naturally.

5.4.1 The basic rule

A mini-map in which one or more variables (the “influence variables”) are shown with arrows leading to another (“the consequence variable”) says, for example: “the influence variables B, C and D all have some kind of causal influence on the consequence variable E”. That sentence is equivalent to the map below.

This is a “mini-map”: one or more variables are shown with arrows leading to another.

You could say, “aha, but the sentence contains the words ‘causal influence’, so you have explained one mystery with another. If you take that part out, the diagram could be about anything, and the arrows might mean ‘is larger than’ or ‘is a child of’, or lots of things”. That’s true, but the whole point is that we will specify not just this inference rule for causal maps, but enough different ones that only causality is left as a possible interpretation of their meaning. Or, to put it differently: if we have a child who can use + perfectly, we aren’t bothered whether they can explain it in words or not; if you understand the inference rules, you understand enough. If you want to test whether someone understands how to code QuIP information correctly, you can ask them to make various inferences with the maps.

Again: we are not going to say what “causal influence” means. We are going to show how it works.

Some of these rules will seem pretty trivial and obvious. They should. But they are necessary for building up a complete and consistent system. In any case I need them to make an app which actually works, and we need to spell them out so we can agree how to code with the app.
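As a purely illustrative aid (this is a sketch of mine, not the causal mapping app’s actual data format), a mini-map can be thought of as a set of arrows, one from each influence variable to the consequence variable:

```python
# Hedged sketch: a mini-map as a set of directed edges (arrows).
# The variable names B, C, D, E are the ones used in the text.

def mini_map(influence_variables, consequence_variable):
    """One arrow from each influence variable to the consequence variable."""
    return {(v, consequence_variable) for v in influence_variables}

# "The influence variables B, C and D all have some kind of
# causal influence on the consequence variable E"
m = mini_map(["B", "C", "D"], "E")
print(sorted(m))  # [('B', 'E'), ('C', 'E'), ('D', 'E')]
```

The edge-set representation is just one convenient choice; the rules below can then be stated as operations on such sets.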

5.4.2 Combination rule

The next rule says that you can combine mini-maps into larger maps and vice-versa. So from this:

and, say, this

you can deduce this (and vice-versa):

5.4.2.1 Interpretation

It seems so trivial it is hard to put into words. In short: you can combine two causal maps into one. If you know that these pink things influence those green things, and you know that these red things influence these blue things, then you know that these pink things influence those green things and that these red things influence these blue things4.

5.4.2.2 Note on context

This rule only works as stated provided both mini-maps are from the same context. All maps have a context attached to them, in which they are claimed, stated, posited, uttered, etc. I am not going to deal much with context at this point, i.e. I will assume that all the maps come from the same unspecified context, but it is really important. In general, the combined map is only true in the intersection of the contexts. (So if one map is true for females, and the other is true only for young people, the combined map is true only for girls.)
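Continuing the edge-set sketch (an illustration of mine, with made-up context sets echoing the females/young people example), the combination rule is the union of the arrows, while the contexts intersect:

```python
# Hedged sketch: combining two maps is the union of their arrows; the
# combined map is claimed only in the intersection of the two contexts.

def combine(arrows_a, context_a, arrows_b, context_b):
    """Union the arrows; intersect the contexts in which each map holds."""
    return arrows_a | arrows_b, context_a & context_b

females = {"girls", "women"}   # context in which the first map is claimed
young   = {"girls", "boys"}    # context in which the second map is claimed
arrows, context = combine({("B", "E")}, females, {("C", "E")}, young)
print(sorted(arrows))  # [('B', 'E'), ('C', 'E')]
print(context)         # {'girls'} -- the combined map is true only for girls
```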

5.4.3 Chaining (transitivity)

The next rule says that you can chain mini-maps with common variables. So from this:

you can deduce this (and vice-versa):

5.4.3.1 Interpretation

This is a much bigger deal. It says that if O, given some combination of other variables and in a particular context, influences B, and B influences E in the same way, then O has some influence on E too.
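In the edge-set sketch used above (again, an illustration, not the app’s implementation), one application of the chaining rule looks like this:

```python
# Hedged sketch of the chaining rule: from an arrow O -> B and an
# arrow B -> E we may deduce an arrow O -> E.

def chain_once(arrows):
    """Add every arrow deducible by one application of the chaining rule."""
    deduced = set(arrows)
    for (a, b) in arrows:
        for (c, d) in arrows:
            if b == c and a != d:  # shared middle variable; avoid self-loops
                deduced.add((a, d))
    return deduced

print(sorted(chain_once({("O", "B"), ("B", "E")})))
# [('B', 'E'), ('O', 'B'), ('O', 'E')] -- O's influence reaches E
```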

5.4.3.2 Caveat: conflicts

What happens if the two (or more) maps we are combining result in a contradiction? We will deal with that later.

5.4.3.3 Caveat: loops

There might be some caveats about not creating loops which we won’t go into here.

5.4.3.4 An example of an inference rule for the causal statement

What’s quite weird about this rule is that there are two versions of it which are in a sense opposite extremes. One is, as above, “the influence variables all have some kind of causal influence on the consequence variable”, and the other is “the influence variables all have total causal control of the consequence variable”. So the first is satisfied for B if there are some, any, values of C and D such that some difference in B makes a difference to E (it might be that most of the time, tweaking B does nothing to E), and similarly for C and D; whereas in the second version, there is no room for E to do anything not dictated by B, C and D. It’s intriguing that a lot of the other inference rules are oblivious to this distinction. Anyway, we’ll stick with version 1, as above.

If

A completely controls B

and

B completely controls C

we can conclude that

A completely controls C

Similarly, if

A has at least some causal influence on B (i.e. there is at least one context in which it might make a difference to B)

and

B has at least some causal influence on C

we can conclude that

A has at least some causal influence on C

These are (arguably?) two of the rules of causal inference.
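A tiny check of the “completely controls” reading: if B is a function of A alone, and C is a function of B alone, then C is a function of A alone, namely the composition. The particular functions below are arbitrary illustrations of mine:

```python
# Hedged sketch: "completely controls" modelled as a deterministic function,
# so transitivity is just function composition.

def b_given_a(a):   # A completely controls B
    return 2 * a

def c_given_b(b):   # B completely controls C
    return b + 1

def c_given_a(a):   # deduced: A completely controls C
    return c_given_b(b_given_a(a))

print(c_given_a(3))  # 7
```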

So for example if you see this:

and you know that the arrows are supposed to mean “completely determines”, then you know that A completely determines C as well as B, because you are following the rules of inference for the causal arrows.

If someone disagrees with you, they must have some different understanding of what these arrows mean.

If a second person then shows you this, can you say “yes, I knew that”?

5.4.5 The Extras Rule: adding extra information, in particular about the levels or values of the variables

From this:

and, the English sentence asserted in the same context

“Variable B = 22”

We can conclude this (and vice versa):

The same goes for any other kind of information in English (or any other natural language, of course) which tells us something about the state, or value, or level of one or more variables, like “Variable B is high” or “the level of Variable B is 22” or “Variable B is on” or even “Person B’s emotion is angry rather than sad”.

5.4.5.1 Interpretation

It just says, you can move information about the levels of the variables formulated in English sentences into similar statements on the variables themselves within the map, and back again.
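As a toy sketch of this two-way move (the sentence format and the dict of levels are assumptions of mine, not the app’s format):

```python
# Hedged sketch of the Extras Rule: level information stated in an English
# sentence can be attached to the variable inside the map, and detached again.

def attach_level(levels, sentence):
    """Move e.g. 'Variable B = 22' from English into the map's annotations."""
    variable, value = sentence.replace("Variable", "").split("=")
    levels[variable.strip()] = float(value)
    return levels

def detach_level(levels, variable):
    """...and back into an English sentence (the rule works both ways)."""
    return f"Variable {variable} = {levels[variable]:g}"

levels = attach_level({}, "Variable B = 22")
print(detach_level(levels, "B"))  # Variable B = 22
```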

5.4.6 Corollary: Ordinary reasoning

From the rules above it follows that you do not need an extra rule to do ordinary reasoning with causal maps. So from the above map you can also deduce the map below. (To check, remember that we could use the Extras Rule to convert the map into the map without the extra information plus the English sentence “B = 22”, and from this we can use ordinary reasoning to get the English sentence “B < 30”, which we can then combine with the map without the extra information to give the version below.)
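The parenthetical check above can be spelled out as a trivial sketch:

```python
# Hedged sketch of the corollary: detach "B = 22" via the Extras Rule,
# apply ordinary (non-causal) reasoning, re-attach the weaker statement.

level_of_b = 22.0              # detached from the map via the Extras Rule
b_below_30 = level_of_b < 30   # ordinary reasoning: 22 < 30
print(b_below_30)  # True -- so "B < 30" can go back onto the map
```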

5.4.7 Metalanguage

We will also need some meta-language to talk about causal maps. Metalanguage is not part of the maps, it is a special part of English which we use to talk about the maps. We’ve already used some liberally above, for example the terms “influence variable” and “consequence variable”.

5.4.7.1 “The cause”, “a cause”

I prefer not to use phrases like “the cause” or even “a cause”. For one thing, these are too mixed up with our human concerns like blame, and legal responsibility, and moral judgements. For another, these words are too specialised for binary, true/false variables. Also, they are too monolithic. For example, C is certainly not “the cause” of E above.

5.4.8 Types of variable

We will need to distinguish different types of variable. Here are a few key types:

  • ◪ continuous, limited variables like percentage, usually specified as going from 0 (“nothing”) to 1 (“everything”). I call them “lo-hi variables” but I would love to hear of a better name. They can also be expressed as a percentage, because this is more familiar to people. We do not necessarily strictly understand this number between 0 and 1 as a proportion. So we can say, “confidence in the President is around 20% whereas for the former President it was around 50%” or “the project was performing at about half of its potential” even if we don’t make it clear what the numbers mean exactly. The point is that our respondents will use such language, which we need to reproduce, and also that when coding and then combining causal fragments, we sometimes have to commit to expressing a non-numerical claim with a rough number.
  • ◨ false/true variables like “the project is implemented (yes or no)”
  • ◢ continuous, unlimited variables like height, income
  • … and various others, see xx.

When actually coding causal maps, we will mostly use ◪ variables, as ◨ variables are a special case of them, with just the levels 0 (no) and 1 (yes).
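A minimal sketch of a lo-hi variable, under my own illustrative conventions: a level clamped into [0, 1], printable as a percentage, with a false/true variable as the special case using only the levels 0 and 1:

```python
# Hedged sketch: lo-hi variables live in [0, 1]; false/true variables are
# the special case restricted to the levels 0 (no) and 1 (yes).

def lo_hi(level):
    """Clamp a level into the lo-hi range [0, 1]."""
    return max(0.0, min(1.0, level))

def as_percent(level):
    """Express a lo-hi level as a percentage, the more familiar form."""
    return f"{lo_hi(level) * 100:g}%"

print(as_percent(0.2))  # 20% -- "confidence in the President is around 20%"
print(as_percent(1))    # 100% -- a false/true variable's "yes", level 1
```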

5.5 Causal thinking is essentially contrastive thinking

Variables contrast the actual (or imagined) state of things with possible alternatives.

All causal claims intrinsically have counterfactual meaning. (I should probably have said “contrastive” rather than “counterfactual”, as the latter would more strictly refer only to past things which can no longer be changed, but this doesn’t matter here.) If A claims that contrails causally influence the weather, and in particular that if there are a lot of contrails, the weather will be worse, but has no opinion at all about what the weather would be like (controlling for any other influences) if there were fewer contrails, and in particular doesn’t claim it would be any better, then they haven’t understood what “causal influence” means. In other words, I am suggesting that a useful and comprehensive general framework for causal networks in social science can start by treating the elements as variables in a broad (and mathematical rather than statistical) sense, i.e. as things that can / could be / could have been different.


  2. The former cannot be reduced to the latter. There was an extraordinary idea that, as causation cannot be reduced to correlation, only correlation – numerical data – is real, whereas the causal information is somehow off the charts. This mistake has been pretty roundly exposed by Judea Pearl (Pearl 2000) and the other leaders of the “causal revolution” in statistics.

  3. If we were doing formal logic, we could say that we want to establish the axioms and theorems for sentences (or diagrams) involving the causal arrow.

  4. Though this motivation via knowing about stuff is a cheap sell and not strictly true – it is a psychological claim, and really we are not talking about psychology.