Section 34 Lo/hi Variables, types of variable, and contrasts

My strong suggestion is that we can make a lot of sense of causal maps if we think of the nodes/factors/vertices/tags as variables taking values between 0 and 1 (or maybe -1 and 1). I call variables of this kind “lo/hi” variables, for want of a better term.

Variables of this type are common in some parts of social science, e.g. Kosko (1986).

Standardising our causal maps to use these kinds of variables wherever possible will make most tasks a lot easier, in particular the hard task of specifying a suitable “soft arithmetic” for combining and simplifying hosts of causal fragments.

We can say, “confidence in the President is around 20% whereas for the former President it was around 50%” or “the project was performing at about half of its potential” even if we don’t make it clear what the numbers mean exactly. This method is pretty good for expressing a lot of the variables we encounter: a wellbeing of .9 would be pretty amazing, while 0 would be impossibly bad.

This value can have quite general meaning:

the value within an empirical range e.g. “this summer is as hot as we have ever had” (this summer temperature ≈ .95)
the value without reference to an empirical range e.g. “the teachers’ skills were appalling” (teachers’ skills ≈ .1)
the proportion of a set of things which have a binary property e.g. “most of the children seem happy” (happiness of the children ≈ .75)
the strength of the membership of something in a certain set e.g. “country X is only partly democratic” (country X democracy level ≈ .5)
the probability of a binary variable e.g. “the chances of war are now very low” (chance of war ≈ .1)
the strength of our information about a binary variable

The point is that our respondents will use such language anyway, and we need to reproduce it. Also, when coding and then combining causal fragments, we sometimes have to commit to expressing a non-numerical claim with a rough number.

These ranges might also be tied to some kind of empirical distribution, so “Income from farming = .95” would mean it is in the middle of the top 10%, etc.
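
As a rough illustration of tying a lo/hi value to an empirical distribution, the sketch below turns a raw observation into its mid-rank position among a set of observed values. The function name lohi_from_distribution and the farm income figures are invented for the example, and mid-rank is only one of several reasonable conventions for handling ties.

```python
from bisect import bisect_left, bisect_right


def lohi_from_distribution(value, observed):
    """Return the mid-rank position of `value` within `observed`, between 0 and 1.

    Hypothetical helper: 0 means below every observation, 1 means above
    every observation, and ties are placed in the middle of their tied group.
    """
    if not observed:
        raise ValueError("observed must not be empty")
    ranked = sorted(observed)
    below = bisect_left(ranked, value)          # observations strictly below value
    at_or_below = bisect_right(ranked, value)   # observations at or below value
    return (below + at_or_below) / (2 * len(ranked))


# Invented data: 100 households with farm incomes of 1, 2, ..., 100 units.
farm_incomes = list(range(1, 101))

# An income of 95.5 sits above 95 of the 100 households, so the result is about 0.95.
print(lohi_from_distribution(95.5, farm_incomes))
```

On this reading, a value near .95 says only that the income sits well inside the top tenth of what was observed, not that anything was measured to two decimal places.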

The terrible danger of these kinds of numbers is that if we use them, someone will accuse us of trying to achieve an unjustified level of “accuracy”. In fact we are not after accuracy at all. We just don’t want to lose the kind of broad-brush information about small, large, largest, negligible, which our respondents are actually telling us. My suggestion is to hardly ever show the numbers and to use them only for our “internal” calculations. Occasionally it might be useful to use semi-standardised phrases like “very high” / “high” etc, and/or empirically-grounded phrases like “in the bottom 10%”.
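
One way to keep the numbers internal is to translate them back into semi-standardised phrases just before display. The sketch below is only an illustration: the cut-off points in PHRASE_BANDS and the function name lohi_phrase are assumptions made for the example, since the text names the kind of phrase but not the thresholds.

```python
# Hypothetical cut-offs: (upper bound of the band, phrase shown to the reader).
PHRASE_BANDS = [
    (0.1, "very low"),
    (0.3, "low"),
    (0.7, "medium"),
    (0.9, "high"),
    (1.0, "very high"),
]


def lohi_phrase(value):
    """Translate an internal lo/hi number into a broad-brush phrase."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("lo/hi values are expected to lie between 0 and 1")
    # The last band ends at 1.0, so a validated value always matches a band.
    for upper, phrase in PHRASE_BANDS:
        if value <= upper:
            return phrase


print(lohi_phrase(0.95))  # very high
print(lohi_phrase(0.5))   # medium
```

The bands are deliberately coarse, so that the output carries no more precision than the respondents' own language did.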

Another advantage of lo/hi numbers is that not only the current value of variables but also in some sense the strength of arrows can be captured with them (though we might like to encode the strength of arrows in a broader range, from -1, rather than 0, to 1).

We can directly interpret lo/hi information about the effect of a causal package on a consequence variable as an analog to effect size: 1 means total control, 0 means none. When we “zoom out” to ask about direct effects of a variable or package on variables further downstream of it, this effect size information is particularly interesting. It is the key to answering a core demand on our app and on reasoning with causal maps: to be able to say that, for example, variable B is a much more important influence on outcome O than variable C.

We can already start to guess how this kind of soft arithmetic will work. For example, if E is already at 90% and C has a 100% positive influence on E, and then we set C to 90%, we are at least sure E isn’t going to decrease and we’d expect it to increase a bit. Whereas if the influence of C on E is strongly negative, say -90%, then if we set C to be large rather than small, we’d probably expect E to drop.
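
To make the guess above concrete, here is one possible update rule, offered purely as a sketch and not as the arithmetic developed in later sections. The function nudge and its formula are assumptions: the cause pushes the effect part of the way towards 1 when the influence is positive and part of the way towards 0 when it is negative, in proportion to the strength of the influence multiplied by the level of the cause.

```python
def nudge(effect, cause, strength):
    """Hypothetical soft-arithmetic rule for a single arrow from cause to effect.

    effect, cause: lo/hi values between 0 and 1.
    strength: influence of the cause on the effect, between -1 and 1.
    Returns the new level of the effect, which always stays between 0 and 1.
    """
    if not (0.0 <= effect <= 1.0 and 0.0 <= cause <= 1.0):
        raise ValueError("effect and cause must lie between 0 and 1")
    if not -1.0 <= strength <= 1.0:
        raise ValueError("strength must lie between -1 and 1")
    push = strength * cause
    if push >= 0:
        # Positive influence: close part of the remaining gap up to 1.
        return effect + push * (1.0 - effect)
    # Negative influence: remove part of the existing level, down towards 0.
    return effect + push * effect


# E at 90%, C set to 90%, influence of +100%: E rises a little, to about 0.99.
print(round(nudge(0.9, 0.9, 1.0), 3))

# Same starting point, influence of -90%: E drops sharply, to about 0.17.
print(round(nudge(0.9, 0.9, -0.9), 3))
```

This rule reproduces the two intuitions in the paragraph above, but it has obvious limits. For instance, under a positive influence it can never lower E, however small C is, so it should be read as one candidate among many.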

The details of how this arithmetic might work are up for grabs. But I will try to sketch out some solutions in the next sections.

It is relatively simple to extend this idea to the range -1 to 1, i.e. including the idea of variables with negative levels.

We already encode arrows as having negative influences.