Section 51 Probability density functions

I just realised that it’s easy to solve the problem of paradoxical amounts of certainty about the strength of causal links.

The problem was this:

When combining information from different sources about the strength of a link, we’d usually take some kind of mean or average. Here are two cases in which people tell us, with an ordinary amount of certainty, about the strength of a causal link between B and E:

  • ten people say it is 0 and another ten say it is 1, so the average is .5
  • twenty people say it is .5, so the average is also .5

It would be pretty bad if we couldn’t distinguish between these two cases. In the first case, we’d want there to be a lot of uncertainty (almost no useful information) when modelling the influence of B on E; in the second, we can be fairly confident that the strength really is about .5.
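A minimal sketch of the two cases, using only Python’s standard library (the variable names are illustrative, not taken from the app). It shows that the mean cannot tell the cases apart, while a simple measure of spread can:

```python
from statistics import mean, pstdev

# Reported strengths of the link from B to E in the two cases.
polarised = [0.0] * 10 + [1.0] * 10   # ten sources say 0, ten say 1
agreed = [0.5] * 20                   # twenty sources say .5

for name, reports in [("polarised", polarised), ("agreed", agreed)]:
    # Same mean in both cases; only the spread differs.
    print(name, "mean:", mean(reports), "spread:", pstdev(reports))
```

Both means are .5, but the spread (population standard deviation) is .5 in the first case and 0 in the second.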

The standard way to deal with this is to store information about link strengths as probability density functions rather than single numbers, and it turns out this is really easy to implement.

So what difference does this make to the rules? When calculating the aggregated strength of a link, or how influences flow down the causal chain, the app doesn’t use individual numbers like .5 but probability density functions (PDFs): uncertainty clouds. The app can still report the averages (which are of course the same as before) but can also display information about the cloud of certainty / uncertainty if asked. More importantly, if there is a lot of uncertainty about a link, there is a lot of uncertainty about the way causal impact flows down it.
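To illustrate how uncertainty about a link carries down a chain, here is a rough sketch that represents each cloud by random samples. Everything specific in it is an assumption for illustration only: the extra upstream link A → B, its Beta(16, 4) cloud, and especially the rule that strengths along a chain combine by multiplication, which is not necessarily the app’s actual inference rule.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def upstream():
    # Hypothetical A -> B link: fairly well agreed, centred on .8
    return rng.betavariate(16, 4)

def polarised():
    # B -> E link where half the sources say 0 and half say 1
    return rng.choice([0.0, 1.0])

def agreed():
    # B -> E link where every source says .5
    return 0.5

def chain(first, second, n=10_000):
    """Draw n samples of the combined A -> E strength (assumed: a product)."""
    return [first() * second() for _ in range(n)]

via_polarised = chain(upstream, polarised)
via_agreed = chain(upstream, agreed)

for name, cloud in [("via polarised link", via_polarised),
                    ("via agreed link", via_agreed)]:
    print(f"{name}: mean {mean(cloud):.2f}, spread {pstdev(cloud):.2f}")
```

The two chains end up with roughly the same mean, but the one that passes through the polarised link has a much wider cloud, which is the behaviour described above.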

Goertz & Copestake (xx) suggest including “certainty” of a link in the coding form itself, … but the “paradox” above is more about the certainty we have in aggregated links than about coding uncertainty directly.

But sure, if the strength of aggregated links can have uncertainty, directly coded links can have it too. Two ways this could be implemented:

  • Just include a slider or text box for reported “certainty”, so the coder could just code say .5 for the strength, as well as a certainty of say .8, which would usually create a small cloud of other possibilities on each side of .5 (the coder wouldn’t need to see anything happening, but the information would be recorded)
  • Or, (I think this is better for most cases) there doesn’t need to be any change to the coding interface; everything just gets coded with a medium amount of uncertainty.
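Either option could be sketched along the following lines. This is only one possible mapping, and all of it is assumed rather than taken from the app: the choice of a Beta distribution, the function names, the formula turning certainty into a concentration, and the default “medium” certainty of .5.

```python
import random
from statistics import mean, pstdev

DEFAULT_CERTAINTY = 0.5  # assumed "medium" certainty when the coder gives none

def beta_params(strength, certainty=DEFAULT_CERTAINTY):
    """Turn a coded strength and certainty (both in 0..1) into Beta(alpha, beta).

    The Beta mean equals the coded strength; higher certainty gives a
    larger concentration and therefore a narrower cloud.
    """
    # Keep strength away from exactly 0 or 1, where Beta parameters would be 0.
    s = min(max(strength, 0.01), 0.99)
    concentration = 2 + 48 * certainty  # arbitrary illustrative scaling
    return s * concentration, (1 - s) * concentration

def sample_strength(strength, certainty=DEFAULT_CERTAINTY, n=5000, rng=random):
    """Draw n samples from the cloud around a coded strength."""
    alpha, beta = beta_params(strength, certainty)
    return [rng.betavariate(alpha, beta) for _ in range(n)]

rng = random.Random(1)
explicit = sample_strength(0.5, certainty=0.8, rng=rng)  # first option
default = sample_strength(0.5, rng=rng)                  # second option

print(f"certainty .8: mean {mean(explicit):.2f}, spread {pstdev(explicit):.3f}")
print(f"default:      mean {mean(default):.2f}, spread {pstdev(default):.3f}")
```

Both clouds are centred on the coded strength of .5; the one coded with a certainty of .8 is narrower than the one that falls back on the default.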

Actually, allowing PDFs for strength necessitates allowing PDFs for the values of variables too. This doesn’t necessarily imply any further difference to the coding form / inference rules / app, but it is available if required. I’ve already allowed for the possibility of coding the actual values of variables, so that when coding something like “The quality of the training course is important for the level of farmer skills, but the quality of the course was quite low”, it is possible to code the information about the quality as well as the causal link. Now with PDFs it would be possible to also code the certainty of that information if required (though I doubt this is necessary), and more interestingly it would be possible to report the uncertainty of that information, as well as the mean, when aggregated across sources.