Introduction

Yesterday I attended a fascinating workshop by Julian King and OPM on their approach to Value for Money (VfM). My first impression is: this is great! It sounds to me like a mainstream, easy-to-follow way to do theory-based evaluation. The central question for mainstream theory-based evaluation should be: “can we trace a thick causal thread from our intervention to positive impact on things we care about?” VfM simply adds a twist: “and how good does that impact look when we compare it with how much we invest (money, time etc.) in that intervention?” Apart from that twist, the rest of VfM evaluation is, I believe, the same as good theory-based evaluation. So I think most of the content of the King/OPM approach is not specific to VfM evaluation.

DFID’s approach to VfM

The King/OPM approach to VfM builds on DfID’s “four E’s”: Economy, Efficiency, Effectiveness and Equity. Equity was given a bigger role after the initial document (DfID 2011), though I’m not sure exactly when.

[Diagram: the 4E approach to VfM. Source: https://beamexchange.org/guidance/monitoring-overview/assessing-value-money/4e-approach-vfm/]

If we were God, we would only need cost-effectiveness (the big black arrow at the top of the diagram above).

God says:

Let me just look down at the project: aha, these are the outcomes, the difference made by the project to everything that ultimately mattered to the project. Because I am God, I can trace the actual causal links from beginning to end, and yes, I can see that this intervention makes quite a big difference to these things you value over here. Well done! I have evaluated your project, and I saw it was good.

If we’d like to relate that likely impact to the resources invested (time, money and so on), we can call this “cost-effectiveness”, or perhaps “value for money”, or even “value for investment”, a better term because it signals that we are including non-monetary things. To relate the resources to the outcomes, we can try converting them all into dollar values, if that’s what floats our boat, and then subtract the resources from the outcomes, or divide the outcomes by the resources, or whatever. But we can also just leave the resources and outcomes in two piles, without trying to combine them numerically, because we can still do “soft arithmetic” with the two piles: for example, comparing this project with another which had similar outcomes but used more resources, and concluding that this project is better.
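The arithmetic in the previous paragraph can be sketched in a few lines of Python. This is only an illustration under invented assumptions: the project names and figures are made up, and for simplicity both piles are already expressed in dollars. The last function shows the “soft arithmetic” option: it compares the two piles side by side and never converts resources into outcomes or vice versa.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical project summary; both piles expressed in the same (dollar) units."""
    name: str
    outcome_value: float   # monetised value of the outcomes (invented)
    resource_cost: float   # monetised value of the resources put in (invented)


def net_value(p: Project) -> float:
    """'Subtract the resources from the outcomes'."""
    return p.outcome_value - p.resource_cost


def value_ratio(p: Project) -> float:
    """'Divide the outcomes by the resources'."""
    return p.outcome_value / p.resource_cost


def clearly_better(a: Project, b: Project) -> bool:
    """Soft arithmetic: `a` beats `b` if it has at least the outcomes of `b`
    for no more resources, and is strictly better on at least one pile.
    No exchange rate between the two piles is needed."""
    no_worse = a.outcome_value >= b.outcome_value and a.resource_cost <= b.resource_cost
    strictly_better = a.outcome_value > b.outcome_value or a.resource_cost < b.resource_cost
    return no_worse and strictly_better


ours = Project("ours", outcome_value=500_000, resource_cost=200_000)
other = Project("other", outcome_value=500_000, resource_cost=350_000)

print(net_value(ours), value_ratio(ours))   # 300000 2.5
print(clearly_better(ours, other))          # True
```

The soft comparison only gives an answer when one project is at least as good on both piles; when one project has more outcomes but also uses more resources, it stays silent, and that is where evaluative judgement (or a conversion to a common unit) has to come in.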

King / OPM’s approach to VfM

(This is my own, abbreviated, understanding of the approach!)

As we are not God, we can’t be sure about the causal paths all the way through a complicated causal network like a real-life project, so we can’t directly see how the resources we put in lead to outcomes and relate the former to the latter (the big black arrow at the top of the diagram above). We might have some overall evidence for the strength of this link; we might even have an RCT or a quasi-experiment giving evidence about the whole path. But whatever we have, we don’t have certainty. So we can look for complementary, supporting evidence for this path by breaking the long causal path down into smaller pieces. We construct our Theory of Change and ask: are the main links in the theory big and fat? Is each one the sort of link which regularly and reliably converts inputs at one end into outputs at the other? If we can trace a trail of big, fat links from resources to outcomes, then switching on or adding to the resources should make a corresponding difference appear in our outcome variables, most of the time anyway.

If the links between resources and outputs seem strong, you can, if you like, call the project “efficient”. If the links between outputs and outcomes are strong, you can call it “effective”. Similarly, we can think of Economy as a link between resources and inputs or activities: if we don’t get enough for our resources, if the inputs are too expensive, the project is likely to be less cost-effective than another. “Economy”, “Efficiency” and “Effectiveness” are the links labelled in green in the diagram.

But you don’t need to be so specific, or have a Theory of Change which is so neatly organised into these old-fashioned layers; even if you have a hairy-scary all-over-the-place Theory, you just need to check the strength of each link on the way from resources to outcomes. If you tweak one end of the link, does it reliably produce a big effect on the other (and is the effect it produces the right kind to cause the right kinds of change further down the diagram)? (I’m not sure the King/OPM approach benefits from sticking to this rigid and contentious language of Outputs, Outcomes etc. – but I guess DfID does, and the “efficiency” “effectiveness” terminology is part and parcel of the same divisions, so perhaps it is too much work to change it.)

It is also useful, indeed even more useful, to have evidence not just about individual links but about longer chains: evidence on any part of the path from resources to outcomes is welcome.

Not being God, we can guess that if the project is economical, efficient and effective, or more generally, if we have evidence that much of the causal path from resources to outcomes is thick and reliable, then the project will be good value for money. The links are strong all the way down: this project seems to be an engine which is great at converting resources into value. In the absence of certainty, we can give each relevant link a score (say, from “poor” to “excellent”, using agreed rubrics (King et al. 2013)). We can also score any direct evidence we might have for overall cost-effectiveness or value for money (from an RCT, from research on similar projects, or whatever). This is a key feature of the King / OPM approach: we don’t just store the evidence about each link for later analysis; we rate it, evaluate it, as we store it, for example on a scale from “poor” to “excellent” using rubrics.

Then we can say that if the relevant links all get a good score, the overall pathway is thick and strong, and so the project is probably good value for money. But it’s worth reflecting that we only need the evidence from the individual links because we aren’t God and can’t see the value for money (or “cost-effectiveness”) directly.
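The idea of rating each link as its evidence is stored, and then reading off the strength of the whole pathway, can be sketched as follows. The four-level scale, the link names and the ratings are all hypothetical, and the synthesis rule used here (the pathway is only as strong as its weakest link) is an assumption chosen for simplicity and transparency; it is not necessarily how King / OPM synthesise, since they may equally rely on evaluative judgement at this stage.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Hypothetical four-level rubric scale, ordered from worst to best."""
    POOR = 1
    ADEQUATE = 2
    GOOD = 3
    EXCELLENT = 4


# Hypothetical links in a simple Theory of Change, each rated as its evidence is stored.
link_ratings = {
    ("resources", "inputs"): Rating.GOOD,       # Economy
    ("inputs", "outputs"): Rating.EXCELLENT,    # Efficiency
    ("outputs", "outcomes"): Rating.GOOD,       # Effectiveness
}


def pathway_rating(ratings: dict[tuple[str, str], Rating]) -> Rating:
    """Weakest-link rule (an assumption): the pathway is only as thick as its thinnest link."""
    return min(ratings.values())


def pathway_is_strong(ratings: dict[tuple[str, str], Rating],
                      threshold: Rating = Rating.GOOD) -> bool:
    """True if every relevant link reaches the threshold, i.e. they 'all get a good score'."""
    return pathway_rating(ratings) >= threshold


print(pathway_rating(link_ratings).name, pathway_is_strong(link_ratings))  # GOOD True

# One thin link is enough to weaken the whole pathway under this rule.
weaker = {**link_ratings, ("outputs", "outcomes"): Rating.POOR}
print(pathway_rating(weaker).name, pathway_is_strong(weaker))  # POOR False
```

Because the ratings are ordinal, the sketch only compares and takes minima; it deliberately avoids averaging the labels as if they were numbers on an interval scale.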

When looking for evidence for the strength of the links, what do we look for? These individual links are the causal parts of the causal network which we have tried to depict in our Theory of Change. So we don’t have a magic elevator here to skip up Pearl’s ladder (Pearl and Mackenzie 2018) from non-causal evidence to causal conclusions. You could argue that we never see causal links directly, only correlations; but, as Scriven, Pearl and others have explained, this is rubbish. The problem is not that we can’t see causal links, but that we have a philosophical framework, an understanding of causation, which tells us we shouldn’t be able to. So, ditch the framework. We observe causation, and more or less reliable markers of causation, in the same way as (and as part of the way in which) we observe colour, and guilt, and beauty, and atmospheres in rooms, and good or poor project managers. Our perception is drenched in causation just as it is drenched in colour. There is so much causation in our perception that the problem is not how to see it but how to free ourselves from all the accompanying biases and illusions (Kahneman 2011). We can use evaluation methods as we use binoculars: to enhance our causal vision, not to make the blind see.

I invoked an all-knowing God earlier not because God can see causal relationships and we can’t. We are never going to know the causal truth about real-life projects because they are so complicated (and complex too). If we break the problem down into smaller units, maybe we stand a better chance: we can be a bit more sure about them.

So, what does count as evidence? All the usual stuff that evaluators look for. For example, impartial humans saying “yes, I’ve seen this part actually working” or “I’ve seen this kind of thing working in similar situations”. Comparing questionnaire scores on some component before and after. Comparing results with comparable contexts. Watching project staff actually listening to their partners and acting on their suggestions. And so on.

It’s also worth noting that a good project will include mechanisms for sensing and responding to feedback coming back from “the field”, and for changing the plan if necessary. If we see these kinds of mechanisms, we are more likely to judge that this is the kind of project which will succeed, because it is less likely to be derailed by unforeseen circumstances and less likely to miss new opportunities as they appear. (And a good Theory of Change will sketch out these mechanisms.)

I think that is the Julian King / OPM approach to VfM (King 2016, 2018) in a nutshell: make a decent Theory of Change, verify and rate the individual links, synthesise, and then relate the findings to the resources used (this last step is what makes it VfM). It’s a good approach.

Three points of praise / criticism

1) The methods described are not specific to VfM

Nothing about checking the causal links from Inputs / Activities onwards is specific to Value for Money. It’s just (theory-based) evaluation. You don’t need special or new evidence or data. You turn your ordinary evaluation findings (about how well the project produced value) into value-for-money findings by relating them to the resources used, to conclude how well the project produced value relative to its resources. That’s it. Using rubrics to capture and store the evidence is an interesting idea, but it is not specific to VfM either: most of the rubrics about most of the links are not specifically about value for money (or investment).

2) The methods described have some advantages

The approach described by King / OPM is actually better than Mayne’s Contribution Analysis (Mayne 2012, 2015) and related approaches in two ways. The first advantage is that it isn’t so concerned with what actually happened. Mayne starts from an actual outcome and asks whether, or how much, our intervention potentially contributed to it. But if we just look at the strength of links in a Theory of Change, we don’t even need to know whether an outcome in this sense actually occurred. We can look at the strength of (some of) the links prospectively too, in just the same way, before the project has even begun. To be sure, focusing on what actually happens / happened perhaps gives us the best evidence we can ever have concerning the individual links (and the whole engine), but it is not incontrovertible and it is not the only evidence which evaluators use.

This focus on actual outcomes is also, famously, vulnerable to the “good luck” causal illusion (when a bad project is judged “successful” because a piece of good luck flips the outcome) and the “bad luck” causal illusion (when a good project fails to be judged “successful” because a piece of bad luck flips the outcome the wrong way, for example at the last moment). It is also pretty useless for evaluating, say, a project which aims to help people prepare for a rare and unlikely event like an earthquake. VfM as described by King / OPM is (arguably) less distracted by “what actually happens”. It tells you more about how a project very like this one is likely to function in a multitude of possible worlds, regardless of good or bad luck, so it is probably more useful for deciding whether to replicate or curtail a project model.
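The good-luck and bad-luck illusions can be illustrated with a small simulation. This is a sketch under deliberately crude, invented assumptions: each project design is reduced to a single hypothetical probability of producing the outcome in any one “possible world” (0.8 for a design with thick causal links, 0.3 for one with thin links), and everything else is treated as independent luck. Judging from a single realised outcome, a weak design looks “successful” in roughly 3 worlds out of 10, and a strong design looks like a failure in roughly 2 out of 10; looking across many worlds, which is closer to what rating the links aims at, the two designs separate clearly.

```python
import random


def simulate_worlds(p_success: float, n_worlds: int, seed: int) -> list[bool]:
    """Outcome (True = 'success') of one project design in many possible worlds.

    p_success is a hypothetical probability that the design produces the
    outcome in any single world; everything else is treated as luck.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [rng.random() < p_success for _ in range(n_worlds)]


N = 10_000
strong = simulate_worlds(0.8, N, seed=1)  # thick causal links (invented strength)
weak = simulate_worlds(0.3, N, seed=2)    # thin causal links (invented strength)

# Good-luck illusion: share of worlds in which the weak design still "succeeds".
good_luck_rate = sum(weak) / N            # close to 0.3
# Bad-luck illusion: share of worlds in which the strong design still "fails".
bad_luck_rate = 1 - sum(strong) / N       # close to 0.2

print(f"good-luck illusion: {good_luck_rate:.2f}, bad-luck illusion: {bad_luck_rate:.2f}")
```

Real projects are not coin flips, of course; the point is only that a verdict based on one realised outcome mixes the strength of the causal engine with luck, while the long-run rates recover the strength of the engine.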

Watching how a project officer listens to, or doesn’t listen to, stakeholders can provide part of the evidence for the strength of one particular link regardless of what happens down the causal stream (though we’ll surely watch that too, if we can).

The second advantage is that simply rating the strength of the links (e.g. from “poor” to “excellent”) provides a very simple but practical way to store and potentially combine this evidence. Sure, we can find fancier ways to do this, we can even go Bayesian (Befani and Stedman-Bryce 2017) if we want, but sometimes simple (and transparent) is good. King and OPM are meticulous in being evaluative, that is, explicit about merit or worth, at every link within the causal network. That isn’t always necessary (you can still reason evaluatively at the final stage, when summarising evidence that has not been expressed in evaluative terms), but it has certain advantages.

3) Equity is not an additional criterion. If we value it, and we should, it should be inside the Theory of Change from the start.

I’m bothered by equity as a fourth “E”. I’ve argued that if we were God, we’d only need Cost-Effectiveness (expanded to non-monetary items), but that when we don’t have perfect evidence for this, we can combine any evidence we do have for it with evidence for Economy, Efficiency and Effectiveness. Equity is of course incredibly important and it is great that DfID focus on it. But if it matters, it’s a valued outcome and therefore must already be inside the Theory of Change.

In the diagram below (I’m presenting some similar ideas at EES next week), resources are used for a simple intervention which vaccinates many people. One way to aggregate the number of people vaccinated is to calculate the total, and this is something we care about (more is better, so we mark it with a heart symbol). Another way to aggregate it (also using data on each individual’s gender) is to check whether, and how much, women and girls are benefiting; we can use this data to make a simple score for equity (more is better). If we want, we can combine these scores into one global outcome rating, perhaps using an algorithm with some kind of weighting, or we can rely on evaluative judgement to combine them. The important point is that both total vaccinations and equitable vaccinations are perfectly good outcome variables (each defined in this case as a different kind of aggregation of individual-level data).
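Here is a minimal sketch of the two aggregations of the same individual-level data, plus one optional weighted combination. Everything specific is an assumption for illustration: the data are invented, the equity score is defined here as vaccination coverage among women and girls divided by coverage among men and boys (capped at 1 for parity), and the target and weights in `global_outcome` are hypothetical value judgements of exactly the kind that belong in the Theory of Change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical individual-level record from the vaccination project."""
    female: bool
    vaccinated: bool


def total_vaccinations(people: list[Person]) -> int:
    """First aggregation: how many people were vaccinated (more is better)."""
    return sum(p.vaccinated for p in people)


def equity_score(people: list[Person]) -> float:
    """Second aggregation, using one assumed definition: coverage among women
    and girls divided by coverage among men and boys, capped at 1.0 so that
    parity (or better) scores 1.0. More is better."""
    women = [p for p in people if p.female]
    men = [p for p in people if not p.female]
    cov_women = sum(p.vaccinated for p in women) / len(women)
    cov_men = sum(p.vaccinated for p in men) / len(men)
    if cov_men == 0:
        return 1.0  # no coverage gap to the disadvantage of women and girls
    return min(cov_women / cov_men, 1.0)


def global_outcome(total: int, target: int, equity: float, weight_total: float = 0.6) -> float:
    """Hypothetical weighted combination on a 0-1 scale. The target and the
    weights are value judgements, not technical constants."""
    total_score = min(total / target, 1.0)
    return weight_total * total_score + (1 - weight_total) * equity


# Invented data: 40 of 100 women and girls and 60 of 100 men and boys vaccinated.
people = (
    [Person(True, True)] * 40 + [Person(True, False)] * 60
    + [Person(False, True)] * 60 + [Person(False, False)] * 40
)
total = total_vaccinations(people)
equity = equity_score(people)
print(total, round(equity, 2), round(global_outcome(total, target=125, equity=equity), 3))
# 100 0.67 0.747
```

Both `total_vaccinations` and `equity_score` are ordinary outcome variables computed from the same records; only the aggregation differs. Dividing either by the resources used would give a cost-effectiveness figure for that particular outcome.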

If we value equity, and/or carbon impact, or anything else, then these valued outcomes must always be part of the Theory of Change too. When we as evaluators, planners or managers draw up a Theory of Change, we model the causal links in our project, but we must also model what is valued within it, and how parts of it are combined to make composite expressions of value. Project staff have hopefully been sharing and using an accurate Theory of Change during the project lifetime, and have been using it to steer the project towards all the outcomes that matter: for example, not just total vaccinations but also equity in vaccinations. When equity criteria are included in the Theory of Change, you can ask and answer questions about cost-effectiveness in achieving equitable vaccination (as well as total vaccination), a question you can’t even formulate if you conceive of equity in DfID’s way. You can’t just pop in at the end of the project and decide to judge it according to additional criteria, whether equity, carbon footprint or anything else. So equity doesn’t belong as a special feature of a VfM framework, however important it is. It should already be in the Theory of Change.

Exactly the same applies to sustainability. If we value sustainability (for example, that a project keeps producing value over a period of time), then we need to include that accumulating variable, which collects and sums the value produced over time, in our Theory of Change too, just as we did with equity above. We have to, because project staff need to steer their interventions in a way which will maximise it. That’s what the Theory of Change is for. Then we can also ask and answer important questions like “how cost-effective was our project in achieving sustainable improvement in vaccination rate?”, which you can’t even properly formulate if you insist on keeping sustainability out of the Theory of Change.
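A minimal sketch of sustainability as an accumulating variable, under invented assumptions: two hypothetical projects with the same cost, funded for three years, with made-up yearly vaccination counts. Project B is assumed to have built capacity that keeps producing vaccinations after funding ends. Judged at the end of funding, A looks better (300 against 240); judged on the accumulated value after five years, B is more cost-effective. That second question can only be asked if the accumulating variable is in the Theory of Change.

```python
from itertools import accumulate


def cumulative_value(value_per_year: list[int]) -> list[int]:
    """Accumulating variable: running total of value produced up to each year."""
    return list(accumulate(value_per_year))


def vaccinations_per_1000(total: int, cost: int) -> float:
    """Cost-effectiveness of the accumulated value, per 1,000 currency units spent."""
    return total * 1000 / cost


# Invented yearly vaccinations for two projects with the same (hypothetical) cost.
# Funding ends after year 3; project B keeps producing value afterwards.
project_a = [100, 100, 100, 0, 0]
project_b = [80, 80, 80, 60, 50]
cost = 50_000

for name, series in [("A", project_a), ("B", project_b)]:
    stock = cumulative_value(series)
    # stock[2] is the value at the end of funding; stock[-1] is the value after five years.
    print(name, stock[2], stock[-1], vaccinations_per_1000(stock[-1], cost))
# A 300 300 6.0
# B 240 350 7.0
```

The five-year horizon is itself a value judgement; a different horizon could change the ranking, which is one more reason to make it explicit in the Theory of Change rather than add it at the end.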

References

Befani, Barbara, and Gavin Stedman-Bryce. 2017. “Process Tracing and Bayesian Updating for impact evaluation.” Evaluation 23 (1): 42–60. https://doi.org/10.1177/1356389016654584.

DfID. 2011. “DFID’s Approach to Value for Money (VfM).” July 2011, 1–15.

Kahneman, Daniel. 2011. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.

King, Julian. 2016. “Using Economic Methods Evaluatively.” American Journal of Evaluation, 1–13. https://doi.org/10.1177/1098214016641211.

———. 2018. “OPM’s Approach to Assessing Value for Money.” Oxford Policy Management, January.

King, Julian, Kate McKegg, Judy Oakden, and Nan Wehipeihana. 2013. “Rubrics: A method for surfacing values and improving the credibility of evaluation.” Journal of Multidisciplinary Evaluation 9 (21): 11–20. http://journals.sfu.ca/jmde/index.php/jmde_1/article/viewFile/374/373.

Mayne, John. 2012. “Contribution analysis: Coming of age?” Evaluation 18 (3): 270–80. https://doi.org/10.1177/1356389012451663.

———. 2015. “Useful Theory of Change Models.” Canadian Journal of Program Evaluation 30 (2).

Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Hachette UK.

Value for Money evaluation: first impressions

Steve Powell

29/9/2018