January 12, 2017

Beaufort and Rubrics

A quick post about the Beaufort scale for wind speed (see the table below) as a paradigm of a rubric.

Rubrics are really important in evaluation.

The Beaufort Scale is just a beautiful example of clearly defined, easily observable criteria. I love the way different features, like smoke at the lower wind speeds, become important at different points on the scale. At force 6, umbrella use becomes difficult - umbrellas are not mentioned anywhere else, but this gives you a great idea of what force 6 feels like and so helps to anchor the scale.

In particular, I love the way the individual statements are mostly relatively objective, or rather, inter-subjective, i.e. they are likely to be understood in a similar way by different people from different backgrounds. In the social realm it is often hard to be so objective, and so rubrics in project evaluation often include formulations like “effective”, “reasonably good overall” or “just about adequate”, which are again not inter-subjective and so in a way beg the question.

This must have been a big improvement for the Navy before the advent of mechanical windspeed measurement.

Another fascinating thing is that it manages to break a quantity like wind speed into no fewer than 13 distinct levels (forces 0 to 12). In evaluation, we mostly see rubrics limited to 4-7 levels.

Wikipedia says the Beaufort scale was extended in 1946, when forces 13 to 17 were added, but they are only used in typhoon countries.

I always thought it would be hard to really distinguish forces 10, 11 and 12, though. And on the other hand, numbers are still needed beyond 12, especially in the tropics.

I wonder if that is because the scale is linearly anchored to windspeed. I would have thought that the perception of windspeed, like the perception of most other things such as light and sound, would be logarithmic, so that there should be finer divisions at the lower end and bigger jumps at the higher extremes. If the number/windspeed relationship were logarithmic, 10, 11 and 12 would cover ever-increasing spreads of windspeed, and the scale would neatly extend to cover just about any hurricane.
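Here is a rough sketch in R of the difference: twelve equal-width bands versus twelve equal-ratio bands. The 2 km/h floor and 120 km/h ceiling are just invented round numbers for illustration, not part of the official scale.

```r
# Dividing a windspeed range into twelve bands, two ways.
# Endpoints (2 and 120 km/h) are made-up round numbers.
top <- 120
linear_cuts <- seq(0, top, length.out = 13)  # 13 cut points -> 12 equal-width bands
log_cuts <- 2 * (top / 2)^((0:12) / 12)      # equal-ratio cut points from 2 up to 120

round(diff(linear_cuts), 1)  # every band is 10 km/h wide
round(diff(log_cuts), 1)     # bands widen from about 0.8 km/h at the bottom
                             # to about 35 km/h at the top
```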

This is an example of where assuming a linear relationship with a physical quantity has unfortunate consequences. The developers of the scale would probably have been better advised to ignore physical windspeed and concentrate on what can be inter-subjectively distinguished. It is not even obvious that the spans of the individual rubric levels - in terms of physical windspeed, or its logarithm, or indeed anything else - need to be equal at all, even subjectively. That all depends on what the rubric-based rating is going to be used for.

| Beaufort number | Description | Sea conditions | Land conditions |
|---|---|---|---|
| 0 | Calm | Sea like a mirror | Calm. Smoke rises vertically. |
| 1 | Light air | Ripples with the appearance of scales are formed, but without foam crests | Smoke drift indicates wind direction. Leaves and wind vanes are stationary. |
| 2 | Light breeze | Small wavelets, still short but more pronounced; crests have a glassy appearance and do not break | Wind felt on exposed skin. Leaves rustle. Wind vanes begin to move. |
| 3 | Gentle breeze | Large wavelets. Crests begin to break; scattered whitecaps | Leaves and small twigs constantly moving; light flags extended. |
| 4 | Moderate breeze | Small waves with breaking crests. Fairly frequent whitecaps. | Dust and loose paper raised. Small branches begin to move. |
| 5 | Fresh breeze | Moderate waves of some length. Many whitecaps. Small amounts of spray. | Branches of a moderate size move. Small trees in leaf begin to sway. |
| 6 | Strong breeze | Long waves begin to form. White foam crests are very frequent. Some airborne spray is present. | Large branches in motion. Whistling heard in overhead wires. Umbrella use becomes difficult. Empty plastic bins tip over. |
| 7 | High wind, moderate gale, near gale | Sea heaps up. Some foam from breaking waves is blown into streaks along wind direction. Moderate amounts of airborne spray. | Whole trees in motion. Effort needed to walk against the wind. |
| 8 | Gale, fresh gale | Moderately high waves with breaking crests forming spindrift. Well-marked streaks of foam are blown along wind direction. Considerable airborne spray. | Some twigs broken from trees. Cars veer on road. Progress on foot is seriously impeded. |
| 9 | Strong/severe gale | High waves whose crests sometimes roll over. Dense foam is blown along wind direction. Large amounts of airborne spray may begin to reduce visibility. | Some branches break off trees, and some small trees blow over. Construction/temporary signs and barricades blow over. |
| 10 | Storm, whole gale | Very high waves with overhanging crests. Large patches of foam from wave crests give the sea a white appearance. Considerable tumbling of waves with heavy impact. Large amounts of airborne spray reduce visibility. | Trees are broken off or uprooted; structural damage likely. |
| 11 | Violent storm | Exceptionally high waves. Very large patches of foam, driven before the wind, cover much of the sea surface. Very large amounts of airborne spray severely reduce visibility. | Widespread vegetation and structural damage likely. |
| 12 | Hurricane force | Huge waves. Sea is completely white with foam and spray. Air is filled with driving spray, greatly reducing visibility. | Severe widespread damage to vegetation and structures. Debris and unsecured objects are hurled about. |

Source: Wikipedia

evaluation research outcome-mapping social research
June 1, 2016

Judea Pearl

I only just came across the work of Judea Pearl (which shows how ignorant I am, because he won the Turing Award in 2011). I think his work is sensational and essential reading for all scientists, but in particular for social scientists and evaluators.

Basically, he says that science has suffered because statistics has failed to deal formally with causation, leaving it as a kind of mythical thing we only talk about in whispers. Pearl provides a robust and practical notation for causation, the do() operator, and develops a set of theorems around it. In particular, he shows under what conditions correlational data can indeed be used to draw causal conclusions.

Wait, he does what?

One of the most frustrating paradoxes in the whole of philosophy, and perhaps problem number one in the philosophy of science, is Hume’s depressing observation that we can’t actually observe causation. So how are we to do science? How are we to actually learn anything?

Pearl drops a bombshell. He says that we have just been lazy in assuming this impossibility. He shows how observational, correlational data can under certain circumstances provide evidence for causal statements. This is a very big deal.
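To get a feel for the best-known case - using the back-door criterion to adjust for a common cause - here is a toy simulation in R (my own invented example, not Pearl's):

```r
# Toy illustration of back-door adjustment. Z is a common cause
# (confounder) of X and Y; the true causal effect of X on Y is 1.
set.seed(1)
n <- 1e5
z <- rnorm(n)              # the confounder
x <- z + rnorm(n)          # X is influenced by Z
y <- x + 2 * z + rnorm(n)  # Y is influenced by X (effect = 1) and by Z

coef(lm(y ~ x))["x"]       # naive estimate: about 2, badly biased
coef(lm(y ~ x + z))["x"]   # adjusting for Z recovers about 1
```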

AI systems can’t sit about getting frustrated by Hume’s paradox any more than a living, learning human can. Pearl and colleagues needed to know how to set up an AI system so that it can sift through observational data and indeed make causal hypotheses on the basis of some of it. What characteristics does observational data need to have in order to support causal hypotheses?
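Here is an invented illustration in R of one kind of clue such a system can exploit: in a causal chain X -> M -> Y, X and Y are correlated, but the association vanishes once you condition on the mediator M, and patterns of conditional independence like this are exactly what causal discovery algorithms look for.

```r
# Causal chain X -> M -> Y: correlation without direct causation.
set.seed(2)
n <- 1e5
x <- rnorm(n)
m <- x + rnorm(n)         # M is caused by X
y <- m + rnorm(n)         # Y is caused by M, not directly by X

cor(x, y)                 # about 0.58: clearly correlated
coef(lm(y ~ x + m))["x"]  # about 0: X tells us nothing once M is known
```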

One massively important consequence of Pearl’s approach is that the randomised controlled trial loses its alleged status as the unique and golden road to causal evidence.

But Pearl had plenty of other things to say which should make social scientists, and evaluators in particular, sit up and listen. Against a background of a lot of rather airy discussions of chaos and complexity in the evaluation community, he points out that our knowledge of the way the world works is built up of surprisingly simple yet surprisingly stable chunks.

He isn’t making this up: he is one of the parents of modern AI. Intelligent systems right now are all about how to learn to work out new rules in new situations. Pearl’s algorithms are helping AI systems to do just that. We humans do it all the time. Both humans and AI systems understand the world at least partly in terms of relatively simple rules of thumb - mini-theories.

These mini-theories can, by the way, be seen as grist to the mill of realistic evaluation theory [@pawson_realistic_1997]. Perhaps Pearl also has some ideas for the problem which is always facing evaluators (and social scientists in general): how to synthesise the kind of mini-theories from which theories of change are built; and more generally, how to synthesise qualitative information.

Plus, he does it all with structural equation models, which are fun to look at and easy to work with (and which can be seen as the basis for the logframes and logic models which evaluators have to use every day).
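For a taste of what this looks like in practice, here is a hypothetical three-variable theory of change written with the dagitty R package (the variable names are invented for illustration):

```r
# A hypothetical mini theory of change as a causal diagram.
library(dagitty)
g <- dagitty("dag {
  Motivation -> Training
  Motivation -> Jobs
  Training -> Jobs
}")

# Which variables would we have to measure and adjust for to estimate
# the effect of Training on Jobs from observational data?
adjustmentSets(g, exposure = "Training", outcome = "Jobs")
# -> { Motivation }
```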

Look at the second part of this annotated bibliography to see the sort of things he has been dealing with, e.g. Pearl, J. and E. Bareinboim, “Transportability of causal and statistical relations: A formal approach”, Proceedings, AAAI-11, 2011, which reduces the classical problem of external validity to mathematical transformations in the do-calculus, and establishes conditions under which experimental results can be generalised to new environments in which only passive observation can be conducted.

Anyway, Pearl’s book might seem really hard (I am working through it, but very slowly), yet I just discovered there is an Epilogue right at the back which provides a great summary. You can read it in an hour or two, and it will definitely change your life.

evaluation research
May 18, 2016

Everything should be evidence-based - if only the evidence would make up its mind already

This question came up on an Evaluation mailing list and was forwarded to none other than Andrew Gelman. For our purposes it can be boiled down to: what do we really know about financial motivation in organisations? Do we know enough to be able to say something like “in this case, use this reward system; it will bring you optimal results”?

Reading through the illustrious answers it seems clear that there are dozens of different theories and variations of theories, each with some research evidence, but nowhere anything like an evidence-backed consensus.

Policy, project and programme design, …. everything should be evidence-based - if only the evidence would make up its mind already.

Motivation and reward are comparatively easy to conceptualise and research. If we can’t get consensus here, how are we going to get consensus on hundreds of harder real-world problems like “how should cash programmes after a natural disaster target women? Single women? How much? Family size? Conditionality?”

How can we construct Theories of Change if there is such a lack of consensus about what leads to what?

twitter evaluation Theories of Change social research



This blog by Steve Powell is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, syndicated on r-bloggers and powered by Blot.