Crowd sourced evaluation: did it work?

This is a quick report of what happened when I tried to “crowd-source” a recent global evaluation I did for a UN agency.

What is crowd sourced evaluation?

An ordinary evaluation report in its draft version gets sent around a bunch of stakeholders who make comments, perhaps using a tracked changes feature, and the evaluator has to respond to those comments, sometimes in a procedure which gets repeated two more rounds. Many of the inputs and comments might be defensive or off-topic, but there are always at least a few which bring you vital new information and interpretation which weren’t picked up during the evaluation process.

I always thought:

How come I have to do all the work of writing an almost complete report & then wait for this highly valuable expert input which comes almost too late? Couldn’t I put these experts to better use earlier on? If I crowdsourced this, perhaps it would not only speed up the review process but perhaps they could help write the thing for me in the first place, saving me a lot of phone-calls and leg-work …

So my idea for a crowd sourced evaluation is, essentially, to create an online version of the report, initially consisting of just the evaluation questions as sections and subsections, together with some initial evidence and ideas for answering them; then I send invitations to contribute relevant evidence, opinion and interpretation to help answer any of the questions. The invitation goes out by email to a much broader range of stakeholders than is usually possible, perhaps in waves, for example beginning and ending with invitations to the most important stakeholders.

I should add that for me, most of the headings of the evaluation report are simply the evaluation questions, and the text under them is the answers to those questions. The subheadings of the report are the sub-questions. Answering the main questions involves synthesising the answers to the sub-questions. So this format is reproduced in the online “crowd-sourced evaluation”. Beneath the headings, contributors can add and debate the relevant evidence and interpretation. Contributors can also add their own questions if they want and also “like” each other’s contributions. The process is a kind of web-based focus group, so contributors can also see other people’s opinions on the programme and can contribute to a broader discussion than is possible in a 1-1 interview.

So my role as evaluator is to formulate the questions, invite and moderate the discussion and help present and debate new evidence as it comes in, and finally to synthesise and make judgements on the evidence and interpretations.

There is a fuller description of the motivation here and the actual steps are spelt out here.

So what about the technical implementation?

I looked at lots of online platforms, and road-tested some of them. I was hoping to be able to use something like WordPress, which is very fully featured and would enable me to provide a lot of other features around the evaluation process, for example a calendar. I also wanted something which was free or almost free. However, I settled in the end on Discourse, a discussion platform which is new but elegant and already well respected. Discourse is quite difficult to install yourself, but I used a service called discourse hosting which worked very well. Nevertheless, it took a lot of work to tweak everything to make the commenting process as easy as possible. The process of inviting the right people at the right time to make the right comments on the right questions also needs managing carefully.

Update: The sign-in process to a private Discourse forum when someone is invited using a specific email address was a bit tricky. So I have been in touch with Discourse about smoothing the process … they recognised the need and were very helpful, and now it is fixed! Open source is great.

More good news: there is now a plug-in for Discourse which allows administrators to get a live view of custom stats, such as the number of people who have liked other people’s contributions or the total number of likes received by individual users. This is really useful if you want to offer a prize, e.g. for the most useful contributions.
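To make those two stats concrete, here is a minimal sketch of how they could be tallied. It does not use the Discourse plug-in or any Discourse API; the like records and user names are hypothetical sample data, and I am assuming each like can be exported as a (giver, receiver) pair.

```python
from collections import Counter

# Hypothetical like records: (user who gave the like, user who received it).
# These names and values are invented for illustration only.
likes = [
    ("ana", "ben"),
    ("ana", "cy"),
    ("ben", "cy"),
    ("dee", "cy"),
    ("cy", "ben"),
]

# Total number of likes received by each user.
likes_received = Counter(receiver for _, receiver in likes)

# Number of distinct users who have liked someone else's contribution.
num_likers = len({giver for giver, _ in likes})

# The user with the most likes received, e.g. as a candidate for a prize.
top_user, top_count = likes_received.most_common(1)[0]

print(f"{num_likers} users gave likes; most liked: {top_user} ({top_count} likes)")
```

Counting likes received rather than posts written rewards contributions that other participants found useful, which is the idea behind the prize.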

How did it work?

For this evaluation, I still spent plenty of time doing old-fashioned interviews etc. The crowd-sourcing process was just one part of it all. Still, I was able to somewhat reduce the number of face-to-face or remote interviews for such a diverse programme; instead I invited many to this “crowd-sourced” evaluation process. I rephrased about 20 key evaluation questions in a slightly provocative format, where appropriate adding some first answers already emerging from the evaluation, before inviting contributions.

The original idea was to involve primarily the client’s staff and senior partners in debate, but very few of this group responded actively in spite of several requests and in spite of the evaluation office making participation almost mandatory. (And in spite of me offering the prize of an iPad for the “most liked” contributions.) These people just didn’t want to go on record even though the whole discussion was private. However, invites were also sent out to other groups of younger stakeholders, so that in total over 250 stakeholders were invited. 54 people accepted the invitation and registered, of whom 43 read more than one topic, with an average (excluding me) of around 20 topics read per user. Altogether there were over 30 topics contributed by 5 people including myself, and 99 replies from 21 people excluding the evaluator. Topics were read 1064 times. 24 topics had two or more replies.
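The figures above can be turned into a rough participation funnel. This is only a sketch: “over 250” is treated as exactly 250, so any rate relative to the number of invitations is an upper bound, and the variable and function names are my own.

```python
# Figures reported in the post. "Over 250" is treated as exactly 250,
# so rates relative to invitations are upper bounds.
INVITED = 250
REGISTERED = 54
READ_MORE_THAN_ONE_TOPIC = 43
REPLIED = 21  # excludes the evaluator


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


funnel = {
    "registered / invited": share(REGISTERED, INVITED),
    "read >1 topic / registered": share(READ_MORE_THAN_ONE_TOPIC, REGISTERED),
    "replied / registered": share(REPLIED, REGISTERED),
    "replied / invited": share(REPLIED, INVITED),
}

for label, pct in funnel.items():
    print(f"{label}: {pct}%")
```

So at most about a fifth of those invited registered, but most of those who did register went on to read more than one topic.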

There were one or two participants who were very critical and one or two who were very positive about the programme; most of the others were in between with perhaps a critical but supportive tone. The more moderate posts received the most likes.

As the client’s staff were mostly not very involved in the debate, it was unfortunately not possible to use the material to assess any kind of consensus on the issues. However, some important new issues and evidence were brought forward which I could easily integrate into the findings section of the final report.

On the one hand there was a real and controversial discussion which did indeed bring in some more evidence. On the other hand this discussion did not involve a majority of the people I had invited. As the whole discussion had to be made private, users could only log in with their work email addresses, and this also caused some problems. Some people took to the format quite naturally, others were scared off by it.

The Discourse software worked very well and I will certainly be using it again.


I would certainly use this procedure again now that I have ironed out many of the kinks, especially with a large project with plenty of stakeholders scattered around the planet. But I would be most likely to use it where key respondents are likely to be open to taking part in a discussion and not too fearful of “going on record”.

If anyone feels like trying it out for themselves, I am happy to give tips and support.

Update: one problem is that, the way I did it, you have to have access to a whole Discourse installation for just one discussion forum. Most evaluators aren’t going to want to be bothered with this. I am thinking of how to enable different crowd-sourced evaluations from the same Discourse instance - then I could offer this service to others more simply. Let me know if you might be interested in trying this out.
