How do you explain reproducible research to clients?

Most of the statistics work I do now is done as reproducible research. This
can be a big advantage for clients, but of course that doesn’t
necessarily mean they realise it.

Below is a text I have been working on to explain
reproducible research to clients. I would be very interested if anyone
has any better ideas.

Short version

Reproducible research: wherever appropriate, all tables, graphics and statistical analyses in my reports are produced from a “source file” which contains the usual headings, text etc., as well as machine-readable code for producing each table and graphic. Then each time I save a new version of the report, a statistics program called “R” loads the raw data, cleans it, does calculations, replaces the code with the corresponding tables and graphics, and produces a Word, PDF or web document. That means consistency in presentation, up-to-date results, less chance of error, and complete transparency because it is easy to independently check the results.

Long version

Wherever a report includes quantitative data, I produce it using a “reproducible” workflow: graphs, tables, and even calculated data such as totals in the middle of text passages are produced automatically (according to instructions in a “source file”, using a statistics program called “R”) each time I save a new version of the report.
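
For the technically curious, the principle can be sketched in a few lines. The sketch below is written in Python rather than R, and everything in it (the {{placeholder}} syntax, the sample data, the function names) is made up for illustration. It is not how R or the reports described here actually work, only a toy version of the same idea: the report text contains placeholders, and every figure is recomputed from the untouched raw data each time the report is built.

```python
import csv
import io
import re

# Made-up raw data standing in for an untouched questionnaire export.
RAW_CSV = """respondent,region,score
1,North,4
2,South,5
3,North,
4,South,3
"""

# Made-up "source file": ordinary report text plus {{placeholders}}
# marking where computed values should appear.
SOURCE = (
    "We received {{n_valid}} valid responses. "
    "The mean score was {{mean_score}}."
)


def load_and_clean(raw_text):
    """Read the raw data from scratch and drop rows with a missing score."""
    rows = csv.DictReader(io.StringIO(raw_text))
    return [int(r["score"]) for r in rows if r["score"].strip()]


def render(source, raw_text):
    """Recompute every figure from the raw data and fill in the placeholders."""
    scores = load_and_clean(raw_text)
    values = {
        "n_valid": str(len(scores)),
        "mean_score": f"{sum(scores) / len(scores):.1f}",
    }
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values[m.group(1)], source)


report = render(SOURCE, RAW_CSV)
print(report)
```

If another response is added to the raw data, calling render again gives the updated count and mean without anyone editing the report text by hand, which is the point of the workflow.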

Advantages for the client:

  • Fewer errors because there is no manual cutting-and-pasting, no editing of the original data files, and no manual editing of tables or graphics: the original data files (e.g. responses to questionnaires) are not touched at all but are loaded from scratch each time.
  • Faster workflow because I can produce preliminary and pre-final report versions even before all data collection is completed.
  • Better presentation because it is easy to make global changes to the way data is presented (e.g. to change the font size on dozens of graphics with one click).
  • Freshness of external data because the latest versions of any data from other sources, e.g. World Bank data, are loaded live from online repositories each time the report is produced.
  • Transparency & verifiable accuracy because the client or others can, if desired, use the “source file” to repeat the calculations, see exactly how the results are arrived at, and check for errors.
  • Openness and extensibility because the client can extend the analyses in later months or years — or hand them over to others if desired — without having to start from scratch.

Things for the client to bear in mind:

  • My usual workflow is to publish preliminary versions of the report(s) online on my own domain so you can follow progress. The online version updates automatically, so you will always see the latest one. I can produce Word and PDF documents too, but if you want to make preliminary comments it is better to use email / Slack / Skype etc. rather than commenting inside those documents.
  • I will usually produce the final draft in Word format. Up to that point, if you say, e.g. “I want all the table headings up to page 22 in a larger font” or “I want to exclude the data from before 2011 from the whole report”, I can make such changes very easily. After that, it is no longer so easy to make global changes to the appearance and content.