How do you explain reproducible research to clients?
Most of the statistics work I do now is reproducible research. This
can offer a big advantage for clients, but of course that doesn’t
necessarily mean they realise it …
Below is a text we have been pasting in at the bottom of the source
documents (and which therefore appears in the PDFs) to explain
reproducible research. I would be very interested if anyone has any
better ideas …
This is a reproducible research document.
This approach has the following advantages:
- making it easier for us to return to the data and analyses in the future and repeat or extend them
- making it easier for the client to do the same without having to contact us
- enabling other researchers to repeat and verify these findings themselves, even automatically if they desire
- ensuring complete transparency of the results.
Concretely, this means that the original SPSS and other data files
will not be changed at all. All recoding, data cleaning, omission of
cases etc. is carried out in syntax. In fact this report document
itself, including the tables, graphics and statistics mentioned within
the text, is produced entirely by the following procedure. A word
processing document (“source file”) is prepared which is essentially
the final report, complete with introduction, chapter headings,
commentary etc., together with blocks of syntax wherever statistical
results are required, in particular tables, graphics and inline
results. A single syntax file is then run which takes the source file
and creates a second document, the present report, which is identical
to the source file except that the blocks of syntax are replaced by
the results of the syntax (tables, graphics, etc.).
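The idea of replacing blocks of syntax with their results can be sketched in a few lines. This is only an illustration in Python, not the R-based tooling used to produce the report: the `<<...>>` chunk delimiters, the `weave` function and the `scores` data are all hypothetical names made up for the example.

```python
import re

# Hypothetical chunk delimiters chosen for this sketch only;
# real literate-programming tools define their own syntax.
CHUNK = re.compile(r"<<(.*?)>>", re.DOTALL)


def weave(source, env):
    """Return a copy of `source` in which every <<expression>> chunk
    is replaced by the result of evaluating that expression."""
    def run(match):
        # Evaluate the chunk with the analysis variables in `env`.
        return str(eval(match.group(1).strip(), {}, env))
    return CHUNK.sub(run, source)


# Made-up data standing in for a real dataset.
scores = [4, 8, 6]

# The "source file": ordinary report text plus blocks of syntax.
source = (
    "We surveyed <<len(scores)>> people; "
    "the mean score was <<sum(scores) / len(scores)>>."
)

# The "report": identical text, with the chunks replaced by results.
report = weave(source, {"scores": scores})
print(report)
```

If the data change, rerunning the same step regenerates every number in the text, so nothing has to be retyped by hand.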
So there is no cutting-and-pasting or editing of data in the data
files, nor is there, for example, any manual editing of table data or
graphics. At each point in this report at which data preparation is
discussed, the interested reader will find, at the corresponding point
in the source file, the syntax which actually carries out that data
preparation. Likewise, at each point in this report at which tables,
graphics etc. are displayed, the interested reader will find the
syntax at the corresponding point in the source file which actually
constructs those tables and graphics. The source document and datasets
can therefore be made available to third parties, who can then repeat
these calculations, see exactly how they were arrived at, and extend
the analyses at will.
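As a rough illustration of what "data preparation in syntax" means, the sketch below (again in Python rather than the R actually used) reads raw data, omits a case and derives a new variable without touching the raw data. The column names, the `-9` missing-value code and the age grouping are hypothetical and exist only for this example.

```python
import csv
import io

# Stands in for the untouched raw data file.
# The value -9 is a hypothetical "missing" code for this sketch.
RAW = "id,age\n1,34\n2,-9\n3,51\n"


def prepare(raw_text):
    """Return cleaned rows built from `raw_text`.
    The raw text is only read, never modified."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        age = int(row["age"])
        if age < 0:
            # Omission of cases happens here, visibly, in code.
            continue
        rows.append({
            "id": int(row["id"]),
            "age": age,
            # Recoding: derive a grouped variable from the raw one.
            "age_group": "50+" if age >= 50 else "under 50",
        })
    return rows


cleaned = prepare(RAW)
print(cleaned)
```

Every decision about which cases are dropped and how variables are recoded is then recorded in one place that a third party can read and rerun.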
Unfortunately, to the best of our knowledge the statistics program
most familiar to social scientists, SPSS, does not fulfill all of
these requirements; in particular, it cannot produce a complete report
automatically. The work will therefore be carried out using the
open-source statistics program R. However, intermediate datasets in
SPSS format, including all recoded and calculated variables, can also
be provided, so that as much as possible of the above can be
accomplished with SPSS as well.