Reproducible Research Strategy
I will train my postdoctoral researchers, graduate students, technicians and undergraduate interns on the importance of transparency in data analysis, as well as good practices and technologies for data management, processing and reproducible research.
-
We will follow good data management practices:
-
Good record-keeping practices (paper or electronic lab notebooks), including keeping track of the software and dataset versions used for each analysis.
-
Logical folder structures and file naming for data, analysis, and results.
-
Good annotation practices and use of descriptive README files.
-
Routine automated backup to a secure centralised data server.
-
Archive unmodified raw data and sample annotation. Archive historical data and analyses for the duration required by funding agencies.
-
Keep track of data provenance.
-
Ensure that automated data processing pipelines are tested for accuracy.
-
Generate appropriate quality assurance metrics during data processing.
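As one illustration of how dataset and software versions might be tracked for a given analysis, here is a minimal Python sketch. It is an assumption about how this could be done, not a description of the lab's actual tooling: the function names (sha256_of, build_manifest, write_manifest) and the manifest layout are hypothetical. It records a SHA-256 checksum for each input file together with the Python version and the installed versions of named packages.

```python
import hashlib
import json
import platform
from importlib import metadata
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_files, packages):
    """Collect file checksums and package versions for one analysis.

    Hypothetical layout: {"python": ..., "packages": {...}, "files": {...}}.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "packages": versions,
        "files": {str(p): sha256_of(p) for p in data_files},
    }


def write_manifest(manifest, out_path):
    """Write the manifest as sorted, indented JSON so that diffs stay readable."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

A manifest written this way could be committed alongside the analysis code, so that a later re-run can confirm that the raw data and the software environment are unchanged.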
-
-
We will strive for good software engineering practices:
-
Modular and well-documented code.
-
Use of integration and unit testing, software profiling and other good software engineering practices.
-
Software implementations of methodology will be submitted post-publication to open source repositories such as CRAN, Bioconductor or PyPI, or hosted on our group website (subject to University and funding agency intellectual property policies).
-
Use of container technologies such as Docker and Singularity, together with pipelining tools, to provide reproducible workflows.
-
-
All our research code and in-house developed software is kept under version control in the SS lab GitHub repository.
-
All analyses resulting in a publication will be archived in the SS lab GitHub repository as reproducibility documents (Jupyter and R notebooks, using technologies such as knitr and Markdown). These will be checked for accuracy by someone other than the author (usually me or someone I nominate).
-
All relevant datasets (and associated annotation) used in a publication will be uploaded to the appropriate public data repository and made available post-publication (e.g. SRA, GEO, EGA, ArrayExpress, BioModels, WikiPathways). We will adopt the minimum information standards (such as MIAME and MINSEQE) specified for different data types.
-
Any methodology publications will be uploaded to preprint servers such as bioRxiv at the same time as submission to a peer-reviewed journal. All publications will be open access as required by University of Cambridge policy.
-
We will maintain any research web resources or apps that we develop for at least 5 years after publication.
-
We and our collaborators will always do our best to carry out verification and validation of our analyses and predictions.
-
I will keep an up-to-date web presence.
Modified from Lorena Barba's Reproducibility PI Manifesto