Following the success of the MOOC "Reproducible research: methodological principles for transparent science", the authors continue exploring reproducibility with a focus on massive data and complex calculations. These two MOOCs complement each other and offer a coherent training program on the subject.
In this second MOOC, you will learn how to manage large datasets and complex computations in controlled software environments, using formats such as JSON, FITS, and HDF5; platforms such as Zenodo and Software Heritage; and tools such as git-annex, Docker, Singularity, Guix, make, and Snakemake. Key concepts are introduced and applied through numerous hands-on exercises and a real-life use case on sunspot detection, demonstrating how to work in a reliable and reproducible way.
A new module for this session offers exercises illustrating how the tools and techniques we teach help in the daily practice of computational research. Interviews with experienced practitioners of reproducible research also discuss related tools, helping you decide whether to invest in more elaborate tools and which pitfalls to watch out for.