
Inria (French Institute for Research in Computer Science and Automation)

Reproducible Research II: Practices and tools for managing computations and data

Inria (French Institute for Research in Computer Science and Automation) via France Université Numérique

Overview


Following the success of the MOOC "Reproducible research: methodological principles for transparent science", the authors continue exploring reproducibility with a focus on massive data and complex calculations. These two MOOCs complement each other and offer a coherent training program on the subject.

In this second MOOC, you will learn how to manage large datasets and complex computations in controlled software environments. The course covers file formats such as JSON, FITS, and HDF5; archiving platforms such as Zenodo and Software Heritage; and tools such as git-annex, Docker, Singularity, Guix, make, and Snakemake. Key concepts are introduced and applied through numerous hands-on exercises and a real-life use case on sunspot detection, demonstrating how to work in a reliable and reproducible way.

A new module for this session offers exercises that illustrate how the tools and techniques we teach help in the daily practice of computational research. Interviews with experienced practitioners of reproducible research also discuss related tools, helping you decide whether to invest in more elaborate tools and which pitfalls you may encounter.

Syllabus

Preparing for the mountain hike to reproducibility
  • Interviews with astronomers about sunspot detection
  • Getting started with JupyterLab and the sunspot time series
  • Sunspot Time Series: Exercises
  • Reproducibility and research software communities
Module Managing data
  • Archiving
  • File formats
  • Project Organization
  • Git Annex
Module Managing software
  • On the Importance of Software Environment
  • Package Management Principles
  • Isolation and Containers
  • Using Containers
  • Building and Sharing Containers
  • Functional Package Managers (Guix, Docker, Singularity...)
Module Managing computations
  • Why do we need workflows?
  • From notebooks to shell scripts
  • Workflows with make
  • Workflows with snakemake
  • Workflows and environments
Module Reproducibility in the large
  • Getting familiar with the Sunspot project
  • Checking the reproducibility of computations
  • Checking the robustness of the workflow to a variation on the software environment
  • Injecting new data
  • Investigating specific aspects of the data
  • Parameterizing our workflow to evaluate parameter sensitivity
  • Interviews with experts
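To give a concrete sense of what "checking the reproducibility of computations" can mean in its simplest form, here is a minimal Python sketch. It is not taken from the course material. It compares the output files of two independent runs by their SHA-256 checksums, using only the standard library (`hashlib`, `pathlib`). The helper names `sha256_of` and `same_results` are hypothetical, chosen for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_results(run_a: Path, run_b: Path) -> bool:
    """Hypothetical helper: True if two output files are byte-for-byte identical."""
    return sha256_of(run_a) == sha256_of(run_b)
```

For example, `same_results(Path("run1/output.csv"), Path("run2/output.csv"))` (hypothetical paths) returns `True` only when both runs produced exactly the same bytes. This test is deliberately strict. Floating-point results can differ in their last digits across software environments, so a tolerance-based comparison of the numbers themselves may be more appropriate in practice.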

Taught by

Arnaud Legrand, Christophe Pouzat, Konrad Hinsen, Matthieu Simonin, Ludovic Courtès, and Kim Tâm HUYNH
