Overview
Discover how to bring open-source principles to foundation model development in this conference talk on the gap between "open-weight" model releases and fully reproducible machine learning research. Learn about Marin, a project from Stanford that addresses open-weight releases shipped without essential components such as training code, data recipes, and training logs. Explore how every Marin training run begins as a GitHub pull request with an explicitly stated hypothesis and a fully pinned configuration. Understand how Ray orchestrates large-scale jobs across preemptible Google Cloud TPUs, streams real-time metrics, and stores artifacts linked to the exact commit that launched each run. Examine how the project keeps successes, failures, and restarts publicly visible, preserving the iterative scientific process rather than hiding it. Gain insight into using Ray for transparent, inspectable, and reproducible large-scale training that brings foundation model development closer to established open-source software practice.
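The idea of linking a run's artifacts to a pinned configuration and the commit that launched it can be illustrated with a minimal sketch. This is not Marin's code: the names `RunManifest`, `config_digest`, and `artifact_prefix`, the example config keys, and the placeholder commit and hypothesis strings are all hypothetical, and the sketch uses only the Python standard library.

```python
import hashlib
import json
from dataclasses import dataclass


def config_digest(config: dict) -> str:
    """Hash a config deterministically: key order does not affect the digest."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RunManifest:
    """Hypothetical record tying one training run to its commit and config."""
    commit: str       # commit that launched the run
    hypothesis: str   # what the run is meant to test
    config: dict      # fully pinned settings, nothing left to implicit defaults

    def artifact_prefix(self) -> str:
        # Group artifacts under the commit plus a short config digest, so a
        # stored checkpoint or log can be traced back to exactly what ran.
        return f"runs/{self.commit}/{config_digest(self.config)[:12]}"


# Placeholder values for illustration only.
manifest = RunManifest(
    commit="<commit-sha>",
    hypothesis="...",
    config={"learning_rate": 3e-4, "seed": 0, "steps": 1000},
)
print(manifest.artifact_prefix())
```

Hashing a canonical JSON form means two runs with identical settings map to the same digest, while any changed value produces a different storage path.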
Syllabus
Marin: Open Development of Open Foundation Models | Ray Summit 2025
Taught by
Anyscale