Rethinking Scale in Single-Cell Foundation Models and Infrastructure Challenges in Single-Cell Omics

Explore recent research on single-cell foundation models in this seminar from the Broad Institute's Models, Inference and Algorithms series. Lorin Crawford of Microsoft Research New England challenges conventional wisdom about scaling in single-cell biology, presenting findings from training 400 models on 22.2 million cells across 6,400 experiments, which show that performance plateaus at much smaller dataset sizes than expected. Discover how training data composition affects model performance, with results from human hematopoiesis showing that deep generative models struggle with unseen cell types, that including malignant cells does not necessarily improve disease modeling, and that embryonic stem cell data improves out-of-distribution performance. Davide D'Ascenzo of the Polytechnic University of Turin addresses infrastructure and modeling challenges in single-cell omics, including scDataset, developed for efficient data loading from collections of hundreds of millions of cells, and hierarchical cross-entropy losses that incorporate cell type ontology structure to improve annotation accuracy across model architectures.
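
To give a feel for what a hierarchical cross-entropy loss over a cell type ontology can look like, here is a minimal sketch of one common formulation; it is not necessarily the method presented in the seminar. It assumes that the probability of an internal ontology node is the sum of the probabilities of the leaf cell types beneath it, and that the loss adds the negative log-probability of the true label and each of its ancestors. The toy ontology, the names PARENT and LEAVES, and the function hierarchical_cross_entropy are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical toy ontology (child -> parent); None marks the root.
PARENT = {
    "cell": None,
    "lymphocyte": "cell",
    "T cell": "lymphocyte",
    "B cell": "lymphocyte",
    "monocyte": "cell",
}
# Leaf cell types, in the same order as the classifier's logits.
LEAVES = ["T cell", "B cell", "monocyte"]


def ancestors(node):
    """Return the node followed by its ancestors, excluding the root."""
    path = []
    while PARENT[node] is not None:
        path.append(node)
        node = PARENT[node]
    return path


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_cross_entropy(logits, true_leaf):
    """Sum of -log p(node) over the true leaf and its non-root ancestors.

    Assumption: p(node) is the total probability of the leaves under it.
    """
    probs = dict(zip(LEAVES, softmax(logits)))
    loss = 0.0
    for node in ancestors(true_leaf):
        p_node = sum(p for leaf, p in probs.items() if node in ancestors(leaf))
        loss -= math.log(p_node)
    return loss


# Same confidence in the true label "T cell", but different mistakes:
near_miss = hierarchical_cross_entropy([0.0, 2.0, 0.0], "T cell")  # favors B cell
far_miss = hierarchical_cross_entropy([0.0, 0.0, 2.0], "T cell")   # favors monocyte
```

Under these assumptions, near_miss comes out smaller than far_miss: confusing a T cell with a B cell keeps the prediction inside the lymphocyte branch, so it is penalized less than confusing it with a monocyte. A flat cross-entropy would score the two mistakes identically.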