Heimdall - A Modular Framework for Tokenization in Single-Cell Foundation Models
Valence Labs via YouTube
Overview
Explore a framework for evaluating tokenization strategies in single-cell foundation models in this research presentation. Learn about Heimdall, an open-source toolkit that systematically decomposes single-cell foundation models (scFMs) into modular components, including gene identity encoders, expression encoders, and cell sentence constructors. Discover how different tokenization approaches affect model performance in challenging transfer-learning scenarios such as cross-tissue, cross-species, and spatial gene-panel shifts. Examine the role tokenization choices play under distribution shift, and understand how gene identity encoding and gene ordering strategies drive the largest performance gains. Gain practical insight into combining existing tokenization strategies to improve model generalization, and become familiar with a standardized evaluation framework that supports reproducible exploration of single-cell RNA-sequencing (scRNA-seq) data for drug discovery applications.
Syllabus
Heimdall: A Modular Framework for Tokenization in Single-Cell Foundation Models | Ellie Haber
Taught by
Valence Labs