Merlin - A Vision Language Foundation Model for 3D Computed Tomography
Stanford University via YouTube
Overview
Learn about Merlin, a 3D vision language model for computed tomography interpretation, in this Stanford University lecture presented by PhD candidate Ashwin Kumar. Discover how this resource-efficient model draws on over 6 million CT images from 15,331 scans, together with electronic health records and radiology reports, to perform diagnostic and prognostic tasks. Explore the model's capabilities across six task types and 752 individual tasks, including zero-shot findings classification, phenotype classification, disease prediction, and 3D semantic segmentation. Understand how Merlin addresses the growing need for automated medical image interpretation amid radiologist shortages while keeping compute requirements low: it trains on a single GPU, whereas conventional models need hundreds.
Syllabus
MedAI #134: Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Ashwin Kumar
Taught by
Stanford MedAI