Merlin - A Vision Language Foundation Model for 3D Computed Tomography
Stanford University via YouTube
Overview
In this Stanford University lecture, PhD candidate Ashwin Kumar presents Merlin, a 3D vision language model for interpreting computed tomography (CT) scans. Merlin is trained on over 6 million CT images from 15,331 scans, paired with electronic health records and radiology reports, to perform a range of diagnostic and prognostic tasks. It is evaluated on six task types comprising 752 individual tasks, including zero-shot findings classification, phenotype classification, disease prediction, and 3D semantic segmentation. The talk also covers how Merlin addresses the growing need for automated medical image interpretation amid radiologist shortages, and how it was trained on a single GPU, whereas conventional models are described as needing hundreds.
Syllabus
MedAI #134: Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Ashwin Kumar
Taught by
Stanford MedAI