Merlin - A Vision Language Foundation Model for 3D Computed Tomography
Stanford University via YouTube
Overview
In this Stanford University lecture, PhD candidate Ashwin Kumar presents Merlin, a 3D vision language model for interpreting computed tomography (CT) scans. Merlin is trained on over 6 million CT images from 15,331 scans, together with electronic health records and radiology reports, and performs both diagnostic and prognostic tasks. The talk covers the model's evaluation across six task types and 752 individual tasks, including zero-shot findings classification, phenotype classification, disease prediction, and 3D semantic segmentation. It also explains how Merlin addresses the growing need for automated medical image interpretation amid radiologist shortages, and how it keeps compute requirements low: training needs only a single GPU, compared with conventional models that need hundreds.
Syllabus
MedAI #134: Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Ashwin Kumar
Taught by
Stanford MedAI