Stanford University

Merlin - A Vision Language Foundation Model for 3D Computed Tomography

Stanford University via YouTube

Overview

In this Stanford University lecture, PhD candidate Ashwin Kumar presents Merlin, a 3D vision language model designed for computed tomography (CT) interpretation. Learn how this resource-efficient model is trained on over 6 million CT images from 15,331 scans, together with electronic health records and radiology reports, to perform diagnostic and prognostic tasks. Explore its capabilities across six task types and 752 individual tasks, including zero-shot findings classification, phenotype classification, disease prediction, and 3D semantic segmentation. Understand how Merlin addresses the growing need for automated medical image interpretation amid radiologist shortages while keeping compute requirements low: training requires only a single GPU, whereas conventional models can need hundreds.

Syllabus

MedAI #134: Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Ashwin Kumar

Taught by

Stanford MedAI
