The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models
Stanford University via YouTube
Overview
Attend this research presentation examining the effectiveness of medical adaptations of large language models and vision-language models in healthcare applications. Learn about a comprehensive study comparing ten medical LLMs and two VLMs against their base models, revealing that medical adaptation fails to consistently improve performance on downstream medical tasks such as question answering. Discover findings showing that medical LLMs outperform their base models in only 26.7% of cases on clinical-note-based QA tasks, with many performing significantly worse than their general-purpose counterparts. Explore the methodology behind these conclusions, including direct model comparisons, optimized prompting strategies, and accounting for statistical uncertainty. Gain insight into why state-of-the-art general-domain models may already possess strong medical knowledge and reasoning capabilities, challenging common assumptions about the benefits of domain-specific pretraining. Engage with recommendations for strengthening future medical AI research and participate in an interactive discussion about the implications for healthcare AI development. The session features Daniel P. Jeong, a PhD student in Carnegie Mellon University's Machine Learning Department, whose research focuses on developing rigorous evaluation strategies for healthcare machine learning applications.
Syllabus
MedAI #146: The Limited Impact of Medical Adaptation of LLMs and VLMs | Daniel P. Jeong
Taught by
Stanford MedAI