Overview
Explore the intersection of computer vision and natural language processing in this comprehensive lecture that examines how computational systems can understand and connect visual content with textual descriptions. Learn about fundamental approaches to bridging the semantic gap between images and words, including methods for automatic image annotation, content-based image retrieval, and the challenges of creating systems that can meaningfully associate visual features with linguistic concepts. Discover techniques for extracting semantic information from both visual and textual data, understand the computational frameworks used to model relationships between different modalities, and examine case studies demonstrating practical applications in image understanding and description generation. Gain insights into the theoretical foundations and practical implementations of multimodal systems that process both visual and linguistic information simultaneously.
Syllabus
2004-07-14: David Forsyth, Words and Pictures
Taught by
Center for Language & Speech Processing (CLSP), JHU