Overview
This 48-minute lecture covers the convergence of Computer Vision and Natural Language Processing, with a focus on Vision-Language integration and Embodied AI. It shows how AI systems can generate image descriptions, answer questions, and navigate environments by following natural language instructions. Topics include techniques for generating text from visual content, methods that let humans control AI systems, and the training of large-scale models on web datasets. The lecture also covers how these technologies apply to embodied agents that navigate and interact with the physical world, and it closes with evaluation metrics and current challenges in the field, emphasizing recent research on human-AI interaction paradigms.
Syllabus
From Images to Text: New Forms of Human-AI Interaction
Taught by
AI Doctoral Academy