From Text to Vision to Voice - Exploring Multimodality with OpenAI
AI Engineer via YouTube
Overview
This 24-minute conference talk was delivered by Romain Huet, Head of Developer Relations at OpenAI, at the AI Engineer World's Fair in San Francisco. In a single presentation, it demonstrates GPT-4o Omnimodel Voice, ChatGPT Desktop, Sora video generation, and Voice Engine. The demonstrations trace the shift from text-only interaction to voice and vision, and show how multimodal systems combine text, visual, and audio processing in practical applications. The talk also covers the technical ideas behind OpenAI's omnimodel approach and what they mean for the next generation of AI-powered applications and user experiences.
Syllabus
From Text to Vision to Voice - Exploring Multimodality with OpenAI: Romain Huet
Taught by
AI Engineer