From Text to Vision to Voice - Exploring Multimodality with OpenAI

AI Engineer via YouTube

Overview

Explore the multimodal capabilities of OpenAI's latest models in this 24-minute conference talk by Romain Huet, Head of Developer Relations at OpenAI, delivered at the AI Engineer World's Fair in San Francisco. The talk includes live demonstrations of GPT-4o omnimodel voice, the ChatGPT desktop app, Sora video generation, and Voice Engine. Learn how these tools extend AI interaction beyond text to voice and vision, see practical applications of multimodal systems that combine text, image, and audio processing in a single model, and understand the technical ideas behind OpenAI's omnimodel approach and how they are shaping the next generation of AI-powered applications and user experiences.

Syllabus

From Text to Vision to Voice - Exploring Multimodality with OpenAI: Romain Huet

Taught by

AI Engineer