The Limits of Today's AI Models - Transformers, State Space Models, and the Future of Multimodal Intelligence

Y Combinator via YouTube

Overview

Explore the fundamental limitations of current AI architectures in this 18-minute conference talk featuring Ankit Gupta of Y Combinator and Karan Goel, founder and CEO of Cartesia, recorded at NeurIPS 2025. Discover why transformers function more like retrieval systems than true learning systems, and how state space models could enable compression and abstraction in AI. Learn about the concept of intelligence as compression and examine the differences between retrieval-based and abstraction-based approaches to artificial intelligence. Investigate why multimodal intelligence may require entirely new architectures, and why Cartesia chose voice AI as its wedge product. Delve into tokens, representations, and learning signals, and explore how audio can serve as a foundation for other modalities. Gain insight into the challenges of building a research-driven company that must balance deep technical innovation with practical product development, and see how shipping a product can validate research hypotheses in a competitive startup environment.

Syllabus

— Introducing Cartesia
— From Architecture Research to Startup
— What “Architecture Research” Really Means
— Why Transformers Hit a Ceiling
— State Space Models Explained
— Intelligence as Compression
— Retrieval vs. Abstraction
— Hybrid Architectures and the Future
— Why Cartesia Chose Voice AI
— What Multimodality Actually Means
— Audio as a Recipe for Other Modalities
— Tokens, Representations, and Learning Signals
— Learning Representations End-to-End
— Building for the “Average Human”
— Research vs. Product Reality
— One Vision, Ruthlessly Executed
— Product as a Truth Serum for Research
— Startup Gravity Applies to Research Too

Taught by

Y Combinator: The Vault
