
YouTube

Energy-Based Transformers are Scalable Learners and Thinkers - Paper Review

Yannic Kilcher via YouTube

Overview

Explore a comprehensive paper review examining Energy-Based Transformers (EBTs), a novel approach to developing models that can perform System 2 Thinking through unsupervised learning alone. Discover how researchers have developed a new class of Energy-Based Models that assign energy values to input-prediction pairs, enabling predictions through gradient descent-based energy minimization, which reframes prediction as an optimization problem in which the model explicitly verifies the compatibility between inputs and candidate predictions.

Learn about the key advantages of EBTs over the traditional Transformer++ approach, including up to 35% higher scaling rates across data, batch size, parameters, FLOPs, and depth during training. Understand how EBTs improve 29% more than Transformer++ on language tasks through inference-time computation, while also outperforming Diffusion Transformers on image denoising tasks using fewer forward passes.

Analyze the findings showing that EBTs work across both discrete text and continuous visual modalities, making them modality-agnostic unlike many existing approaches, and that they generalize better on downstream tasks even when starting from similar or worse pretraining performance. Gain insights into the potential of EBTs as a promising new paradigm for scaling both the learning and thinking capabilities of machine learning models, addressing limitations of current inference-time computation techniques, which are often modality-specific, problem-specific, or require supervision beyond unsupervised pretraining.
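The core idea of prediction as energy minimization can be illustrated with a toy sketch. This is not the paper's actual architecture (which uses a learned Transformer as the energy function); here a hand-written quadratic energy and its analytic gradient stand in, and the names `energy`, `energy_grad`, and `think` are illustrative, not from the paper:

```python
import numpy as np

# Toy quadratic energy: low when the prediction is "compatible" with the
# context. A real EBT would compute this with a learned Transformer.
def energy(context, prediction):
    target = 2.0 * context  # stand-in for the mapping the model has learned
    return float(np.sum((prediction - target) ** 2))

# Analytic gradient of the quadratic energy w.r.t. the prediction.
# In an EBT this gradient comes from backpropagation through the network.
def energy_grad(context, prediction):
    return 2.0 * (prediction - 2.0 * context)

# "Thinking": start from an initial guess and refine the prediction by
# gradient descent on the energy surface. More steps = more computation
# spent verifying and improving the candidate prediction.
def think(context, steps=50, lr=0.1):
    prediction = np.zeros_like(context)
    for _ in range(steps):
        prediction -= lr * energy_grad(context, prediction)
    return prediction

ctx = np.array([1.0, -0.5, 3.0])
refined = think(ctx)
print(energy(ctx, np.zeros_like(ctx)), energy(ctx, refined))
```

The point of the sketch is the control knob: the number of descent steps is an inference-time budget, which is how EBTs spend extra computation on harder predictions without any extra supervision.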

Syllabus

Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)

Taught by

Yannic Kilcher

