

Vamba - Understanding Hour-Long Videos with Hybrid Mamba-Transformer Architecture

MLOps World: Machine Learning in Production via YouTube

Overview

Learn about Vamba, a hybrid Mamba-Transformer model designed to process hour-long videos efficiently, in this 29-minute conference talk by Weiming Ren, a Ph.D. student at the University of Waterloo. Discover why transformer-based large multimodal models (LMMs) struggle with lengthy video inputs: causal self-attention scales quadratically with sequence length, which drives up the cost of both training and inference.

Explore how Vamba addresses this limitation by using Mamba-2 blocks to encode video tokens with linear complexity, which lets it process more than 1024 frames (640×360) on a single GPU, compared with 256 frames for transformer-based models. Understand the reported gains: at least a 50% reduction in GPU memory usage during training and inference, nearly double the speed per training step, and a 4.6% accuracy improvement on LVBench, a challenging hour-long video understanding benchmark.

Examine how this hybrid architecture maintains strong performance on both long and short video understanding tasks while avoiding the information loss typically associated with token compression methods, making it an orthogonal approach to handling extremely long video sequences.
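The quadratic-versus-linear contrast described above can be made concrete with a small counting sketch. This is not Vamba's implementation or a measurement of its cost. It only counts token interactions under simplified assumptions: causal self-attention touches every (query, earlier key) pair, while a linear-time sequence layer touches each token once. The function names and the `TOKENS_PER_FRAME` value are hypothetical and chosen purely for illustration.

```python
# Illustrative only: counts token interactions, not real FLOPs or memory.
TOKENS_PER_FRAME = 196  # hypothetical value; the talk description does not state it


def attention_cost(num_tokens: int) -> int:
    """Pairs visited by causal self-attention: token i attends to tokens 1..i."""
    return num_tokens * (num_tokens + 1) // 2


def linear_scan_cost(num_tokens: int) -> int:
    """Steps taken by a layer that processes each token once."""
    return num_tokens


def cost_ratio(frames_small: int, frames_large: int, cost_fn) -> float:
    """How much the cost grows when going from frames_small to frames_large."""
    small = cost_fn(frames_small * TOKENS_PER_FRAME)
    large = cost_fn(frames_large * TOKENS_PER_FRAME)
    return large / small


if __name__ == "__main__":
    for name, fn in (("causal attention", attention_cost),
                     ("linear scan", linear_scan_cost)):
        growth = cost_ratio(256, 1024, fn)
        print(f"{name}: 256 -> 1024 frames multiplies the count by {growth:.2f}")
```

Under these assumptions, quadrupling the frame count from 256 to 1024 multiplies the attention pair count by roughly sixteen but the linear count by only four, which gives a rough sense of why a linear-complexity video encoder can fit more frames on the same hardware.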

Syllabus

Vamba Understanding Hour Long Videos with Hybrid

Taught by

MLOps World: Machine Learning in Production
