
YouTube

JENGA - Enhancing LLM Long-Context Fine-tuning with Contextual Token Sparsity

USENIX via YouTube

Overview

Learn about JENGA, a novel fine-tuning system that addresses memory constraints in long-context applications of large language models through contextual token sparsity, in this 15-minute conference presentation. Discover how researchers from Tsinghua University and Microsoft Research tackle the high activation memory footprint that arises when extending LLM context windows for long-context applications. Explore Contextual Token Sparsity, a new token-level sparsity mechanism that minimizes the involvement of redundant tokens while preserving model accuracy.

Understand the three key techniques implemented in JENGA: Token Elimination, which dynamically identifies and excludes redundant tokens across varying inputs and layers; Pattern Prediction, which uses well-trained predictors to approximate token sparsity patterns with minimal overhead; and Kernel Optimization, which employs permutation-free and segment-based strategies to enhance system performance.

Examine comprehensive evaluation results demonstrating that JENGA reduces memory consumption by up to 1.93× and achieves speedups of up to 1.36× over state-of-the-art fine-tuning systems, while remaining compatible with various LLM architectures and other optimization techniques.
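To make the token-elimination idea concrete, here is a minimal NumPy sketch. It is not JENGA's implementation (which relies on trained sparsity predictors and custom GPU kernels); the function name, the fixed keep ratio, and the stand-in "expensive layer" are all illustrative assumptions. The sketch only shows the core pattern: use per-token importance scores to route a compact subset of tokens through an expensive computation, letting the rest pass through unchanged, which is what shrinks the activation memory footprint.

```python
import numpy as np

def eliminate_tokens(hidden, scores, keep_ratio=0.5):
    """Illustrative token elimination (not JENGA's actual code).

    hidden: (seq_len, dim) activations for one layer.
    scores: (seq_len,) predicted per-token importance (stands in for
            the output of a trained sparsity predictor).
    """
    seq_len = hidden.shape[0]
    k = max(1, int(seq_len * keep_ratio))
    # Top-k important tokens, with original order preserved.
    keep_idx = np.sort(np.argpartition(scores, -k)[-k:])
    compact = hidden[keep_idx]        # only k rows reach the expensive layer
    processed = compact * 2.0         # stand-in for an FFN/attention block
    out = hidden.copy()               # eliminated tokens pass through unchanged
    out[keep_idx] = processed         # scatter results back into the sequence
    return out, keep_idx

# Toy example: 4 tokens, 2-dim activations; tokens 1 and 3 score highest.
hidden = np.arange(8, dtype=np.float64).reshape(4, 2)
scores = np.array([0.1, 0.9, 0.05, 0.8])
out, kept = eliminate_tokens(hidden, scores, keep_ratio=0.5)
```

Because only the kept rows are fed to the layer, the activations that must be stored for the backward pass scale with the number of retained tokens rather than the full context length, which is the source of the memory savings described above.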

Syllabus

USENIX ATC '25 - JENGA: Enhancing LLM Long-Context Fine-tuning with Contextual Token Sparsity

Taught by

USENIX

