
YouTube

LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

MIT HAN Lab via YouTube

Overview

Explore a 12-minute conference talk from MLSys 2025, "LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention," delivered by researchers from MIT HAN Lab. Discover how this approach addresses the challenges of serving large language models with long input sequences, and learn about the unified sparse attention mechanism that makes long-context LLM deployment more efficient. The presentation, held on May 14th at the Santa Clara Convention Center in California, features insights from authors Shang Yang, Junxian Guo, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, and Song Han. Access additional resources, including the project website, research paper, and GitHub repository, to further explore this work in machine learning systems.
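To make the title's key term concrete before watching: in sparse attention, each query attends to only a subset of the cached keys and values rather than all of them, which is what makes long-sequence serving cheaper. The short NumPy sketch below illustrates the generic block-sparse idea only; it is not the LServe implementation, and every name and parameter in it (block_sparse_attention, block_size, keep_blocks) is a hypothetical choice for illustration.

    # Minimal, illustrative sketch of block-sparse attention in NumPy.
    # NOT the LServe implementation; it only shows the general idea the talk
    # addresses: skipping whole key/value blocks so attention cost scales with
    # the number of *selected* blocks rather than the full sequence length.
    import numpy as np

    def block_sparse_attention(q, k, v, block_size=4, keep_blocks=None):
        """Attend a single query over only the selected KV blocks.

        q: (d,) query; k, v: (n, d) keys/values, n divisible by block_size.
        keep_blocks: indices of KV blocks to attend to (None = all blocks).
        """
        n, d = k.shape
        n_blocks = n // block_size
        if keep_blocks is None:
            keep_blocks = range(n_blocks)
        # Gather only the key/value rows belonging to the selected blocks.
        idx = np.concatenate([np.arange(b * block_size, (b + 1) * block_size)
                              for b in keep_blocks])
        k_sel, v_sel = k[idx], v[idx]
        scores = k_sel @ q / np.sqrt(d)          # scaled dot products, shape (m,)
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        return weights @ v_sel                   # attention output, shape (d,)

    # Usage: attend over 2 of 4 blocks instead of all 16 cached positions.
    rng = np.random.default_rng(0)
    q = rng.standard_normal(8)
    k = rng.standard_normal((16, 8))
    v = rng.standard_normal((16, 8))
    out = block_sparse_attention(q, k, v, block_size=4, keep_blocks=[0, 3])
    print(out.shape)  # (8,)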

Syllabus

MLSys'25 - LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

Taught by

MIT HAN Lab

