Overview
Explore a 12-minute conference talk from MLSys 2025 presenting "LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention," delivered by researchers from MIT HAN Lab. The talk explains how LServe addresses the challenges of serving large language models with long input sequences, using a unified sparse attention mechanism to improve the efficiency of LLM deployment. The presentation, held on May 14th at the Santa Clara Convention Center in California, features insights from authors Shang Yang, Junxian Guo, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, and Song Han. Additional resources, including the project website, research paper, and GitHub repository, are available for a deeper understanding of this work in machine learning systems.
Syllabus
MLSys'25 - LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Taught by
MIT HAN Lab