Overview
This 12-minute conference talk from MLSys 2025 presents "LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention," delivered by researchers from MIT HAN Lab. The talk covers the challenges of serving large language models with long input sequences and describes how a unified sparse attention mechanism makes LLM serving more efficient. It was given on May 14th at the Santa Clara Convention Center in California. The authors are Shang Yang, Junxian Guo, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, and Song Han. Additional resources include the project website, the research paper, and the GitHub repository.
Syllabus
MLSys'25 - LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Taught by
MIT HAN Lab