LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

MIT HAN Lab via YouTube

MLSys'25 - LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
