Mooncake: Trading More Storage for Less Computation — A KVCache-centric Architecture for Serving LLM Chatbot

USENIX via YouTube

FAST '25 - Mooncake: Trading More Storage for Less Computation — A KVCache-centric Architecture...
