BanditSpec - Adaptive Speculative Decoding via Bandit Algorithms

Attend this academic seminar exploring BanditSpec, a training-free online learning framework that adaptively optimizes speculative decoding configurations for Large Language Models using bandit algorithms. Learn how this approach formulates hyperparameter selection as a Multi-Armed Bandit problem to accelerate LLM inference while maintaining text generation quality. Discover the UCBSpec and EXP3Spec algorithms, designed for stochastic and adversarial reward settings respectively, with theoretical analysis demonstrating optimal regret performance. Examine extensive empirical experiments with LLaMA3 and Qwen2 models that validate the framework's effectiveness in real-life LLM serving scenarios with diverse input prompts. Gain insights into the information-theoretic impossibility results and stopping time regret bounds that establish the theoretical foundations of this adaptive speculative decoding method. The presentation is given by Professor Vincent Y. F. Tan of the National University of Singapore. It covers the mathematical formulation, algorithmic design, theoretical guarantees, and practical implementation of this approach to optimizing LLM inference throughput without additional training or offline model alignment.
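To make the Multi-Armed Bandit framing concrete, here is a minimal sketch of the textbook UCB1 rule choosing among a handful of candidate configurations. It is not the UCBSpec algorithm from the talk. The arms, the Bernoulli reward model, and the names `ucb1_select` and `run_bandit` are assumptions introduced only for illustration; in a real serving system the reward would come from the decoder (for example, some measure of accepted draft tokens) rather than from a simulator.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Return the arm with the highest UCB1 index at round t (1-based)."""
    # Play every arm once before trusting any empirical mean.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Empirical mean plus an exploration bonus that shrinks with more pulls.
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_bandit(true_means, rounds, seed=0):
    """Simulate UCB1 on Bernoulli arms (a stand-in for real decoding feedback)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k   # number of times each arm was chosen
    means = [0.0] * k  # running average reward per arm
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        # Hypothetical reward: 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Three hypothetical configurations with different average payoffs.
counts, means = run_bandit([0.3, 0.5, 0.8], rounds=5000)
```

In this toy run the pull counts concentrate on the arm with the highest mean reward, which is the behavior a regret bound quantifies: the fewer rounds spent on suboptimal configurations, the lower the regret.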