Explore statistical inference challenges in adaptive experimentation environments through this one-hour lecture delivered by Prof. Koulik Khamaru from Rutgers University. Discover how modern decision-making methods like A/B testing, multi-armed bandits, and reinforcement learning fundamentally challenge traditional statistical inference, often producing biased estimators and misleading confidence intervals when classical i.i.d.-based tools are applied.

Learn about the concept of stability, originally formulated by Lai and Wei (1982), as a unifying principle for achieving valid inference under adaptive data collection. Examine how algorithms such as the Upper Confidence Bound (UCB) achieve stability, enabling the application of classical inferential tools despite the absence of independence in the data.

Gain insights through concrete examples that highlight the pitfalls of naive inference, including key illustrations of the empirical mean in stochastic bandits and contextual bandit problems, both supported by central limit theorems. Understand the intersection of statistics, machine learning, and optimization through the lens of reinforcement learning algorithm design and hypothesis testing methods for RL data.
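To make the pitfall concrete, the sketch below (not taken from the lecture; the arm means, horizon, and replication count are illustrative choices) runs the standard UCB1 rule on a two-armed Bernoulli bandit and then inspects the empirical mean of the less-pulled arm. Because UCB samples an arm less often precisely when its sample mean happens to look low, the naive empirical mean of adaptively collected data is typically biased downward, which is exactly the kind of failure of classical i.i.d. intuition the lecture discusses.

```python
import numpy as np

def ucb_run(means, horizon, rng):
    """Run UCB1 on Bernoulli arms; return pull counts and per-arm sample means."""
    k = len(means)
    counts = np.zeros(k, dtype=int)
    sums = np.zeros(k)
    # Initialize by pulling each arm once.
    for a in range(k):
        sums[a] += rng.binomial(1, means[a])
        counts[a] += 1
    for t in range(k, horizon):
        # UCB index: sample mean plus an exploration bonus.
        index = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
        a = int(np.argmax(index))
        sums[a] += rng.binomial(1, means[a])
        counts[a] += 1
    return counts, sums / counts

rng = np.random.default_rng(0)
true_means = [0.6, 0.4]  # illustrative arm means
reps = 2000
suboptimal_estimates = []
for _ in range(reps):
    counts, sample_means = ucb_run(true_means, horizon=200, rng=rng)
    suboptimal_estimates.append(sample_means[1])

bias = np.mean(suboptimal_estimates) - true_means[1]
print(f"average sample mean of arm 2: {np.mean(suboptimal_estimates):.3f} (truth 0.4)")
print(f"estimated bias: {bias:+.3f}")  # typically negative under adaptive sampling
```

Averaged over many replications, the empirical mean of the suboptimal arm falls below its true value of 0.4, so a confidence interval centered at that mean is systematically off; stability conditions of the kind Lai and Wei introduced are what restore the validity of such classical tools.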