
A Tight Lower Bound for Non-Stochastic Multi-Armed Bandits with Expert Advice

HUJI Machine Learning Club via YouTube

Overview

Learn about a groundbreaking theoretical result in machine learning through this 46-minute conference talk, which presents a tight lower bound for the non-stochastic multi-armed bandit problem with expert advice. Explore the classic problem introduced by Auer, Cesa-Bianchi, Freund, and Schapire (2002), in which a learner plays a repeated game against an adversary over T rounds with access to N experts and K arms. Understand how, in each round, the experts recommend arms, the learner pulls one of them, the adversary assigns losses, and the learner observes only the loss of the chosen arm, with the goal of minimizing regret relative to the best expert in hindsight. Discover the recently established tight lower bound on the minimax optimal expected regret, which resolves a long-standing open question in the field. Examine how this result, combined with Kale's 2014 upper bound, shows that the minimax optimal expected regret is of order √(T K log(N/K)). Gain insights from joint work with Zachary Chase and Shinji Ito, presented by postdoctoral researcher Idan Mehalel of the NSF-Simons Collaboration on the Theoretical Foundations of Deep Learning.
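The interaction protocol described above can be made concrete with a short sketch of Exp4, the classic algorithm for bandits with expert advice from the same Auer et al. (2002) paper. Everything here is illustrative: the instance sizes, the random advice and losses, and the step size `eta` are hypothetical choices, not the parameters analyzed in the talk.

```python
import numpy as np

def exp4(advice, losses, eta, seed=0):
    """Minimal Exp4 sketch (illustrative, not the talk's construction).

    advice: (T, N, K) array; advice[t, i] is expert i's distribution
            over the K arms in round t.
    losses: (T, K) array of adversarial losses in [0, 1].
    Returns the learner's cumulative realized loss over T rounds.
    """
    T, N, K = advice.shape
    w = np.ones(N) / N                    # uniform prior over experts
    rng = np.random.default_rng(seed)
    total_loss = 0.0
    for t in range(T):
        p = w @ advice[t]                 # learner's arm distribution (mix of experts)
        arm = rng.choice(K, p=p)          # pull one arm
        loss = losses[t, arm]             # observe only the chosen arm's loss
        total_loss += loss
        est = np.zeros(K)
        est[arm] = loss / p[arm]          # importance-weighted loss estimate
        expert_loss = advice[t] @ est     # each expert's estimated loss this round
        w = w * np.exp(-eta * expert_loss)
        w /= w.sum()                      # exponential-weights update over experts
    return total_loss

# Tiny synthetic instance (all values hypothetical).
T, N, K = 50, 4, 3
rng = np.random.default_rng(1)
advice = rng.dirichlet(np.ones(K), size=(T, N))
losses = rng.random((T, K))
cumulative = exp4(advice, losses, eta=0.1)
```

Kale's 2014 upper bound and the lower bound presented in the talk together pin down the best achievable expected regret for this family of algorithms at order √(T K log(N/K)).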

Syllabus

Thursday, October 23rd, 2025, 10:30 AM, room C221

Taught by

HUJI Machine Learning Club
