Adaptive Multi-armed Bandit Algorithms for Markovian and IID Rewards
Centre for Networked Intelligence, IISc via YouTube
Overview
Explore a technical lecture on multi-armed bandit (MAB) algorithms that addresses both Markovian and independent and identically distributed (i.i.d.) reward scenarios. Delve into the challenges of obtaining regret guarantees for MAB problems in which arm rewards form Markov chains outside single-parameter exponential families. Learn about a groundbreaking algorithm that uses total variation distance-based testing to identify whether rewards are Markovian or i.i.d., enabling dynamic adaptation between standard and specialized Kullback-Leibler upper confidence bound (KL-UCB) approaches. The lecture is delivered by Prof. Arghyadip Roy of IIT Guwahati's Mehta Family School of Data Science and Artificial Intelligence, drawing on his research experience in stochastic systems optimization, wireless network resource allocation, and reinforcement learning from his work at institutions including IIT Bombay, the University of Illinois at Urbana-Champaign, and Jadavpur University.
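To make the two ingredients of the overview concrete, here is a minimal Python sketch of (a) the KL-UCB index for Bernoulli rewards and (b) an empirical total-variation check of whether a binary reward stream looks i.i.d. or Markovian. This is an illustration of the general techniques named above, not the specific algorithm from the lecture; the function names, the bisection tolerance, and the decision threshold are all assumptions made here for demonstration.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    # KL divergence between Bernoulli(p) and Bernoulli(q), with clamping
    # to avoid log(0) at the boundaries.
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, t, tol=1e-6):
    # KL-UCB upper confidence index: the largest q >= empirical mean with
    # pulls * KL(mean, q) <= log(t), found by bisection on [mean, 1].
    if pulls == 0:
        return 1.0
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def tv_gap(rewards):
    # Empirical total-variation distance between the reward distribution
    # conditioned on the previous reward and the marginal distribution.
    # For binary rewards, TV between two Bernoullis is |p - q|; a large
    # gap suggests the stream is Markovian rather than i.i.d.
    n0 = n01 = n1 = n11 = 0
    for prev, cur in zip(rewards, rewards[1:]):
        if prev == 0:
            n0 += 1
            n01 += cur
        else:
            n1 += 1
            n11 += cur
    marginal = sum(rewards) / len(rewards)
    gaps = []
    if n0:
        gaps.append(abs(n01 / n0 - marginal))
    if n1:
        gaps.append(abs(n11 / n1 - marginal))
    return max(gaps) if gaps else 0.0
```

A decision rule in the spirit of the lecture would compare `tv_gap` against a threshold and switch between the standard KL-UCB index and a Markov-chain-specialized variant accordingly; the threshold and the specialized index are beyond this sketch.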
Syllabus
Time: 5:00– PM
Taught by
Centre for Networked Intelligence, IISc