Learn how to apply reliability practices to AI-driven systems in this conference talk from SREcon25 EMEA, where Meta engineers Jay Lees and Javier Martin Montull share their experience managing continuous online A/B-test experiments. Explore the challenges of keeping systems reliable while hundreds of engineers simultaneously iterate on, tweak, and tune AI models that directly affect business outcomes. Discover practical strategies for instilling a reliability mindset in rapidly changing AI environments, along with mechanisms for preventing, detecting, and quickly mitigating issues triggered by AI experiments in large-scale production systems. Understand how to balance rapid AI model experimentation against the need to keep production systems stable and reliable.