Overview
Learn how to implement reliability practices in AI-driven systems in this conference talk from SREcon25 EMEA, where Meta engineers Jay Lees and Javier Martin Montull share their experience managing continuous online A/B test experiments. Explore the challenges of maintaining system reliability when hundreds of engineers are simultaneously iterating on and tuning AI models that directly affect business outcomes. Discover practical strategies for instilling a reliability mindset in rapidly changing AI environments, along with proven mechanisms for preventing, detecting, and quickly mitigating issues triggered by AI experiments in large-scale production systems. Understand how to balance rapid AI model experimentation against the need for system stability in enterprise environments.
Syllabus
SREcon25 Europe/Middle East/Africa - Experimenting with AI-Driven Systems
Taught by
USENIX