Contextual + Ray - Boosting SFT, RL and Inference at Scale

Learn how to build enterprise-grade AI agents and applications with Ray as the backbone for scalable training, reinforcement learning, and low-latency serving across multi-node clusters in this 23-minute conference talk from Ray Summit 2025. Discover Contextual AI's end-to-end platform architecture, designed to accelerate supervised fine-tuning (SFT), reinforcement learning (RL), and large-scale inference for real-world agentic workloads, and optimized for flexibility and performance so that teams can iterate rapidly on complex agent behaviors. Explore key architectural components: asynchronous RL pipelines and large-scale multi-turn training, LoRA-based adaptation for fast specialization, context, data, and tensor parallelism for efficient scaling, autoscaling and cold-start mitigation strategies, latency-aware routing for real-time agent serving, and disaggregated prefill and decode for improved throughput under dynamic traffic patterns. Examine the operational side of running enterprise AI agents at scale, including distributed observability through logging, metrics, tracing, and alerting, multi-host deployment patterns for reliability and redundancy, and techniques for maintaining system resilience, consistency, and service quality in production. Gain insight into using Ray across the complete lifecycle of enterprise AI agents, from large-scale training pipelines to mission-critical, low-latency production serving.