Overview
Learn how to scale multimodal perception model training using Ray's distributed computing framework through this 32-minute conference talk from Ray Summit 2025. Discover how Latitude AI engineers Richard Kwant, Marius Seritan, and Krishna Toshniwa leverage Ray to power their dataset generation, inference workloads, and complex training pipelines for autonomous vehicle perception systems.

Explore their complete ML lifecycle implementation with Ray's distributed computing model, focusing particularly on their most demanding component: the training pipeline that processes images, point clouds, rich metadata, and multi-task targets while maintaining GPU saturation through extremely fast, parallel data loading. Follow their migration journey from traditional PyTorch DataLoader to Ray Data, uncovering the concurrency limits and throughput constraints they encountered along the way.

Examine the specific optimizations and distributed patterns they implemented to overcome bottlenecks in data loading, memory management, serialization, and prefetching, ultimately achieving over a 3× increase in training throughput. Understand the challenges of implementing scalable sampling techniques and improving observability across Ray Data pipelines. Gain practical insights for designing high-performance multimodal training pipelines, debugging distributed data systems, and leveraging Ray to accelerate complex perception workloads from end to end in production autonomous vehicle environments.
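To make the core idea of "keeping the GPU saturated with parallel, prefetched data loading" concrete, here is a minimal, hypothetical sketch in pure Python. It is not Latitude AI's pipeline and does not use Ray or PyTorch; it only illustrates the general pattern the talk discusses, where worker threads load and preprocess samples ahead of the consumer so the training step never stalls on I/O. All names (`prefetching_loader`, `load_fn`, the queue sizes) are invented for this example.

```python
import queue
import threading

def prefetching_loader(samples, load_fn, prefetch=4, workers=2):
    """Yield load_fn(sample) for each sample, loading in background
    threads so the consumer (e.g. a GPU training step) rarely waits.

    Hypothetical illustration of a prefetching data loader -- not the
    actual Latitude AI / Ray Data implementation from the talk.
    """
    q = queue.Queue(maxsize=prefetch)  # bounded buffer caps memory use
    sentinel = object()                # marks end of the stream
    it = iter(samples)
    lock = threading.Lock()            # guard the shared iterator

    def worker():
        while True:
            with lock:
                try:
                    sample = next(it)
                except StopIteration:
                    return
            # Loading/decoding happens outside the lock, in parallel.
            q.put(load_fn(sample))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    def finisher():
        # Once all workers drain the iterator, signal completion.
        for t in threads:
            t.join()
        q.put(sentinel)

    threading.Thread(target=finisher).start()

    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

# Usage: simulate "decoding" eight samples with two workers.
# Completion order is nondeterministic, so we compare as a multiset.
loaded = sorted(prefetching_loader(range(8), lambda i: i * 2))
print(loaded)
```

Note the design choice the talk's bottleneck discussion hinges on: the queue is bounded, so prefetching trades a fixed amount of extra memory for hiding load latency, and tuning `prefetch` and `workers` is exactly the kind of knob that determines whether the GPU stays saturated.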
Syllabus
How Latitude AI Trains Perception Models at Massive Scale | Ray Summit 2025
Taught by
Anyscale