Scaling GenAI Inference From Prototype to Production - Real-World Lessons in Speed and Cost
Databricks via YouTube
Overview
Explore real-world strategies for scaling GenAI inference systems from prototype to production in this lightning talk by Anish Kumar, Lead Engineer at Scribd, Inc. Discover how to overcome cost and time constraints when deploying AI systems at scale using Databricks' fully managed infrastructure. Learn to leverage four essential Databricks features—Workflows, Model Serving, Serverless Compute, and Notebooks—to build robust AI inference pipelines capable of processing millions of documents, including text and audiobooks. Master the design of modular, parameterized notebooks that enable concurrent execution, effective dependency management, and accelerated AI-driven insights. Understand how to facilitate seamless collaboration between Data Scientists and Engineers through rapid experimentation, easy GenAI prompt tuning, flexible compute settings, efficient data iteration, and comprehensive quality-testing frameworks. Gain actionable strategies for optimizing AI inference performance, automating complex data workflows, and architecting next-generation serverless AI systems while maintaining cost efficiency and maximizing operational performance.
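The modular, parameterized, concurrently executed notebook pattern described above can be sketched in plain Python. This is an illustrative stand-in only, not code from the talk: in Databricks, the parameters would typically arrive via `dbutils.widgets` or Workflows job parameters, the inference call would hit a Model Serving endpoint, and Workflows (not a thread pool) would fan out tasks over serverless compute. All function and variable names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for one parameterized notebook "task". In Databricks, batch_id,
# doc_type, and the document list would come from job/notebook parameters.
def process_batch(batch_id: int, doc_type: str, docs: list[str]) -> dict:
    # Placeholder for the real inference step (e.g. a call to a
    # Model Serving endpoint that summarizes each document).
    results = [f"{doc_type}:{doc}:summarized" for doc in docs]
    return {"batch_id": batch_id, "doc_type": doc_type, "count": len(results)}

def run_concurrent(batches: list[tuple[str, list[str]]]) -> list[dict]:
    # Independent batches run in parallel, mirroring how Workflows can
    # launch concurrent notebook tasks with different parameters.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(process_batch, i, doc_type, docs)
            for i, (doc_type, docs) in enumerate(batches)
        ]
        return [f.result() for f in futures]

if __name__ == "__main__":
    batches = [("text", ["doc1", "doc2"]), ("audiobook", ["doc3"])]
    for summary in run_concurrent(batches):
        print(summary)
```

The key design point the talk highlights is separating the parameterized unit of work from the orchestration layer, so the same notebook can be reused across document types and scaled out without code changes.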
Syllabus
Scaling GenAI Inference From Prototype to Production: Real-World Lessons in Speed & Cost
Taught by
Databricks