Explore methods for evaluating the quality of Generative AI applications in this 37-minute conference talk from Databricks. The talk covers a suite of tools, including inference tables, Lakehouse Monitoring, and MLflow, for evaluating and assuring the quality of model responses. Learn how to run offline evaluations and monitor applications in real time. Discover best practices for using LLMs as judges, tracking experiments with MLflow, and using inference tables and Lilac for model management. Presented by Alkis Polyzotis and Michael Carbin, the talk is aimed at developers and data scientists building production GenAI applications.