Overview
Learn how to evaluate Large Language Model (LLM) performance in production environments through the lens of educational assessment principles in this 15-minute conference talk. Discover why traditional machine learning metrics such as accuracy and error rates fall short for subjective text-generation quality, and explore the additional complexities that arise when LLMs are combined with other tools in agentic contexts. Understand how to adapt academic evaluation methodologies by creating clear, objective rubrics that define success criteria for LLM outputs. Master the technique of deploying additional, separately tested LLMs to grade outputs systematically against these rubrics, an efficient solution to the evaluation challenge. Examine the remaining gaps and limitations in current LLM evaluation approaches, drawing insights from both machine learning engineering and educational assessment perspectives. The presentation is grounded in real-world experience deploying AI systems to production and applies lessons from academic teaching to practical machine learning evaluation problems.
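To make the rubric-plus-judge pattern described above concrete, here is a minimal sketch, not the speaker's implementation: it assumes the OpenAI Python client, and the rubric text, model choice, and the `judge_output` helper are all illustrative names introduced for this example.

```python
# Minimal sketch of rubric-based "LLM-as-judge" evaluation.
# Assumes the OpenAI Python client; rubric wording, model name, and
# helper names are hypothetical choices, not taken from the talk.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A clear, objective rubric defining success criteria for the LLM output.
RUBRIC = """Score the answer on each criterion from 1 (poor) to 5 (excellent):
1. Factual accuracy: claims are correct and verifiable.
2. Relevance: the answer addresses the user's question directly.
3. Completeness: no essential step or caveat is missing.
Return JSON: {"accuracy": int, "relevance": int, "completeness": int, "rationale": str}"""

def judge_output(question: str, answer: str, model: str = "gpt-4o") -> dict:
    """Ask a separate, already-tested LLM to grade an answer against the rubric."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Question:\n{question}\n\nAnswer to grade:\n{answer}"},
        ],
        response_format={"type": "json_object"},  # constrain the judge to JSON output
        temperature=0,  # make grading as deterministic as possible
    )
    return json.loads(response.choices[0].message.content)

# Example: grade one production response against the rubric.
scores = judge_output(
    question="How do I roll back a failed database migration?",
    answer="Run the down migration, then restore from the latest backup if needed.",
)
print(scores)
```

In practice such per-output scores would be aggregated over a held-out test set, which is what makes the approach an efficient substitute for manual review at production scale.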
Syllabus
Why Language Models Need a Lesson in Education
Taught by
MLOps.community