LLM Evaluation Framework for Crafting Delightful Content from Messy Inputs

Explore an evaluation framework for assessing how well Large Language Models (LLMs) transform diverse, messy text inputs into refined content. This 32-minute conference talk by Shin Liang, Senior Machine Learning Engineer at Canva, examines the challenge of objectively evaluating LLM outputs on subjective, unstructured tasks. Learn about general evaluation metrics such as relevance, fluency, and coherence, as well as task-specific metrics such as information preservation rate, accuracy of title/heading understanding, and key information extraction scores. Discover how the framework can be applied to similar LLM tasks that turn complex inputs into high-quality content.
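
To make one of the task-specific metrics concrete, here is a minimal sketch of what an information preservation rate could look like. The talk description does not define the metric, so everything here is an assumption for illustration: the word-overlap definition, the function names `content_words` and `information_preservation_rate`, and the small stopword list are hypothetical and are not the speaker's method.

```python
import re

# Hypothetical stopword list (an assumption); a real evaluation would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on",
             "for", "is", "are", "at", "by", "with"}


def content_words(text):
    """Lowercase the text and return its distinct non-stopword word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {token for token in tokens if token not in STOPWORDS}


def information_preservation_rate(source, output):
    """Fraction of the source's content words that also appear in the output.

    Assumed definition for illustration only: plain word overlap, with no
    handling of synonyms, paraphrase, or word order.
    """
    source_words = content_words(source)
    if not source_words:
        return 1.0  # nothing to preserve, so nothing was lost
    preserved = source_words & content_words(output)
    return len(preserved) / len(source_words)


messy = "team offsite  - friday 3pm, bring laptops!! agenda: budget review"
refined = "Team Offsite: Friday at 3pm. Please bring laptops. Agenda: budget review."
rate = information_preservation_rate(messy, refined)
```

A word-overlap proxy like this is cheap and reproducible, but it penalizes legitimate paraphrasing, which is why it would normally sit alongside judgment-based metrics such as relevance, fluency, and coherence rather than replace them.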