Overview
Learn how to implement systematic evaluations for AI workflows and agents in n8n in this conference talk from the Advanced Track of n8n Builders Berlin. Discover why evaluations are crucial for making AI workflows more reliable, and explore practical methods for handling inconsistent LLM outputs, context drift, and edge cases. Master the implementation of evaluations across three critical phases: development, pre-deployment checks, and production monitoring. Explore techniques for comparing different models and prompts with A/B-style comparisons while tracking essential metrics, including correctness, helpfulness, and token usage. Gain hands-on experience with evaluation triggers, data tables, and metrics within the n8n platform, and understand how to leverage LLM-as-a-judge methodology with reference answers. Watch a comprehensive live demonstration of these concepts in action, followed by a Q&A session addressing practical implementation questions. Perfect for developers and automation specialists working with AI workflows or agents in n8n who want a clearer, more systematic approach to testing and monitoring their AI-powered automations.
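To make the LLM-as-a-judge idea concrete, here is a minimal standalone TypeScript sketch. It is not n8n's evaluation API (the talk covers the platform's own evaluation triggers and data tables); it assumes the OpenAI chat completions endpoint, an OPENAI_API_KEY environment variable, and Node 18+ for global fetch, and the function name judgeOutput is illustrative:

```typescript
// Sketch: score a workflow's output against a reference answer
// using a second LLM as the judge. Names and scoring scale are
// assumptions for illustration, not the n8n node API.

interface Verdict {
  correctness: number; // 1-5, agreement with the reference answer
  helpfulness: number; // 1-5, usefulness to the end user
  reasoning: string;   // judge's short justification
}

async function judgeOutput(
  question: string,
  actual: string,
  reference: string,
): Promise<Verdict> {
  const prompt = `You are an evaluator. Compare the actual answer to the reference answer.
Question: ${question}
Reference answer: ${reference}
Actual answer: ${actual}
Reply with JSON: {"correctness": 1-5, "helpfulness": 1-5, "reasoning": "..."}`;

  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // judge model; any capable model works
      messages: [{ role: "user", content: prompt }],
      response_format: { type: "json_object" }, // force parseable JSON
    }),
  });
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content) as Verdict;
}
```

Run over a small dataset of question/reference pairs, this kind of judge yields per-example scores that can be averaged and compared across models or prompts, which is the same A/B-style comparison the talk demonstrates inside n8n.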
Syllabus
00:00 Intro
02:00 Why AI Evaluations Matter
04:50 Evaluation Methods
06:22 How to Use Evaluations in n8n
06:49 Pre-Deployment Checks
07:41 Monitoring in Production
11:20 Live Demo
21:05 Q&A
Taught by
n8n