Beyond the Prompt - Evaluating, Testing, and Securing LLM Applications

Explore comprehensive strategies for evaluating, testing, and securing Large Language Model applications in this 48-minute conference talk from NDC Oslo 2025. Learn how to measure the effectiveness of prompt changes and Retrieval-Augmented Generation (RAG) pipeline modifications through various evaluation frameworks including Vertex AI Evaluation, DeepEval, and Promptfoo. Discover the essential metrics these frameworks provide and understand their practical applications in assessing LLM outputs. Delve into critical security considerations for LLM applications, including protection against prompt injections and prevention of harmful responses through robust input and output guardrails that extend beyond basic safety settings. Examine testing and security frameworks such as LLM Guard to ensure your applications remain safe and operate within precisely defined parameters, providing you with the knowledge to build more reliable and secure LLM-powered solutions.