Beyond the Prompt - Evaluating, Testing, and Securing LLM Applications

Learn how to evaluate, test, and secure Large Language Model (LLM) applications beyond basic prompt engineering in this comprehensive conference talk. Discover essential measurement techniques for assessing the impact of prompt changes and Retrieval-Augmented Generation (RAG) pipeline modifications in your LLM applications. Explore various evaluation frameworks including Vertex AI Evaluation, DeepEval, and Promptfoo to systematically assess LLM outputs and understand the different types of metrics they provide. Delve into critical security considerations for LLM applications, including protection against prompt injections and prevention of harmful responses. Examine testing and security frameworks such as LLM Guard to implement robust guardrails on both inputs and outputs that go beyond basic safety settings. Gain practical insights into building resilient LLM applications that are precisely limited to your intended use cases while maintaining safety and reliability standards.