What you'll learn:
- Understand the purpose of testing LLMs and LLM-based applications
- Understand DeepEval and RAGAs in detail, from the ground up
- Understand the metrics and evaluations used to assess LLMs and LLM-based apps with DeepEval and RAGAs
- Understand the advanced concepts of DeepEval and RAGAs
- Test RAG-based applications using DeepEval and RAGAs
- Test AI agents using DeepEval, including how tool calls can be tested
Testing AI & LLM App with DeepEval, RAGAs & more using Ollama and Local Large Language Models (LLMs)
Master the essential skills for testing and evaluating AI applications, particularly those built on Large Language Models (LLMs). This hands-on course equips QA and AI QA engineers, developers, data scientists, and AI practitioners with practical techniques to assess AI performance, identify biases, and build robust applications.
Topics Covered:
Section 1: Foundations of AI Application Testing (Introduction to LLM testing, AI application types, evaluation metrics, LLM evaluation libraries).
Section 2: Local LLM Deployment with Ollama (Local LLM deployment, AI models, running LLMs locally, Ollama implementation, GUI/CLI, setting up Ollama as an API).
Section 3: Environment Setup (Jupyter Notebook for tests, setting up Confident AI).
Section 4: DeepEval Basics (Traditional LLM testing, first DeepEval code for Answer Relevancy and Context Precision, evaluating in Confident AI, testing with a local LLM, understanding LLMTestCases and Goldens).
Section 5: Advanced LLM Evaluation (LangChain for LLMs, evaluating Answer Relevancy, Context Precision, bias detection, custom criteria with GEval, advanced bias testing).
Section 6: RAG Testing with DeepEval (Introduction to RAG, understanding RAG apps, demo, creating GEval for RAG, testing for conciseness & completeness).
Section 7: Advanced RAG Testing with DeepEval (Creating multiple test data entries, Goldens in Confident AI, actual output and retrieval context, LLMTestCases from a dataset, running evaluation for RAG).
Section 8: Testing AI Agents and Tool Calls (Understanding AI Agents, working with agents, testing agents with and without actual systems, testing with multiple datasets).
Section 9: Evaluating LLMs using RAGAs (Introduction to RAGAs, Context Recall, Noise Sensitivity, MultiTurnSample, general-purpose metrics for summaries and harmfulness).
Section 10: Testing RAG Applications with RAGAs (Introduction and setup, creating retrievers and vector stores, MultiTurnSample dataset for RAG, evaluating RAG with RAGAs).
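To give a flavor of the "Ollama as an API" topic in Section 2, here is a minimal sketch of calling a locally running Ollama server from Python using only the standard library. It is an illustration, not course material: the helper names `build_generate_request` and `generate` are hypothetical, the model name `llama3` is a placeholder for whichever model you have pulled, and the sketch assumes Ollama is listening on its default address, `http://localhost:11434`.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model, prompt, base_url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, base_url=OLLAMA_URL, timeout=60):
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the reply is one JSON object whose
    # "response" field holds the generated text.
    return body["response"]


# Example usage (requires a running Ollama server and a pulled model;
# "llama3" is a placeholder model name):
# print(generate("llama3", "In one sentence, what is RAG?"))
```

The text returned by a helper like this is what an evaluation library would then score as the application's actual output.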