
Behavioral Benchmarking of LLMs

via CodeSignal

Overview

In this course, you’ll explore deeper aspects of LLM evaluation: token-usage efficiency, temperature sensitivity, output consistency across reruns, and hallucination detection. Through lightweight API experiments, you’ll build intuition for how models behave beyond accuracy scores.
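One of the ideas above, temperature sensitivity, can be illustrated without any API at all. The sketch below (a simplified model, not the course's actual code) applies temperature scaling to hypothetical next-token logits: low temperature sharpens the distribution toward the top token, high temperature flattens it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to a probability distribution.

    Lower temperature -> sharper distribution (more deterministic sampling);
    higher temperature -> flatter distribution (more varied sampling).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(logits, 0.2)  # near-greedy
hot = softmax_with_temperature(logits, 2.0)   # closer to uniform

print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

At temperature 0.2 the top token takes almost all of the probability mass, while at 2.0 the three candidates are much closer together — which is why reruns at high temperature produce more varied outputs.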

Syllabus

  • Unit 1: Measuring and Interpreting Token Usage in LLMs
    • Comparing Token Counts to Prompt and Answer Lengths
    • Exploring Prompt Length and Token Usage
    • Refactoring Token Usage for Cleaner Code
  • Unit 2: Exploring Temperature Sensitivity in LLM Outputs
    • Comparing Low and High Temperature Outputs
    • Exploring the Temperature Creativity Spectrum
    • Comparing Models at Same Temperature
  • Unit 3: Measuring Model Consistency Across Reruns
    • Refactoring for Cleaner Consistency Checks
    • Parameterizing Consistency Test Runs
    • Tracking Response Patterns with Frequency Analysis
  • Unit 4: Using LLMs as Fact-Checkers for Hallucination Detection
    • Generating Answers with GPT Models
    • Building a Complete Fact-Checking Pipeline
    • Organizing Fact Check Results for Clarity
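Unit 3's "Tracking Response Patterns with Frequency Analysis" can be sketched in a few lines, assuming you have already collected repeated responses to the same prompt (the `responses` list here is illustrative, not live API output):

```python
from collections import Counter

def consistency_report(responses):
    """Summarize how often each distinct answer appears across reruns.

    Returns (most_common_answer, its_share_of_runs, full_counts).
    """
    # Light normalization so trivial formatting differences don't
    # count as disagreement between runs.
    counts = Counter(r.strip().lower() for r in responses)
    answer, n = counts.most_common(1)[0]
    return answer, n / len(responses), counts

# Hypothetical answers from five reruns of the same prompt.
responses = ["Paris", "Paris", "paris", "Lyon", "Paris"]

top, share, counts = consistency_report(responses)
print(top, share)  # most frequent normalized answer and its share of runs
```

A high share for the top answer suggests the model is consistent on that prompt; a fragmented `Counter` suggests the prompt or temperature deserves a closer look.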
