
YouTube

How Fast Are LLM Inference Engines Anyway?

AI Engineer via YouTube

Overview

Explore comprehensive benchmarking results comparing the performance of various LLM inference engines in this 16-minute conference talk. Discover how the landscape of open-weights models and open-source inference servers has evolved dramatically, giving AI engineers an abundance of choices for self-hosting inference. Learn from hundreds of benchmark runs conducted across different models, frameworks, and hardware configurations to understand which options deliver the best performance for your specific needs. Gain practical insights and proven tips from teams deploying LLM inference at scale to help you navigate the decision-making process when selecting the right inference engine for your applications.
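Benchmarks like the ones described typically report metrics such as time-to-first-token and decode throughput in tokens per second (the specific metrics and method here are assumptions for illustration, not taken from the talk). A minimal sketch of such a harness, with a stubbed `generate` function standing in for a real inference-engine call, might look like:

```python
import time
from statistics import mean

def generate(prompt, n_tokens=64):
    """Stub standing in for a streaming call to a real inference engine.
    Yields tokens with an artificial per-token delay."""
    for i in range(n_tokens):
        time.sleep(0.001)  # simulated decode latency per token
        yield f"tok{i}"

def benchmark(prompt, runs=3):
    """Measure mean time-to-first-token and tokens/sec over several runs."""
    ttfts, throughputs = [], []
    for _ in range(runs):
        start = time.perf_counter()
        first_token_time = None
        count = 0
        for _tok in generate(prompt):
            if first_token_time is None:
                first_token_time = time.perf_counter() - start  # TTFT
            count += 1
        total = time.perf_counter() - start
        ttfts.append(first_token_time)
        throughputs.append(count / total)  # decode throughput, tokens/sec
    return {"ttft_s": mean(ttfts), "tokens_per_s": mean(throughputs)}

result = benchmark("Hello, world")
print(result)
```

In a real comparison, the stub would be replaced by streaming requests to each candidate server, repeated across models, hardware, and concurrency levels as the talk describes.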

Syllabus

How fast are LLM inference engines anyway? — Charles Frye, Modal

Taught by

AI Engineer

