Overview
Explore comprehensive benchmarking results comparing the performance of various LLM inference engines in this 16-minute conference talk. See how the landscape of open-weights models and open-source inference servers has evolved, giving AI engineers who want to self-host inference an abundance of choices. Learn from hundreds of benchmark runs conducted across different models, frameworks, and hardware configurations to understand which options deliver the best performance for your specific needs. The talk closes with practical tips from teams deploying LLM inference at scale, to help you navigate the decision-making process when selecting an inference engine for your applications.
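The core measurements in benchmarks like these are usually time-to-first-token (TTFT) and decode throughput. As a rough illustration of what a benchmark harness computes, here is a minimal sketch in Python; the `summarize` helper and the synthetic per-token timestamps are hypothetical, not from the talk.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    ttft: float            # time to first token, in seconds
    tokens_per_sec: float  # decode throughput after the first token


def summarize(start: float, token_times: list[float]) -> RunStats:
    """Compute TTFT and decode throughput from per-token arrival times.

    `start` is when the request was sent; `token_times` holds the
    wall-clock arrival time of each generated token, in order.
    """
    ttft = token_times[0] - start
    decode_time = token_times[-1] - token_times[0]
    n_decoded = len(token_times) - 1  # tokens produced after the first
    tps = n_decoded / decode_time if decode_time > 0 else float("inf")
    return RunStats(ttft=ttft, tokens_per_sec=tps)


# Synthetic example: first token after 250 ms, then one token every 20 ms.
start = 0.0
token_times = [0.25 + 0.02 * i for i in range(101)]
stats = summarize(start, token_times)
print(f"TTFT: {stats.ttft:.3f}s, throughput: {stats.tokens_per_sec:.1f} tok/s")
```

In a real benchmark the timestamps would come from a streaming client hitting the server's API, and runs would be repeated across concurrency levels, prompt lengths, and hardware to build up comparisons like the ones presented in the talk.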
Syllabus
How fast are LLM inference engines anyway? — Charles Frye, Modal
Taught by
AI Engineer