Overview
This conference talk by Intel Fellow Mohan J Kumar and Principal Engineer Chuan Song introduces MESA, an online platform for evaluating LLM inference performance across different hardware configurations. Discover how the tool addresses the growing market need for AI inference performance prediction by letting users evaluate a range of models on hardware from multiple vendors. Learn how MESA breaks down inference latency by operation type (GEMM, MatMul) and by phase (prefill and autoregressive decode), presenting detailed graphs for performance analysis. The presentation demonstrates, through an intuitive web UI, how adjusting the context length affects inference latency. The speakers also discuss their plans to contribute the tool to open source through the Open Compute Project, aiming to bring transparency to, and foster growth in, the critical area of AI inference performance projection.
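To make the phase and latency breakdown concrete, here is a minimal roofline-style sketch of how a projection tool of this kind might estimate per-phase latency from hardware specs and model size. The formulas, model size, and hardware numbers below are illustrative assumptions, not MESA's actual methodology.

```python
# Roofline-style latency sketch: each phase is bounded by whichever is
# slower, compute (FLOPs / peak throughput) or memory (bytes / bandwidth).
# All constants here are hypothetical, chosen only for illustration.

def phase_latency(flops, bytes_moved, peak_tflops, bw_gbs):
    """Return estimated latency in seconds for one inference phase."""
    compute_s = flops / (peak_tflops * 1e12)   # compute-bound time
    memory_s = bytes_moved / (bw_gbs * 1e9)    # memory-bound time
    return max(compute_s, memory_s)

# Hypothetical 7B-parameter model with fp16 weights (2 bytes/parameter).
params = 7e9
ctx = 2048  # context length (number of prompt tokens)

# Prefill: all prompt tokens processed at once -> roughly
# 2 * params FLOPs per token, weights read once; usually compute-bound.
prefill_s = phase_latency(2 * params * ctx, 2 * params,
                          peak_tflops=300, bw_gbs=2000)

# Decode: one token at a time -> 2 * params FLOPs per step, but the full
# weight set is streamed every step, so it is typically memory-bound.
decode_s = phase_latency(2 * params, 2 * params,
                         peak_tflops=300, bw_gbs=2000)

print(f"prefill: {prefill_s * 1e3:.1f} ms, "
      f"per-token decode: {decode_s * 1e3:.2f} ms")
```

Under these assumed numbers, prefill latency grows with context length while per-token decode latency is dominated by memory bandwidth, which is consistent with the talk's point that context-length adjustments visibly shift the latency breakdown.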
Syllabus
LLM Inference Performance Projection
Taught by
Open Compute Project