Overview
Explore advanced AI inference concepts in this comprehensive workshop designed for AI engineers seeking to deepen their understanding of running machine learning models in production. Learn the fundamentals of inference workloads and compare proprietary and open-source models, weighing their respective advantages and trade-offs in real-world applications. Discover execution environments ranging from end-user devices to network-based solutions, and master the principles of serving inference at scale. Dive into inference-as-a-service providers and learn how to leverage cloud infrastructure and serverless GPUs for optimal performance. Examine rack-and-stack approaches to inference deployment, and work through the GPU arithmetic essential for resource planning. Survey specialized hardware, including TPUs and other custom silicon designed for inference, and conclude with containerization strategies that enable scalable, maintainable AI deployments across environments.
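The GPU arithmetic mentioned above rests on a simple back-of-the-envelope rule: weight memory is roughly parameter count times bytes per parameter. Below is a minimal Python sketch of that calculation; the model sizes and the 20% overhead factor are illustrative assumptions, not figures from the workshop.

# Back-of-the-envelope GPU memory estimate for serving an LLM.
# Illustrative only: model sizes and overhead factor are assumed.

def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """GB needed just to hold the weights (FP16/BF16 = 2 bytes per parameter)."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes-per-GB
    return params_billion * bytes_per_param

for name, params_b in [("7B model", 7.0), ("70B model", 70.0)]:
    weights = weight_memory_gb(params_b)
    # Assumed rule of thumb: budget ~20% extra for KV cache and activations
    # at modest batch sizes.
    print(f"{name}: ~{weights:.0f} GB weights, ~{weights * 1.2:.0f} GB total")

Running this puts a 7B model at roughly 14 GB of weights (fitting on a single 24 GB GPU with headroom) and a 70B model at roughly 140 GB, which is why larger models must be sharded across multiple accelerators.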
Syllabus
0:00:00 Intro & Overview
0:03:52 What is Inference?
0:10:16 Proprietary Models for Inference
0:21:22 Open Models for Inference
0:30:41 Will Open or Proprietary Models Win Long-Term?
0:36:19 Q&A on Models
0:44:12 Inference on End-User Devices
1:04:32 Inference-as-a-Service Providers
1:10:00 Cloud Inference and Serverless GPUs
1:17:46 Rack-and-Stack for Inference
1:20:12 Inference Arithmetic for GPUs
1:27:07 TPUs and Other Custom Silicon for Inference
1:36:11 Containerizing Inference and Inference Services
Taught by
AI Engineer