AI Inference Workloads - Solving MLOps Challenges in Production
Toronto Machine Learning Series (TMLS) via YouTube
Overview
Syllabus
Intro
Agenda
The Machine Learning Process
Deployment Types for Inference Workloads
Machine Learning is Different from Traditional Software Engineering
Low Latency
High Throughput
Maximize GPU Utilization
Embedding ML Models into Web Servers
Decouple Web Serving and Model Serving
Model Serving System on Kubernetes
Multi-Instance GPU (MIG)
Run:AI's Dynamic MIG Allocations
Run 3 instances of type 2g.10gb
Valid Profiles & Configurations
Serving on Fractional GPUs
A Game Changer for Model Inferencing
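The MIG sections above mention running three instances of profile 2g.10gb and the notion of valid profile configurations. As a rough illustration only (assuming an NVIDIA A100 40GB, which exposes 7 compute slices and roughly 40 GB of memory; the helper names here are hypothetical, not from the talk), a small check of whether a requested MIG layout fits the GPU might look like:

```python
def parse_profile(name):
    """Split a MIG profile name like '2g.10gb' into (slices, memory_gb)."""
    slices_part, mem_part = name.split(".")
    return int(slices_part.rstrip("g")), int(mem_part.rstrip("gb"))

def fits_on_a100_40gb(profiles):
    """Return True if the requested instances fit within 7 slices / 40 GB.

    Assumption: A100 40GB capacity; real MIG placement also has
    per-profile placement constraints that this sketch ignores.
    """
    total_slices = sum(parse_profile(p)[0] for p in profiles)
    total_mem = sum(parse_profile(p)[1] for p in profiles)
    return total_slices <= 7 and total_mem <= 40

# Three 2g.10gb instances use 6 slices and 30 GB, so they fit:
print(fits_on_a100_40gb(["2g.10gb"] * 3))  # True
# Three 3g.20gb instances would need 9 slices, so they do not:
print(fits_on_a100_40gb(["3g.20gb"] * 3))  # False
```

This captures only the coarse capacity arithmetic; in practice, valid configurations are defined by NVIDIA's per-GPU placement tables, which is presumably what the "Valid Profiles & Configurations" chapter covers.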
Taught by
Toronto Machine Learning Series (TMLS)