
Model Serving Systems: Containers, APIs & Scalability

Board Infinity via Coursera

Overview

"Docker and Model Serving: Deploy ML APIs with FastAPI and ONNX is designed for ML engineers, MLOps practitioners, and backend developers who want to take models from notebooks to production. You'll learn to build Docker containers for ML workloads, design scalable REST APIs with FastAPI, serialize models with ONNX and SavedModel, and deploy with zero-downtime strategies like blue-green and canary releases. The first module covers Docker fundamentals, image optimization, multi-stage builds, secrets management, and Docker Compose for multi-container ML apps. The second module focuses on REST API design with FastAPI, model versioning, input validation with Pydantic, structured logging, and production-grade error handling. The third module teaches scaling strategies — horizontal scaling, async queues, load balancing, batch vs. real-time inference, and latency optimization for high-throughput serving. The final module covers model serialization formats (ONNX, pickle, SavedModel), blue-green and canary deployments, automated rollback, and disaster recovery. By the end of this course, you will: - Build and optimize Docker images for ML models using multi-stage builds and Compose - Design scalable FastAPI endpoints with versioning, validation, and observability - Scale ML inference with async queues, load balancing, and latency optimization - Deploy models with ONNX serialization and zero-downtime blue-green rollbacks"

Syllabus

  • Docker for ML
    • This module introduces containerization fundamentals and shows learners how to build efficient Docker images for ML workloads, ensuring portability and reproducibility across environments.
  • API Design for ML Serving
    • Learners develop and refine REST APIs for ML model inference, focusing on reliability, scalability, and real-world best practices.
  • Scaling Model Serving
    • This module emphasizes scalability, concurrency, and optimization for production-grade model serving systems.
  • Model Serialization and Deployment
    • The final module demonstrates how to save, deploy, and safely roll back production models while maintaining uptime and integrity; a minimal serialization sketch follows this list.
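To make the serialization idea in the final module concrete, the sketch below exports a toy PyTorch model to ONNX and runs it with ONNX Runtime. The model architecture, the file name model.onnx, and the tensor names are placeholders chosen for illustration, and the snippet assumes the torch and onnxruntime packages (with their ONNX dependencies) are installed; the course may use different models and formats.

```python
# Illustrative sketch only: export a small model to ONNX, then serve it with
# ONNX Runtime. Names and shapes here are placeholders, not course material.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

# A toy model standing in for a trained production model.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
model.eval()

# Export once at build/release time; the .onnx file is the artifact shipped in the image.
dummy_input = torch.randn(1, 4)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["features"],
    output_names=["score"],
    dynamic_axes={"features": {0: "batch"}},  # allow variable batch size at serving time
)

# At serving time, the API process loads the session once and reuses it per request.
session = ort.InferenceSession("model.onnx")
batch = np.random.randn(3, 4).astype(np.float32)
outputs = session.run(["score"], {"features": batch})
print(outputs[0].shape)  # (3, 1)
```

Keeping the serialized artifact separate from the serving code in this way is what makes strategies like blue-green and canary releases practical: two container versions can load different model files behind the same API contract.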

Taught by

Board Infinity

Reviews

