Building Scalable, Observable MLOps Systems on Google Cloud

Conf42 via YouTube

Learn to build production-ready MLOps systems on Google Cloud Platform in this conference talk on the challenges of deploying machine learning models at scale. Discover why production ML is inherently difficult and explore the four core barriers to successful deployment: infrastructure complexity, tooling fragmentation, model drift, and governance requirements. Master an end-to-end MLOps framework built on Google Cloud's managed services, including Vertex AI, Dataflow, Cloud Run, and BigQuery, and learn how to choose between real-time, batch, and streaming deployment patterns based on your use case. Examine an automotive case study that applies these concepts to an oil change prognostics architecture.

Dive into production observability strategies for monitoring both model performance and system health, covering drift detection, data skew identification, and performance feedback loops. Explore infrastructure telemetry and incident response procedures for keeping ML systems reliable. Learn to implement CI/CD pipelines designed for machine learning models, with progressive rollouts, continuous monitoring, and automated rollback. Understand the governance frameworks, automated retraining pipelines, and human-in-the-loop approval processes that enterprise-grade MLOps requires, and come away with a practical blueprint for scaling reliable MLOps systems.
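The listing does not say which drift statistic the talk uses. As one common illustration of drift detection, the Population Stability Index (PSI) compares a feature's binned distribution in training data against recent serving data. The sketch below is plain Python and rests on stated assumptions: equal-width bins over the training range, a small floor for empty bins, and the widely cited but heuristic 0.2 alert threshold. The names `psi`, `bin_fractions`, and `drift_alert` are hypothetical and are not part of Vertex AI or any Google Cloud API.

```python
import bisect
import math
from typing import List, Sequence

# Floor for empty bins so the log ratio stays finite (an assumed smoothing choice).
EPS = 1e-6


def bin_fractions(values: Sequence[float], cuts: Sequence[float]) -> List[float]:
    """Fraction of `values` in each bin; `cuts` are sorted interior bin edges."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        # bisect_right maps values below the first cut to bin 0 and values
        # at or above the last cut to the final bin, so nothing is dropped.
        counts[bisect.bisect_right(cuts, v)] += 1
    return [max(c / len(values), EPS) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float], n_bins: int = 10) -> float:
    """Population Stability Index of `actual` (serving) against `expected` (training)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins
    # Equal-width bins over the training range (hypothetical binning choice).
    cuts = [lo + width * i for i in range(1, n_bins)]
    e = bin_fractions(expected, cuts)
    a = bin_fractions(actual, cuts)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(expected: Sequence[float], actual: Sequence[float], threshold: float = 0.2) -> bool:
    """Flag drift when PSI exceeds a heuristic threshold (0.2 is a common rule of thumb)."""
    return psi(expected, actual) > threshold


if __name__ == "__main__":
    training = [float(i % 100) for i in range(1000)]
    shifted = [v + 50.0 for v in training]  # simulated upward shift in one feature
    print(drift_alert(training, training))  # False
    print(drift_alert(training, shifted))   # True
```

In a managed setup, a check like this would typically run on a schedule against logged prediction inputs, with an alert feeding the retraining or incident-response path; the threshold and binning should be tuned per feature rather than taken from this sketch.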

Syllabus

Welcome & the ML-to-Production Journey on GCP
Why Production ML Is Hard
The 4 Core Barriers: Infra, Tooling, Drift, Governance
End-to-End MLOps Framework on Google Cloud
GCP Managed Service Stack: Vertex AI, Dataflow, Cloud Run, BigQuery
Choosing Deployment Patterns: Real-time vs Batch vs Streaming
Automotive Case Study: Oil Change Prognostics Architecture
Production Observability: Seeing Model + System Health
Model Monitoring Deep Dive: Drift, Skew, Performance Loops
Infrastructure Telemetry & Incident Response in Practice
CI/CD for Models: Progressive Rollouts, Monitoring & Rollback
Governance, Retraining Pipelines & Human-in-the-Loop Approvals
Wrap-Up: Blueprint to Scale Reliable MLOps
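The CI/CD item in the syllabus pairs progressive rollouts with monitoring and automated rollback. A minimal sketch of that control loop, assuming a fixed traffic schedule and a single error-rate metric compared against the stable model's baseline, might look like the following. The `set_traffic` and `canary_error_rate` callbacks are hypothetical stand-ins: on a real platform the serving layer would apply the traffic split between model versions and a monitoring system would supply the metric. The step percentages and the 0.01 regression margin are illustrative assumptions, not values from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical traffic schedule: percent of requests routed to the candidate model.
DEFAULT_STEPS = (5, 25, 50, 100)


@dataclass
class RolloutResult:
    promoted: bool
    history: List[int] = field(default_factory=list)  # traffic percentages applied, in order


def progressive_rollout(
    set_traffic: Callable[[int], None],
    canary_error_rate: Callable[[int], float],
    baseline_error_rate: float,
    max_regression: float = 0.01,
    steps: Sequence[int] = DEFAULT_STEPS,
) -> RolloutResult:
    """Step traffic up to the candidate; roll back to 0% if its error rate regresses."""
    history: List[int] = []
    for pct in steps:
        set_traffic(pct)
        history.append(pct)
        # In practice this would read monitoring data gathered during this step.
        observed = canary_error_rate(pct)
        if observed > baseline_error_rate + max_regression:
            set_traffic(0)  # automated rollback: all traffic back to the stable model
            history.append(0)
            return RolloutResult(promoted=False, history=history)
    return RolloutResult(promoted=True, history=history)


if __name__ == "__main__":
    applied: List[int] = []
    healthy = progressive_rollout(applied.append, lambda pct: 0.02, baseline_error_rate=0.02)
    print(healthy.promoted, healthy.history)  # True [5, 25, 50, 100]

    regressing = progressive_rollout(
        lambda pct: None,
        lambda pct: 0.02 if pct < 50 else 0.05,  # errors spike once traffic reaches 50%
        baseline_error_rate=0.02,
    )
    print(regressing.promoted, regressing.history)  # False [5, 25, 50, 0]
```

Passing the traffic and metric functions in as callbacks keeps the promotion logic testable without a live endpoint; a human-in-the-loop approval gate could be added as one more check before the final 100% step.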