MLOps and LLMOps: Deploying and Scaling AI in Production

Board Infinity via Coursera

Overview

This intermediate course equips ML engineers, data scientists, and software engineers with the practical skills needed to design, deploy, and scale production AI systems. You'll learn how to architect reliable ML and LLM applications, including model serving patterns, feature stores, and retrieval-augmented generation (RAG) components.

The course walks through reproducible training and experimentation pipelines with tools like MLflow and Weights & Biases, from experiment tracking and model registration to production deployment. You will configure CI/CD workflows tailored to ML and LLM systems, covering data, model, and prompt versioning, automated testing, and safe rollback strategies. The course also emphasizes security, privacy, and compliance best practices, including access control, secrets management, and safe handling of user and training data.

You'll design scalable serving infrastructure using containers, Kubernetes, and autoscaling, and apply deployment patterns such as canary, blue-green, shadow, and A/B testing to introduce changes safely.

Finally, you'll build automated evaluation and observability for production AI: evaluation pipelines (e.g., LLM-as-a-judge) wired into CI/CD gates; key quality and performance metrics such as hallucination rate, latency, throughput, and cost per request; and robust logging, metrics, distributed tracing, and telemetry. You will also detect and monitor data and model drift, bias, and degradation over time using tools such as Arize Phoenix, design alerting strategies, and collaborate with product and reliability teams to establish incident response, runbooks, and continuous improvement processes for AI systems at scale.

Disclaimer: This is an independent educational resource created by Board Infinity for informational and educational purposes only. This course is not affiliated with, endorsed by, sponsored by, or officially associated with any company, organization, or certification body unless explicitly stated. The content provided is based on industry knowledge and best practices but does not constitute official training material for any specific employer or certification program. All company names, trademarks, service marks, and logos referenced are the property of their respective owners and are used solely for educational identification and comparison purposes.
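The canary pattern mentioned above can be illustrated with a minimal weighted router: a small fraction of traffic is sent to the new model version while the rest stays on the stable one. This is a hedged sketch, not the course's implementation; the version names ("stable", "canary") and the 5% split are illustrative assumptions.

```python
import random

def route_request(canary_fraction: float = 0.05, rng=random.random) -> str:
    """Return which model version should serve this request.

    Hypothetical sketch: a real canary rollout would typically be handled
    by the serving infrastructure (e.g., a service mesh or load balancer),
    not application code.
    """
    return "canary" if rng() < canary_fraction else "stable"

# Over many requests, roughly canary_fraction of traffic hits the canary,
# letting you compare its error rate and latency against the stable version
# before widening the rollout.
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    counts[route_request()] += 1
```

Blue-green deployment swaps the fraction from 0 to 1 in one step with both versions kept warm, while shadow deployment sends a copy of each request to the new version without returning its response to users.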

Syllabus

  • Operationalizing AI Pipelines (CI/CD, CT/CD, Versioning)
    • Start by grounding learners in practical, production-ready system design for ML and LLM applications. This module connects architectural patterns—serving topologies, feature stores, and retrieval-augmented generation (RAG)—to reproducible experimentation and compliant design decisions. Expect short instructor videos, readings that map design trade-offs, and hands-on exercises using experiment-tracking tools to make architectures actionable.
  • LLMOps Fundamentals: Context, Prompts, Inference Optimization
    • Move from design to continuous delivery: this module teaches how to build CI/CD pipelines tailored to ML and LLM systems and how to gate changes with automated evaluation. Learners will set up data, model, and prompt versioning, define meaningful metrics (accuracy, hallucination rate, latency, cost), and implement evaluation pipelines—including LLM-as-a-judge methods—that plug into CI/CD gates. Activities include guided configuration examples, scenario-driven readings, and automated practice quizzes.
  • Evaluation: From Vibes to Metrics
    • This module focuses on the operational mechanics of serving models and LLMs at scale. You will design and implement containerized serving architectures using orchestration (e.g., Kubernetes), autoscaling, and cost-aware inference pipelines; practice deployment patterns such as canary, blue-green, shadow, and A/B testing; and learn prompt and context-window optimization techniques to balance latency, quality, and cost. Practical labs and demonstrations show real-world manifests, autoscaling configs, and inference pipeline tuning.
  • Observability & Tracing for Production AI
    • Close the loop by instrumenting systems for deep observability and long-term reliability. Learners will add logging, metrics, distributed tracing, and telemetry; use monitoring platforms (e.g., Arize Phoenix) to detect data/model drift, bias, and degradation; and design alerting and runbooks while coordinating incident response with product and reliability teams. The module culminates in a hands-on capstone programming project that integrates architecture, CI/CD, serving, evaluation, and monitoring into a production-ready AI solution.
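
The evaluation-gated CI/CD idea that runs through the syllabus can be sketched as a simple promotion check: an LLM judge scores a batch of candidate-model answers, and the release is blocked if the flagged rate exceeds a threshold. This is a hypothetical sketch; the judge verdicts here are toy booleans, and the 5% threshold is an assumption, not a value the course prescribes.

```python
def hallucination_rate(judge_verdicts: list[bool]) -> float:
    """Fraction of evaluated answers the LLM judge flagged as hallucinated."""
    return sum(judge_verdicts) / len(judge_verdicts)

def ci_gate(judge_verdicts: list[bool], max_rate: float = 0.05) -> bool:
    """Return True if the candidate model may be promoted.

    In a real pipeline this check would run as a CI/CD stage after the
    judge model scores a held-out evaluation set; failing it triggers
    rollback to the previous model/prompt version.
    """
    return hallucination_rate(judge_verdicts) <= max_rate

verdicts = [False] * 97 + [True] * 3   # toy run: 3% of answers flagged
print("promote" if ci_gate(verdicts) else "rollback")
```

The same gate shape applies to other metrics named in the course (latency, cost per request): compute the metric over an evaluation batch, compare against a budget, and fail the pipeline stage on breach.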

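One common way to quantify the data drift the observability module monitors for is the Population Stability Index (PSI), which compares a feature's binned distribution in production against its training-time baseline. The sketch below is illustrative and not tied to any specific tool; the bin proportions are made up, and the 0.2 alert threshold is a widely used rule of thumb, not a universal standard.

```python
import math

def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index over pre-binned proportions.

    Higher values mean the live distribution has moved further from the
    baseline; identical distributions give 0.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)   # guard against log(0)
        total += (a - e) * math.log(a / e)
    return total

baseline = [0.25, 0.25, 0.25, 0.25]   # feature distribution at training time
live     = [0.10, 0.20, 0.30, 0.40]   # distribution observed in production
score = psi(baseline, live)
drifted = score > 0.2                 # alert threshold (rule of thumb)
```

In practice a monitoring platform would compute this per feature on a schedule and feed breaches into the alerting and runbook workflows described in the final module.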
Taught by

Board Infinity

