

Designing Production LLM Architectures

Coursera via Coursera

Overview

This course is for ML engineers, solutions architects, and senior developers who build the infrastructure powering large language models. It teaches you how to design, deploy, and maintain the complex, interconnected systems required for scalable, resilient, and cost-effective LLM applications in the real world.

You will learn to think like an architect, starting with foundational design choices. Using sequence diagrams and structured analysis, you will compare synchronous and asynchronous architectures and evaluate the critical trade-offs between self-hosting open-source models and using managed APIs, considering total cost of ownership, latency, and data privacy.

The course then dives deep into building for resilience and scale, applying the 12-factor app methodology to design stateless, configurable microservices. You'll learn to analyze multi-region deployment strategies for fault tolerance and to use Helm charts to deploy scalable applications capable of handling production workloads. Finally, you'll master the data backbone of your system by designing automated data pipelines with tools like Airflow and learning to manage the complexities of schema evolution.
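To make the synchronous-versus-asynchronous comparison concrete, here is a minimal sketch in Python. The `fake_llm_call` helpers are purely illustrative stand-ins for a slow model call (they are not from any real SDK); the point is that a synchronous handler serves requests one after another, while an asynchronous handler overlaps their waiting time.

```python
import asyncio
import time

# Hypothetical stand-in for a slow LLM call; ~0.1 s per request.
def fake_llm_call(prompt: str) -> str:
    time.sleep(0.1)  # blocks, as a synchronous HTTP call would
    return f"response to {prompt!r}"

def handle_sync(prompts):
    # Synchronous: total latency grows linearly with request count.
    return [fake_llm_call(p) for p in prompts]

async def fake_llm_call_async(prompt: str) -> str:
    await asyncio.sleep(0.1)  # non-blocking wait
    return f"response to {prompt!r}"

async def handle_async(prompts):
    # Asynchronous: calls overlap, so total latency is roughly one call.
    return await asyncio.gather(*(fake_llm_call_async(p) for p in prompts))

if __name__ == "__main__":
    prompts = [f"q{i}" for i in range(5)]

    t0 = time.perf_counter()
    handle_sync(prompts)
    print(f"sync:  {time.perf_counter() - t0:.2f}s for {len(prompts)} requests")

    t0 = time.perf_counter()
    asyncio.run(handle_async(prompts))
    print(f"async: {time.perf_counter() - t0:.2f}s for {len(prompts)} requests")
```

In a real service the same trade-off appears as request/response APIs versus job queues with callbacks or polling; the sequence diagrams used in the course capture exactly this difference in waiting behavior.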

Syllabus

  • Design, Compare and Analyze LLM Architectures
    • This module empowers engineers and architects to master the "build vs. buy" decision for LLM applications through a structured, strategic lens. You will learn to design complex system architectures using sequence diagrams to evaluate synchronous and asynchronous processing, while comparing the trade-offs of self-hosted open-source models against managed APIs. By focusing on critical metrics like Total Cost of Ownership (TCO), latency, and data privacy, you will develop the expertise to justify architectural choices. Ultimately, you'll gain the confidence to document and defend high-performance, business-aligned AI solutions to any stakeholder.
  • Architect Resilient LLM Microservices for Scale
    • This module explores building resilient, scalable architectures for LLM applications. You will apply 12-factor app methodology to design portable, cloud-native microservices, mastering stateless design and dependency management. The curriculum bridges theory and practice by evaluating multi-region deployment strategies for fault tolerance and high availability. You'll learn to analyze failover mechanisms and mitigate architectural risks before production. By the end, you’ll be equipped to document reliable, future-proof AI systems. Prerequisites include a foundational understanding of cloud concepts (regions/zones) and microservice basics (containers/APIs).
  • Analyze and Deploy Scalable LLM Architectures
    • This module teaches how to transition LLM prototypes into production-grade services. You will learn to analyze multi-stage architectures like RAG to identify and quantify performance bottlenecks using evidence-based metrics. The curriculum focuses on mastering Kubernetes deployment through declarative Helm charts and implementing Horizontal Pod Autoscaling (HPA) to manage unpredictable traffic. By studying deployment lifecycles, including controlled rollouts and rapid rollbacks, you will gain the skills to transform fragile prototypes into resilient, scalable, and reliable production systems capable of handling real-world loads.
  • Automate Data Pipelines: Schema Evolution
    • In today's dynamic data landscape, pipelines often break when source data structures change unexpectedly—a problem known as schema drift. This module tackles that challenge head-on, teaching you how to design and automate data pipelines that can gracefully handle schema evolution using Apache Airflow. By the end, you will be equipped to create resilient, scalable, and fully automated data pipelines that are built to withstand the complexities of real-world data environments.
  • Analyzing a Flawed LLM Architecture Design
In this module, you will step into the high-stakes role of a senior systems engineer tasked with diagnosing a failing AI service. A critical Retrieval-Augmented Generation (RAG) system is plagued by high latency and intermittent outages, and you must get to the root of the problem. Using architectural diagrams, system logs, and performance metrics, you will analyze the system’s design to identify the primary performance bottleneck and the most significant single point of failure. Your analysis will culminate in a concise, two-paragraph report for stakeholders, pinpointing the critical issues and recommending targeted fixes to restore stability and performance.
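The schema-evolution module above centers on detecting schema drift before it breaks a pipeline. The following sketch shows the kind of check a pipeline task (for example, an Airflow task) might run before loading data; all names here are illustrative and not part of any real library.

```python
def detect_schema_drift(expected: dict, record: dict) -> dict:
    """Compare an incoming record against an expected schema.

    `expected` maps field name -> Python type. Returns the added,
    missing, and type-changed fields so the pipeline can fail fast
    or adapt instead of silently breaking downstream.
    """
    added = sorted(set(record) - set(expected))
    missing = sorted(set(expected) - set(record))
    changed = sorted(
        name for name in set(expected) & set(record)
        if not isinstance(record[name], expected[name])
    )
    return {"added": added, "missing": missing, "type_changed": changed}

if __name__ == "__main__":
    expected_schema = {"user_id": int, "email": str}
    incoming = {"user_id": 42, "email": "a@b.c", "signup_source": "ads"}
    # Reports 'signup_source' as a newly added field.
    print(detect_schema_drift(expected_schema, incoming))
```

In practice such a check would gate the load step of a DAG: an empty drift report lets the run proceed, while added or type-changed fields trigger an alert or a schema-migration branch.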

Taught by

Professionals from the Industry

