AI Infrastructure: Deployment Types

Overview

This course provides a comprehensive guide to deploying, managing, and optimizing AI and high-performance computing (HPC) workloads on Google Cloud. Through a series of lessons and practical demonstrations, you’ll explore diverse deployment strategies, ranging from highly customizable environments using Google Compute Engine (GCE) to managed solutions like Google Kubernetes Engine (GKE). Specifically, you’ll learn how to create clusters and deploy inference workloads on GKE.

Syllabus

Course overview

This module offers an overview of the course and outlines the learning objectives.

Cluster creation process

This module details the AI Hypercomputer cluster creation process. It covers the key decisions required, including choosing a machine type, consumption option, deployment option, orchestrator, and cluster image.

Creating a cluster with Compute Engine

This module identifies key configuration options and optimization techniques for deploying an AI Hypercomputer cluster on Google Compute Engine (GCE). It covers selecting machine types, accelerator OS images, deployment options, and strategies for optimizing network performance.

Building with Google Kubernetes Engine (GKE)

This module identifies configuration options for deploying an AI Hypercomputer cluster on Google Kubernetes Engine (GKE). It covers containerization, GKE modes of operation, networking configurations, and workload optimization techniques like distributed training and GPU sharing.

Deploying with GKE for Inference

This module examines optimization techniques for architecting an inference workload on GKE. It covers the GKE inference workflow and key optimizations at both the infrastructure and model levels.