Google

Architecting an AI Inference Stack

Google via Google Skills

Overview

This course is designed for developers who want to build an optimized AI inference stack on Google Cloud. Whether you're working with GPUs or TPUs, you'll explore the fundamental components of an inference stack, learn design principles for maximizing performance and reliability, and apply practical techniques to take your workloads from 0 to 1.

Syllabus

  • Foundational concepts
    • Introduction: Architecting an AI inference stack (with GPUs or TPUs)
    • What is inference?
    • Differentiate between popular AI/ML frameworks and understand their roles in defining, training, and serving models
    • Identify the four common performance bottlenecks in AI and understand how they apply to different model architectures
    • Compare the available orchestration options: Kubernetes and Slurm
    • Exploring your orchestration options
    • Quiz
  • Inference concepts
    • Use vLLM to increase throughput and reduce latency when serving large AI models
    • Reviewing important inference concepts
    • Deploy scalable and reliable AI inference workloads on Google Cloud by applying principles like multi-region support and leveraging the GKE Inference Gateway
    • Reviewing the best practices for architecting an inference stack on GKE
    • Guided tutorial: GKE Inference Quickstart
    • Conclusion
    • Quiz
  • Appendix
    • Reading List
  • Your Next Steps
    • Claim credential
