Overview
Watch a technical conference talk from Ray Summit 2024 where Google engineers Fanhai Lu and Richard Liu present an advanced serving stack for deploying Large Language Models (LLMs) at scale. Learn how to overcome key LLM deployment challenges by combining Ray's distributed computing capabilities with TPU acceleration and Google Kubernetes Engine (GKE) orchestration. Discover architectural strategies for optimizing latency and throughput, managing hardware memory constraints, and scaling cloud compute resources in production environments. Gain practical insights from real-world deployments of models like Llama 3 and explore best practices for implementing GenAI solutions on Google Cloud Platform using XLA+TPUs for computation, Ray for multi-host deployments, and GKE for TPU pod slice orchestration.
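To make the stack concrete: the talk pairs Ray for multi-host coordination with GKE scheduling pods onto TPU pod slices. As a rough sketch of the GKE side, the manifest below requests a TPU slice for a serving worker using GKE's standard TPU node labels and the `google.com/tpu` resource. The specific topology, accelerator type, and container image are illustrative assumptions, not details taken from the talk.

```yaml
# Hypothetical GKE pod requesting a TPU slice for an LLM-serving Ray worker.
# The nodeSelector keys are GKE's standard TPU labels; the accelerator type,
# topology, and image are assumed for illustration only.
apiVersion: v1
kind: Pod
metadata:
  name: ray-tpu-worker
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice  # assumed
    cloud.google.com/gke-tpu-topology: "2x4"   # 8-chip slice (assumed)
  containers:
    - name: ray-worker
      image: example.com/llm-serving:latest    # placeholder image
      resources:
        requests:
          google.com/tpu: "8"                  # TPU chips on this host
        limits:
          google.com/tpu: "8"
```

In a multi-host deployment like the one the talk describes, Ray would schedule one such worker per host in the pod slice and coordinate them into a single model-serving group.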
Syllabus
Scaling LLMs on Google Cloud: Synergy Between Ray, TPU, and GKE | Ray Summit 2024
Taught by
Anyscale