Machine Learning Infrastructure at Meta Scale
MLOps World: Machine Learning in Production via YouTube
Overview
Explore the challenges of scaling machine learning infrastructure at Meta, and the solutions adopted, in this 37-minute conference talk from MLOps World: Machine Learning in Production. Gain insights from Shivam Bharuka, Senior AI Infra Engineer at Meta, as he shares his experience supporting large-scale ranking and recommendation models that serve over a billion users. Discover how Meta reimagined its entire AI infrastructure stack to accommodate rapidly growing machine learning models. Learn about the development of specialized hardware built on powerful GPUs and network devices, and about the design of optimized distributed training algorithms using PyTorch. Understand the approach taken to redesign and scale the stack while addressing performance, reliability, and efficiency concerns in machine learning training infrastructure.
Syllabus
Machine Learning Infrastructure at Meta Scale
Taught by
MLOps World: Machine Learning in Production