Dream Machine - Scaling to 1M Users in 4 Days


AI Engineer via YouTube

The initial launch challenges [00:00]: Luma AI was unprepared for the high traffic, quickly exhausting their initial GPU allocation and facing a large queue of requests.



YouTube videos curated by Class Central.

Classroom Contents


  1. The initial launch challenges [00:00]: Luma AI was unprepared for the high traffic, quickly exhausting their initial GPU allocation and facing a large queue of requests.
  2. Rapid scaling efforts [00:57]: They rapidly scaled their GPU capacity from 500 to 5,000 H100 GPUs within six hours, and later added another 4,000 H100 GPUs from their training cluster to keep up with…
  3. Luma AI's mission [03:10]: Beyond just video models, Luma AI aims to build general multimodal intelligence that can generate, understand, and operate in the physical world.
  4. Their product capabilities [03:22]: They demonstrate a "modify video" feature where users can upload iPhone videos and transform them with text prompts. They also highlight their public API for integ…
  5. Infrastructure re-architecture [06:02]: They moved from a brittle, tightly coupled container setup using the Triton Inference Server to a custom-built serving stack on vanilla PyTorch, which offers bette…
  6. Challenges and solutions in scaling [07:39]:
  7. Back pressure [07:51]: They implemented a dispatch-limit system to prevent too many CPU workers from queuing jobs in one cluster.
  8. Fair scheduling and work starvation [08:36]: To address the different user tiers (API, enterprise, unlimited, light, free) and prevent lower-priority jobs from being starved, they developed an …
  9. Handling different models and bursts [08:43]: They built a system to automatically scale up compute on their training cluster to handle demand bursts [09:16].
  10. Model management [13:24]: They use a model repository system where each model has immutable versions stored in object storage, including the full Python environment and checkpoints. This allows for r…
  11. Hiring [15:13]: Luma AI is actively hiring engineers, researchers, and AI enthusiasts.
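The back-pressure idea from [07:51] can be sketched as a semaphore that caps how many jobs CPU workers may have in flight on a cluster at once. This is a minimal illustration, not Luma's actual code; the class and method names are invented for the example.

```python
import threading

class DispatchLimiter:
    """Allow at most `limit` jobs queued or running on one cluster."""

    def __init__(self, limit: int):
        self._sem = threading.BoundedSemaphore(limit)

    def try_dispatch(self) -> bool:
        # Non-blocking acquire: if the cluster is saturated, the caller
        # holds the job back (back pressure) instead of queuing it.
        return self._sem.acquire(blocking=False)

    def job_done(self) -> None:
        # Free a slot once a job finishes on the cluster.
        self._sem.release()

limiter = DispatchLimiter(limit=2)
assert limiter.try_dispatch() is True
assert limiter.try_dispatch() is True
assert limiter.try_dispatch() is False  # third job is held back
limiter.job_done()                      # one job finishes
assert limiter.try_dispatch() is True   # a slot opens up again
```

A bounded semaphore (rather than a plain one) also catches bugs where `job_done` is called more times than jobs were dispatched.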
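The fair-scheduling problem from [08:36] is commonly solved with weighted round-robin over per-tier queues: every tier with a positive weight gets some slots each round, so low-priority jobs cannot be starved by a high-priority backlog. A small sketch under that assumption (the tier names and weights are illustrative, not Luma's):

```python
from collections import deque

class TieredScheduler:
    """Weighted round-robin over per-tier job queues."""

    def __init__(self, weights: dict):
        self.weights = weights                      # tier -> slots per round
        self.queues = {t: deque() for t in weights}

    def submit(self, tier: str, job: str) -> None:
        self.queues[tier].append(job)

    def drain_round(self) -> list:
        # One scheduling round: each tier dispatches up to `weight` jobs.
        dispatched = []
        for tier, weight in self.weights.items():
            for _ in range(weight):
                if self.queues[tier]:
                    dispatched.append(self.queues[tier].popleft())
        return dispatched

sched = TieredScheduler({"enterprise": 3, "unlimited": 2, "free": 1})
for i in range(4):
    sched.submit("enterprise", f"ent-{i}")
sched.submit("free", "free-0")
# The free-tier job runs in the first round despite the enterprise backlog.
assert sched.drain_round() == ["ent-0", "ent-1", "ent-2", "free-0"]
```

The weights tune how much faster higher tiers move without ever reducing a lower tier's share to zero.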
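The burst handling described at [08:43]–[09:16] amounts to an autoscaling loop: when the inference backlog exceeds what the dedicated fleet can absorb, borrow GPUs from the training cluster, and release them once the queue drains. The thresholds and names below are made up for illustration.

```python
class BurstAutoscaler:
    """Borrow training-cluster GPUs when the inference queue overflows."""

    def __init__(self, dedicated_gpus: int, jobs_per_gpu: int = 4):
        self.dedicated = dedicated_gpus
        self.jobs_per_gpu = jobs_per_gpu   # target backlog per GPU
        self.borrowed = 0

    def reconcile(self, queue_depth: int) -> int:
        # GPUs needed to keep the per-GPU backlog at the target level
        # (ceiling division), minus what the dedicated fleet covers.
        needed = -(-queue_depth // self.jobs_per_gpu)
        self.borrowed = max(0, needed - self.dedicated)
        return self.borrowed

scaler = BurstAutoscaler(dedicated_gpus=100)
assert scaler.reconcile(queue_depth=300) == 0     # fits on dedicated fleet
assert scaler.reconcile(queue_depth=1000) == 150  # burst: borrow 150 GPUs
assert scaler.reconcile(queue_depth=200) == 0     # burst over, release all
```

In practice such a loop would run periodically against real queue metrics and respect a cap on how much training capacity may be preempted.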
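The immutable model repository from [13:24] can be approximated with write-once, content-addressed records: each published version gets a key derived from its manifest (checkpoint, pinned environment, etc.) and is never overwritten, which makes rollbacks trivial. A sketch with a dict standing in for object storage (e.g. S3); the structure is assumed, not taken from the talk:

```python
import hashlib
import json

class ModelRepository:
    """Write-once store of model versions keyed by manifest hash."""

    def __init__(self):
        self._store = {}  # stand-in for an object-storage bucket

    def publish(self, name: str, manifest: dict) -> str:
        # The manifest lists everything needed to serve this version:
        # checkpoint URI, pinned Python environment, code revision, ...
        blob = json.dumps(manifest, sort_keys=True).encode()
        version = hashlib.sha256(blob).hexdigest()[:12]
        key = f"{name}/{version}"
        if key not in self._store:   # immutable: write-once, never mutate
            self._store[key] = blob
        return version

    def fetch(self, name: str, version: str) -> dict:
        return json.loads(self._store[f"{name}/{version}"])

repo = ModelRepository()
v1 = repo.publish("dream-machine", {"checkpoint": "ckpt-001.pt", "env": "lock-a"})
v2 = repo.publish("dream-machine", {"checkpoint": "ckpt-002.pt", "env": "lock-b"})
assert v1 != v2
assert repo.fetch("dream-machine", v1)["checkpoint"] == "ckpt-001.pt"
```

Because versions are immutable, rolling back is just pointing the serving fleet at an older version id; nothing is rebuilt or mutated.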
