Taming Data Skew in Production ML Pipelines

Conf42 via YouTube

Classroom Contents

  1. Intro: Handling Data Skew in Production ML Pipelines (Roku)
  2. Roku Scale & the Mystery of Suddenly Slower Spark Jobs
  3. How Skew Shows Up in Spark: Stragglers, Shuffle Spills, Idle Executors
  4. What Data Skew Really Is and Why Parallelism Breaks
  5. Real-World Example: Power Users, Hot Keys, and Power-Law Data
  6. Why It Matters: Technical Bottlenecks + Business Cost Blowups
  7. Where Skew Hits ML Pipelines: Recs, Classification, Computer Vision
  8. Root Causes of Skew #1: Natural Imbalance from Real-World Events
  9. Root Causes of Skew #2: Join-Key & Aggregation Skew in Feature Engineering
  10. Root Causes of Skew #3: Computational Skew (NLP, Embeddings, Heavy Transforms)
  11. Mitigation Step 1: Repartitioning - When It Works and Its Limits
  12. Mitigation Step 2: Key Salting to Split Hot Keys (Big Runtime Wins)
  13. Mitigation Step 3: Broadcast Joins to Avoid Massive Shuffles
  14. Wrap-Up: Choosing the Right Fix + AI to Predict Skew Before It Happens
  15. Closing & How to Connect
