Apache Spark? If Only It Worked

Devoxx via YouTube

Overview

Explore common challenges and optimization techniques for Apache Spark in this 31-minute conference talk from Devoxx. Gain insights into dealing with skewed data, understanding Spark on YARN and its memory model, effective caching strategies, sizing executors, and achieving data locality. Learn from real-world examples and practical solutions to improve performance and stability in Spark applications. Discover a framework for troubleshooting and optimizing Spark jobs, covering topics such as RDD evaluation, execution plans, and debugging tools. Benefit from the speaker's extensive experience working with data infrastructure at companies like VRBO, Spotify, TrueCaller, and Apple.

Syllabus

Introduction
My experience with Spark
Outline of the talk
What is Spark
RDD
Pipelines
Execution Unit
Executor
Executor size
Small executors
Spark memory model
Memory overhead
Shuffle
In practice
Spark UI
Execution Plan
Skewed Data
Locality
Check locality
RDD lazily evaluated
RDD calculation twice
Spark caching
Spark optimization
Map volumes
Improve shuffle
Recap
Debugging tools
Challenge
Use Case
Summary
Questions

Taught by

Devoxx
