Apache Spark on Kubernetes - The RIGHT Way - No Master/Worker Clusters Needed
CodeWithYu via YouTube
Overview
Learn to deploy Apache Spark on Kubernetes without a permanent master/worker cluster in this comprehensive 58-minute tutorial. Discover how to run Spark jobs with zero standing infrastructure: driver and executor pods are created only when a job runs and are cleaned up automatically afterward, eliminating costly always-on Spark clusters. Master a production-ready setup whose executors auto-scale and clean up after themselves, and build custom Spark images with embedded PySpark jobs. Explore real-world analytics, including customer segmentation, cohort analysis, and revenue trends, through hands-on examples. Set up the Spark History Server for comprehensive job monitoring, implement proper RBAC security configurations for production environments, and debug and monitor jobs effectively with kubectl and the Spark UI. The tutorial covers system architecture design, Kubernetes setup, project configuration, namespace management, service accounts, and RBAC implementation, then builds a complete Spark control dashboard and API layer for job management, finishing with job submissions and reviews through the dashboard interface.
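The "pods only while a job runs" workflow described above corresponds to Spark's native Kubernetes scheduler: `spark-submit` talks directly to the Kubernetes API server, which launches a driver pod that in turn requests executor pods. The sketch below is illustrative only; the API server address, namespace, service account, image name, and job script are assumptions, not taken from the video.

```shell
# Submit a PySpark job straight to the Kubernetes API server.
# No standing master/worker cluster: the driver and executor pods
# exist only for this job and are deleted when it finishes.
# All names below (image, namespace, service account, script) are illustrative.
spark-submit \
  --master k8s://https://<k8s-api-server>:6443 \
  --deploy-mode cluster \
  --name customer-segmentation \
  --conf spark.kubernetes.namespace=spark-jobs \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=myrepo/spark-jobs:latest \
  --conf spark.executor.instances=2 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.maxExecutors=5 \
  local:///opt/spark/jobs/customer_segmentation.py
```

The `local://` scheme tells Spark the script is already baked into the container image (the "custom Spark images with embedded PySpark jobs" approach), and `shuffleTracking` is what lets dynamic allocation scale executors down on Kubernetes, where there is no external shuffle service.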
Syllabus
Introduction
System Architecture
Setting up K8s
Setting up the project
K8s Namespaces
K8s Service Accounts, RBAC
Creating Spark Jobs for K8s
K8s Spark History Server
Spark Control Dashboard
K8s API layer
Spark Dashboard, Job submissions and review
Outro
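The service accounts and RBAC step in the syllabus typically boils down to a manifest like the following: a namespace for the jobs, a service account the driver runs as, and a Role granting it the pod-level permissions Spark needs to create and tear down executors. This is a minimal sketch under assumed names (`spark-jobs`, `spark`), not the exact manifest from the video.

```yaml
# Namespace, service account, and Role/RoleBinding giving the Spark
# driver permission to manage executor pods; names are illustrative.
apiVersion: v1
kind: Namespace
metadata:
  name: spark-jobs
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark-jobs
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-role
  namespace: spark-jobs
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "create", "delete", "deletecollection"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: spark-jobs
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: spark-jobs
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
```

Scoping the Role to a single namespace (rather than using a ClusterRole) keeps the blast radius small, which is the usual production RBAC practice the tutorial's security section refers to.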
Taught by
CodeWithYu