Overview
Explore enterprise-scale ETL optimization techniques for Apache Spark across AWS services including AWS Glue, Amazon EMR, and Amazon SageMaker in this 50-minute conference talk from AWS re:Invent 2025. Discover recent enhancements in Spark that deliver faster read and write throughput, accelerated processing of common file formats, and expanded Amazon S3 support through the S3A protocol for greater flexibility in write operations. Learn about improvements in distributed computation and in-memory storage that enable efficient data aggregation and job optimization. Examine how these innovations combine with Spark's native capabilities to strengthen governance and encryption while maintaining performance, control, and compliance. Gain practical insights for building unified, secure, and high-performance ETL pipelines on AWS using Apache Spark for large-scale data processing workloads.
Syllabus
AWS re:Invent 2025 - Enterprise-scale ETL optimization for Apache Spark (ANT336)
Taught by
AWS Events