Hadoop and Spark Fundamentals: Unit 2

via Coursera

Overview

This course introduces the fundamentals of modern data processing for data engineers, analysts, and IT professionals. You will learn the basics of Hadoop MapReduce: how it works, how to compile and run Java MapReduce programs, and how to debug them and extend them using other languages. Practical exercises include word counts across multiple files, log file analysis, and large-scale text processing with datasets such as Wikipedia. You will also cover advanced MapReduce features and use tools such as YARN and the Job Browser. The course then moves to higher-level tools: Apache Pig for managing data workflows and HiveQL for running SQL-like queries. Finally, you will work with Apache Spark and PySpark to gain experience with modern data analytics platforms. By the end of the course, you will have practical skills for working with big data in a variety of environments.
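
To give a rough sense of the word-count exercise mentioned above, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It is not taken from the course materials and does not use Hadoop or Spark; the function names and the sample file contents are assumptions made for illustration only.

```python
from collections import defaultdict


def map_phase(text):
    """Map step: emit a (word, 1) pair for every word in one file's text."""
    for word in text.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(files):
    """Run map, shuffle, and reduce over a dict of {file name: text}."""
    pairs = [pair for text in files.values() for pair in map_phase(text)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Hypothetical in-memory "files"; a real job would read from HDFS.
    sample_files = {
        "a.txt": "big data big ideas",
        "b.txt": "Data pipelines",
    }
    print(word_count(sample_files))
```

In a real Hadoop job the map and reduce steps run in parallel across a cluster and the framework performs the shuffle; this single-process version only shows how the data flows between the three steps.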

Syllabus

  • Hadoop and Spark Fundamentals: Unit 2
    • This module introduces the core components of big data processing with Hadoop and Spark. It covers the fundamentals of Hadoop MapReduce, including its operation, programming, and debugging, followed by practical examples such as word count, log analysis, and benchmarking. The module then explores higher-level tools like Apache Pig and Hive for simplified data processing. Finally, it introduces Apache Spark and its Python interface, PySpark, highlighting Spark’s growing role in data analytics.

Taught by

Pearson and Douglas Eadline, PhD
