Overview
Gain practical, hands-on experience installing and running Hadoop and Spark on your own desktop or laptop, then progress to managing real-world cluster deployments. Through engaging lessons and interactive examples, you'll master essential concepts and tools such as HDFS, MapReduce, PySpark, HiveQL, and data ingestion utilities, while also learning to leverage user-friendly interfaces like Ambari and Zeppelin to streamline analytics workflows and cluster administration. By the end of this three-course series, you'll possess the foundational skills and confidence to begin your journey in big data analytics and explore the vast Hadoop ecosystem.
Syllabus
- Course 1: Hadoop and Spark Fundamentals: Unit 1
- Course 2: Hadoop and Spark Fundamentals: Unit 2
- Course 3: Hadoop and Spark Fundamentals: Unit 3
Courses
- This course provides a practical introduction to the Apache Hadoop ecosystem. You will learn the basic skills needed to analyze and manage large, unstructured datasets. The course covers core concepts such as the data lake, MapReduce, and using Spark for analytics. You will install and configure Hadoop on your own computer using the Hortonworks HDP sandbox. The course includes instruction on the Hadoop Distributed File System (HDFS), its architecture, and how to use it in real-world situations. This course is suitable for beginners and those looking to expand their data analytics skills. By the end, you will understand the fundamentals of Hadoop and Spark for scalable data processing. (The first sketch after this list illustrates reading a file from HDFS with Spark.)
- This course introduces the fundamentals of modern data processing for data engineers, analysts, and IT professionals. You will learn the basics of Hadoop MapReduce, including how it works, how to compile and run Java MapReduce programs, and how to debug and extend them using other languages. The course includes practical exercises such as word counts across multiple files, log file analysis, and large-scale text processing with datasets like Wikipedia. You will also cover advanced MapReduce features and use tools like YARN and the Job Browser. The course then moves to higher-level tools such as Apache Pig and HiveQL for managing data workflows and running SQL-like queries. Finally, you will work with Apache Spark and PySpark to gain experience with modern data analytics platforms. By the end of the course, you will have practical skills to work with big data in various environments. (The second sketch after this list shows the classic word count in PySpark.)
- This course is for those who want to become data engineers or analysts. It covers the key skills needed to manage, process, and analyze large datasets using common industry tools. You will learn how to import data into Hadoop HDFS and Hive tables, and how to use Spark for direct HDFS imports. The course also covers handling streaming data with Apache Flume and connecting relational databases to Hadoop with Apache Sqoop. You will use Apache Zeppelin to develop Spark applications and learn how to install, monitor, and manage Hadoop clusters with Ambari. The course also introduces advanced HDFS features for data management. By the end, you will be able to use Hadoop and Spark in practical settings. (The third sketch after this list demonstrates a Spark-based ingest into HDFS and Hive.)
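For a taste of the Unit 1 material, here is a minimal PySpark sketch that reads a text file stored in HDFS and inspects it. The namenode host, port, and file path are hypothetical placeholders; on the HDP sandbox you would substitute your local configuration.

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session. The HDP sandbox ships a preconfigured
# pyspark shell; this sketch assumes a plain `pip install pyspark` setup.
spark = SparkSession.builder.appName("hdfs-intro").getOrCreate()

# Read a text file from HDFS into a DataFrame with one row per line.
# The host, port, and path are hypothetical; adjust them to your setup.
lines = spark.read.text("hdfs://sandbox-hdp:8020/user/demo/notes.txt")

print("line count:", lines.count())  # an action that scans the whole file
lines.show(5, truncate=False)        # peek at the first five lines

spark.stop()
```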
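The word-count exercise from the Unit 2 MapReduce lessons translates almost line for line into PySpark. Below is a minimal sketch, assuming the input files sit under a hypothetical HDFS directory; the map and reduce steps mirror what the Java MapReduce version spells out by hand.

```python
from operator import add
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()
sc = spark.sparkContext

# Map phase: split each line into words and pair every word with a 1.
# Reduce phase: sum the 1s per word. The glob path is hypothetical.
counts = (
    sc.textFile("hdfs:///user/demo/books/*")
      .flatMap(lambda line: line.split())
      .map(lambda word: (word, 1))
      .reduceByKey(add)
)

# Print the ten most frequent words.
for word, count in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(word, count)

spark.stop()
```

Here `reduceByKey` performs the shuffle-and-combine step that MapReduce calls the reduce phase, which is why the two programming models line up so closely.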
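Finally, for Unit 3's ingestion theme, the sketch below loads a local CSV with Spark, writes it to HDFS as Parquet (a direct HDFS import with no intermediate tool), and registers it as a Hive table visible to HiveQL queries and Zeppelin notebooks. All paths and the table name are hypothetical, and enableHiveSupport() assumes a reachable Hive metastore, as the sandbox provides.

```python
from pyspark.sql import SparkSession

# enableHiveSupport() lets Spark read and write Hive tables; it assumes
# a configured Hive metastore (the HDP sandbox includes one).
spark = (
    SparkSession.builder
    .appName("ingest-demo")
    .enableHiveSupport()
    .getOrCreate()
)

# Load a local CSV file (hypothetical path and schema).
df = spark.read.csv("file:///tmp/sales.csv", header=True, inferSchema=True)

# Direct HDFS import: write the data as Parquet under a hypothetical path.
df.write.mode("overwrite").parquet("hdfs:///user/demo/sales_parquet")

# Register the same data as a managed Hive table (hypothetical name),
# making it queryable from HiveQL and Zeppelin.
df.write.mode("overwrite").saveAsTable("default.sales_demo")

spark.stop()
```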
Taught by
Douglas Eadline, PhD and Pearson