Clean Your Data

Google via Coursera

Overview

In this course, you’ll explore three exploratory data analysis (EDA) practices: cleaning, joining, and validating. You’ll discover the importance of these practices for data analysis, and you’ll use Python to clean, validate, and join data.

By the end of this course, you will be able to:

  • Apply input validation skills to a dataset with Python
  • Explain the importance of input validation
  • Demonstrate how to transform categorical data into numerical data with Python
  • Explain the importance of categorical versus numerical data in a dataset
  • Explain the importance of recognizing outliers in a dataset
  • Demonstrate how to identify outliers in a dataset with Python
  • Understand when to contact stakeholders or engineers regarding missing values
  • Explain the importance of ethically considering missing values
  • Demonstrate how to identify missing data with Python

Syllabus

  • The challenge of missing or duplicate data
    • Missing or duplicate data can appear in datasets for numerous reasons. The impact of missing values can vary depending on how many are present. In this module, you will learn strategies to address missing data entries, determine when deduplication is needed, and use common Python functions for handling duplicates.
  • The ins and outs of data outliers
    • Outliers are data points that stand out from the rest of a dataset. A tactful approach to outliers recognizes the human stories and real-world effects they represent. In this module, you will learn the types of outliers, how to handle them, and how to visualize them.
  • Change categorical data to numerical data
    • Data models typically work better with numerical inputs. To facilitate this, categorical data is encoded as numeric values for analysis. In this module, you will learn why this transformation is needed, what dummy variables are, and how to select the right encoding method.
  • Input validation
    • Input validation focuses on thoroughly checking data for completeness and eliminating errors. In this module, you will learn why validation minimizes errors, how to detect improper inputs, and why it's essential for joining datasets.
  • Review: Clean your data
    • Review everything you’ve learned and take the final assessment.
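
To give a feel for the syllabus topics above, here is a minimal sketch of the four practices (handling missing and duplicate rows, flagging outliers, encoding categories, and validating inputs). It uses only the Python standard library; the listing does not say which libraries the course itself uses. The records, the `ALLOWED_CITIES` set, and all function names are hypothetical and exist only for illustration. The 1.5 × IQR rule is one common outlier heuristic, not necessarily the method taught in the course.

```python
from statistics import quantiles

# Hypothetical records for illustration; None marks a missing value.
rows = [
    {"id": 1, "city": "Lagos", "amount": 20.0},
    {"id": 2, "city": "Quito", "amount": 22.0},
    {"id": 2, "city": "Quito", "amount": 22.0},    # exact duplicate
    {"id": 3, "city": "Lagos", "amount": None},    # missing amount
    {"id": 4, "city": "Osaka", "amount": 21.0},
    {"id": 5, "city": "Quito", "amount": 23.0},
    {"id": 6, "city": "osaka ", "amount": 250.0},  # untidy label, extreme amount
]

# Assumed set of valid category labels (hypothetical).
ALLOWED_CITIES = {"Lagos", "Quito", "Osaka"}


def drop_duplicates(records):
    """Keep the first occurrence of each fully identical record."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def find_missing(records):
    """Return the ids of records that contain at least one None value."""
    return [r["id"] for r in records if any(v is None for v in r.values())]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def validate_city(raw):
    """Normalize a label and reject anything outside the allowed set."""
    cleaned = raw.strip().title()
    if cleaned not in ALLOWED_CITIES:
        raise ValueError(f"unexpected city: {raw!r}")
    return cleaned


def one_hot(labels):
    """Encode each label as a dict of 0/1 dummy variables."""
    categories = sorted(set(labels))
    return [{f"city_{c}": int(label == c) for c in categories} for label in labels]


unique = drop_duplicates(rows)
missing_ids = find_missing(unique)
# Dropping incomplete rows is only one option; it is used here to keep the sketch short.
complete = [r for r in unique if r["id"] not in missing_ids]
cities = [validate_city(r["city"]) for r in complete]
outliers = iqr_outliers([r["amount"] for r in complete])
encoded = one_hot(cities)
```

Whether to drop, fill, or escalate missing values and outliers depends on context, which is why the course description stresses contacting stakeholders and considering missing values ethically.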

Taught by

Google Career Certificates

Reviews

4.9 rating at Coursera based on 21 ratings