In this course, learners will load and inspect a Kaggle dataset, perform exploratory data analysis, preprocess features, and build baseline regression models to establish initial performance benchmarks.
Syllabus
- Unit 1: Loading and Inspecting a Dataset with Pandas and Scikit-Learn
  - Importing and Previewing the Dataset
  - Inspecting DataFrame Structure
  - Mini-Challenge: Quick Stats Detective
  - Debugging Data Inspection Method Calls
- Unit 2: Exploratory Data Analysis and Visualization with Matplotlib and Seaborn
  - Identifying Numerical Features with Select Types
  - Identifying Categorical Features with Select Types
  - Customizing Histograms for Focused Analysis
  - Enhancing Histograms with KDE Curves
  - Creating Feature Correlation Heatmaps
  - Creating Masked Correlation Heatmaps
- Unit 3: Data Cleaning: Handling Missing Values and Encoding Categorical Features
  - Finding Missing Values in Your Data
  - Fixing Numerical Missing Values Consistently
  - Imputing All Numerical Features Efficiently
  - Consistent Categorical Encoding for Models
  - Building a Reusable Preprocessing Function
- Unit 4: Building and Evaluating Baseline Regression Models
  - Preparing Data for Baseline Models
  - Training Your First Linear Regression Model
  - Building a LightGBM Baseline Model
  - Comparing Models and Analyzing Feature Importance
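
As a rough preview of how the four units fit together, the workflow might be sketched as follows. This is an illustrative outline, not course material: the synthetic DataFrame stands in for a Kaggle CSV, and the column names (`area`, `rooms`, `district`, `price`) and the `preprocess` helper are hypothetical. Only the linear regression baseline is shown; a LightGBM model would slot in at the same point.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a Kaggle dataset; in practice this would come from pd.read_csv.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "area": rng.uniform(50, 250, n),
    "rooms": rng.integers(1, 7, n).astype(float),
    "district": rng.choice(["north", "south", "east"], n),
})
df["price"] = 1000 * df["area"] + 5000 * df["rooms"] + rng.normal(0, 10000, n)
df.loc[::10, "rooms"] = np.nan      # inject missing numerical values
df.loc[::25, "district"] = np.nan   # inject missing categorical values

# Unit 1: inspect the structure. Unit 3: count missing values per column.
print(df.shape)
print(df.isna().sum())


def preprocess(frame, target):
    """Impute missing values and one-hot encode categorical columns (hypothetical helper)."""
    X = frame.drop(columns=[target])
    y = frame[target]
    # Unit 2: split features by dtype.
    num_cols = X.select_dtypes(include="number").columns
    cat_cols = X.select_dtypes(exclude="number").columns
    # Simplification: medians are computed on the full data; a stricter
    # version would compute them on the training split only.
    X[num_cols] = X[num_cols].fillna(X[num_cols].median())
    X[cat_cols] = X[cat_cols].fillna("missing")
    X = pd.get_dummies(X, columns=list(cat_cols), dtype=float)
    return X, y


# Unit 4: hold out a test set and fit a baseline model.
X, y = preprocess(df, "price")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
print(f"Baseline RMSE: {rmse:.0f}")
```

The RMSE printed at the end is the kind of initial benchmark that later models would be compared against.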