Learn the core machine learning workflow in PySpark MLlib. Prepare datasets, train classification models, make predictions, evaluate performance, and save and load trained models, building the skills needed to deploy models with MLlib.
Syllabus
- Unit 1: Preparing Dataset with MLlib
  - Complete the Data Preprocessing
  - Adjust Dataset Split Ratio
  - Fixing PySpark Preprocessing Issues
  - Convert Categorical Labels with StringIndexer
  - Master Feature Vectorization with MLlib
- Unit 2: Training a Classification Model with MLlib
  - Train a Model with PySpark
  - Fix Mistakes in Model Training
  - Complete PySpark Model Training
  - Switch Models in PySpark
- Unit 3: Making Predictions and Evaluating Model Performance
  - Complete the Model Evaluation
  - Switch Metric to Evaluate Model
  - Debugging Model Evaluation Code
  - Implement Model Evaluation
- Unit 4: Saving and Loading Trained MLlib Models
  - Complete Model Persistence with PySpark
  - Fix the Model Persistence Error
  - Saving Your Model Efficiently
  - Master Model Persistence with PySpark