This course extends data cleaning techniques to handle text-based data in tabular datasets. It covers cleaning and processing text columns, dealing with mixed data types, extracting meaningful features from text, and preparing text data for machine learning.
Syllabus
- Unit 1: Handling Text Columns in Tabular Data with Python
  - Refine Your Text Cleaning Skills
  - Fix Numeric Values in Text Data
  - Standardize Synonyms for Clean Data
  - Track and Log Data Transformations
- Unit 2: Removing Special Characters and Normalizing Text Using Python
  - Enhance Text Readability with Python
  - Fix the Text Formatting Issue
  - Enhance Text with Unicode Normalization
  - Logging Special Character Removal
- Unit 3: Handling Mixed Data Types in Columns Using Python
  - Transform Percentages in Data Columns
  - Debugging Accounting Format in Data
  - Log and Handle Non-numeric Entries
  - Categorize Prices with Python Functions
- Unit 4: Preparing Text Data for Machine Learning Using Python
  - Bigrams in Feature Extraction
  - Enhance Text Preprocessing Skills
  - Enhance Text with N-Gram TF-IDF
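
To give a rough sense of the topics named in Units 2 and 3, here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the function names `normalize_text` and `parse_mixed_number` and the sample values are hypothetical, and the rules (parentheses mean negative, percentages become fractions) are assumptions about how such columns are commonly cleaned.

```python
import re
import unicodedata


def normalize_text(value: str) -> str:
    """Apply NFKC normalization, collapse whitespace, and lowercase.

    Hypothetical helper for illustration only.
    """
    text = unicodedata.normalize("NFKC", value)
    return re.sub(r"\s+", " ", text).strip().lower()


def parse_mixed_number(value: str):
    """Convert entries such as '45%', '$1,200.50', or '(300)' to float.

    Returns None for entries that cannot be read as a number.
    Hypothetical helper for illustration only.
    """
    text = value.strip()
    # Assumed accounting convention: parentheses mark a negative amount.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    is_percent = text.endswith("%")
    # Drop dollar signs, thousands separators, percent signs, and spaces.
    text = re.sub(r"[$,%\s]", "", text)
    try:
        number = float(text)
    except ValueError:
        return None
    if is_percent:
        number /= 100  # assumed: percentages are stored as fractions
    return -number if negative else number


# Made-up sample column mixing currency, percent, accounting, and text.
raw_values = ["$1,200.50", "45%", "(300)", "n/a"]
parsed = [parse_mixed_number(v) for v in raw_values]
```

Returning `None` for unreadable entries, rather than raising, leaves room to count or log them afterwards, which is the kind of bookkeeping the logging lessons in the syllabus refer to.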