Text Data Preprocessing in Python

Overview

Learn to clean and prepare text data for machine learning models using Python. This course teaches you to apply basic preprocessing tasks (lowercasing, punctuation removal, tokenization, stop word removal, and stemming) to the SMS Spam Collection dataset. By the end of the course, you'll be able to transform raw text into a format that's ready for NLP tasks.
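As a rough illustration of how these steps chain together, here is a minimal sketch that uses only the Python standard library. It is not taken from the course materials: the stop word list, the `crude_stem` helper, and the sample message are all assumptions made up for this example, and the suffix stripping is only a toy stand-in for a real stemmer such as NLTK's `PorterStemmer`.

```python
import string

# Tiny illustrative stop word list (an assumption; NLTK ships a much fuller one).
STOP_WORDS = {"a", "an", "the", "is", "to", "you", "have", "and", "of", "in"}

# Crude suffix list standing in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(token):
    """Strip one common suffix, keeping at least three characters of the stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(message):
    """Lowercase, strip punctuation, tokenize, drop stop words, then stem."""
    lowered = message.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()  # whitespace tokenization
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in kept]


# Hypothetical SMS-style message, not drawn from the actual dataset.
print(preprocess("Claiming prizes: you have WON, so call the claims line!"))
# → ['claim', 'prize', 'won', 'so', 'call', 'claim', 'line']
```

In practice, a library tokenizer, stop word list, and stemmer would likely replace the hand-rolled pieces here; the sketch only shows the order in which the steps are typically applied.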

Syllabus

Unit 1: Lowercasing Text for Uniformity in NLP

Introduction to Lowercase Text Conversion
Lowercasing Spam Dataset Messages
Transforming Text to Lowercase for Data Uniformity
Mastering Text Lowercasing in Python

Unit 2: Punctuating Punctuation: Streamlining Text for NLP

Removing Text Punctuation Simplified
Removing Commas from Text
Debugging Punctuation Removal Exercise
Crafting Clean Text Data

Unit 3: Tokenizing Text Data in NLP with Python and NLTK

Efficient Text Preprocessing with NLTK
Streamlining Text Processing with NLTK
Implementing Tokenization Basics
Mastering Tokenization with NLTK

Unit 4: Demystifying Stop Words in Natural Language Processing

Stop Words Demystified in NLP
Adapting Stop Words Removal for Spanish
Debugging Stop Words Removal
Setting the Stage for Stop Words Removal in Text Data
Mastering Stop Words Removal

Unit 5: Mastering Stemming in NLP with NLTK

Putting Stemming into Action
Debugging Data Preprocessing Steps
Applying Stemming to Normalize Text
Mastering Text Preprocessing Techniques
