Text Data Preprocessing in Python

via CodeSignal

Overview

Learn to clean and prepare text data for machine learning models using Python. This course covers basic preprocessing tasks (lowercasing, punctuation removal, tokenization, stop word removal, and stemming) applied to the SMS Spam Collection dataset. By the end of the course, you'll be able to transform raw text into a format that is ready for NLP tasks.
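As a rough illustration of what these preprocessing steps look like together, here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the sample message, the tiny `STOP_WORDS` set, and the `toy_stem` helper are all assumptions made for illustration. The course itself uses NLTK, whose tokenizers, stop word lists, and stemmers are considerably more thorough than the whitespace split and suffix stripping shown here.

```python
import string

# Tiny illustrative stop word list (an assumption; NLTK's English list is much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "you", "have", "of", "and"}

# Suffixes removed by the toy stemmer below, checked in this order.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(token: str) -> str:
    """Strip one common suffix. A crude stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(message: str) -> list:
    """Lowercase, drop punctuation, tokenize on whitespace, remove stop words, stem."""
    lowered = message.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [toy_stem(t) for t in kept]


# Made-up spam-like message, not a row from the SMS Spam Collection dataset.
print(preprocess("Congratulations! You have WON two free tickets."))
# → ['congratulation', 'won', 'two', 'free', 'ticket']
```

The order of the steps matters in this sketch: lowercasing comes first so that stop word matching is case-insensitive, and stemming comes last so that stop words are compared in their original form.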

Syllabus

  • Unit 1: Lowercasing Text for Uniformity in NLP
    • Introduction to Lowercase Text Conversion
    • Lowercasing Spam Dataset Messages
    • Transforming Text to Lowercase for Data Uniformity
    • Mastering Text Lowercasing in Python
  • Unit 2: Punctuating Punctuation: Streamlining Text for NLP
    • Removing Text Punctuation Simplified
    • Removing Commas from Text
    • Debugging Punctuation Removal Exercise
    • Crafting Clean Text Data
  • Unit 3: Tokenizing Text Data in NLP with Python and NLTK
    • Efficient Text Preprocessing with NLTK
    • Streamlining Text Processing with NLTK
    • Implementing Tokenization Basics
    • Mastering Tokenization with NLTK
  • Unit 4: Demystifying Stop Words in Natural Language Processing
    • Stop Words Demystified in NLP
    • Adapting Stop Words Removal for Spanish
    • Debugging Stop Words Removal
    • Setting the Stage for Stop Words Removal in Text Data
    • Mastering Stop Words Removal
  • Unit 5: Mastering Stemming in NLP with NLTK
    • Putting Stemming into Action
    • Debugging Data Preprocessing Steps
    • Applying Stemming to Normalize Text
    • Mastering Text Preprocessing Techniques
