
CodeSignal

Foundations of NLP Data Processing

via CodeSignal

Overview

Master the foundations of NLP data processing with hands-on practice in text cleaning, vectorization (TF-IDF, bag-of-words, embeddings), modern tokenization methods (BPE, WordPiece, SentencePiece), and efficient large-scale data preparation for LLMs. You'll build pipelines that scale from basic preprocessing to embedding storage in vector databases.

Syllabus

  • Unit 1: Text Cleaning and Normalization in NLP
    • Text Cleaning with Regular Expressions
    • Text Normalization in Action
    • Refine Your Text Cleaning Skills
    • Stemming vs Lemmatization Showdown
  • Unit 2: Bag-of-Words and N-Grams in NLP
    • Bag-of-Words Model Implementation
    • Enhance Text Analysis with N-Grams
    • Text Classification with Bag-of-Words
  • Unit 3: Introduction to TF-IDF Vectorization in NLP
    • Uncover Key Terms with TF-IDF
    • Enhance Text Analysis with Bigrams
    • Trigram Analysis with TF-IDF
    • Comparing BoW and TF-IDF
  • Unit 4: Introduction to Word Embeddings
    • Exploring Word Similarity with GloVe
    • Exploring Word Synonyms with Embeddings
    • Word Analogy with GloVe
    • Visualize Word Embeddings with PCA
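The first three units (regex cleaning, bag-of-words with n-grams, and TF-IDF) can be illustrated with a minimal pure-Python sketch. It is not the course's own code. The function names `clean_text`, `ngrams`, `bag_of_words`, and `tf_idf` are hypothetical. The cleaning regex keeps only lowercase ASCII letters and digits, an assumption that also drops accented characters. The IDF uses the smoothed form log((1 + N) / (1 + df)) + 1, which matches scikit-learn's `TfidfVectorizer` default, but without the L2 row normalization that class also applies by default.

```python
import math
import re
from collections import Counter


def clean_text(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    # Assumption: only ASCII letters and digits are kept.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return contiguous n-grams, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(doc: str, ngram_range: tuple[int, int] = (1, 1)) -> Counter:
    """Count every n-gram with n in ngram_range (inclusive) in the cleaned document."""
    tokens = clean_text(doc).split()
    counts: Counter = Counter()
    low, high = ngram_range
    for n in range(low, high + 1):
        counts.update(ngrams(tokens, n))
    return counts


def tf_idf(docs: list[str], ngram_range: tuple[int, int] = (1, 1)) -> list[dict[str, float]]:
    """Weight raw term counts by smoothed inverse document frequency."""
    bows = [bag_of_words(d, ngram_range) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df: Counter = Counter()
    for bow in bows:
        df.update(bow.keys())
    # Smoothed IDF: rarer terms get larger weights; no term gets zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in bow.items()} for bow in bows]


if __name__ == "__main__":
    docs = ["The cat sat on the mat!", "The dog sat on the log.", "Cats and dogs."]
    print(clean_text(docs[0]))                # the cat sat on the mat
    print(bag_of_words(docs[0], (1, 2)))      # unigram and bigram counts
    weights = tf_idf(docs)
    # "mat" occurs in one document, "sat" in two, so "mat" gets the higher weight.
    print(weights[0]["mat"] > weights[0]["sat"])  # True
```

Because "cat" and "cats" stay distinct tokens here, the same sketch also shows why the stemming and lemmatization step from Unit 1 matters before vectorizing.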
