Overview
Explore Unicode normalization techniques for natural language processing in Python in this 15-minute video. Learn how to handle font variants, social media text, and diacritics from European languages that can trip up NLP models. See how a character like 'Ç' can have more than one underlying representation and how that affects text processing. Use Unicode normalization to make the variants in your input text consistent for more effective NLP applications.
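As a rough illustration of the idea (not taken from the video itself), the sketch below uses Python's standard `unicodedata` module to show why 'Ç' can be tricky and how normalization helps. The sample strings and the helper name `strip_diacritics` are assumptions chosen for this example.

```python
import unicodedata

# 'Ç' can be stored as one precomposed code point, or as 'C' followed by
# a combining cedilla. Both render the same but compare as different strings.
composed = "\u00C7"      # LATIN CAPITAL LETTER C WITH CEDILLA
decomposed = "C\u0327"   # 'C' + COMBINING CEDILLA
print(composed == decomposed)  # False

# Canonical normalization: NFC composes, NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed)    # True
print(nfd == decomposed)  # True

# Compatibility normalization (NFKC) also folds font-like variants,
# such as fullwidth letters, back to their plain equivalents.
fullwidth = "\uFF48\uFF45\uFF4C\uFF4C\uFF4F"  # fullwidth "hello"
plain = unicodedata.normalize("NFKC", fullwidth)
print(plain)  # hello


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: decompose, then drop combining marks."""
    decomposed_text = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed_text if not unicodedata.combining(ch))


print(strip_diacritics("Ça"))  # Ca
```

Whether to strip diacritics at all depends on the task: it can merge words that are distinct in the source language, so applying NFC or NFKC alone is often the more conservative choice.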
Syllabus
Intro
Diacritics
Decomposition
Conversion
Normal Form
Taught by
James Briggs