Hugging Face Course - Fast Tokenizers and Token Classification Pipelines - Chapter 6

Hugging Face via YouTube

Overview

This chapter of the Hugging Face Course covers fast tokenizers: why they are fast and what extra features they offer. It then walks through the internals of the token classification and question answering pipelines, in both PyTorch and TensorFlow versions, showing how each one processes text and produces predictions. The remaining videos cover training a new tokenizer and building one from scratch, including normalization, pre-tokenization, and three tokenization algorithms: Byte Pair Encoding (BPE), WordPiece, and Unigram.
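
As a rough illustration of the Byte Pair Encoding idea named in the overview, the sketch below learns merge rules from a toy word-frequency table in plain Python. It is not taken from the course videos and does not use the Hugging Face tokenizers library; the function name `train_bpe` and the sample corpus are hypothetical, chosen only for this example.

```python
from collections import Counter


def train_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merge rules from a word -> count mapping.

    Hypothetical helper for illustration only; real tokenizers add
    normalization, pre-tokenization, and special-token handling.
    """
    # Start with every word split into single-character symbols.
    splits = {word: tuple(word) for word in word_freqs}
    merges = []

    for _ in range(num_merges):
        # Count each adjacent symbol pair, weighted by word frequency.
        pair_counts = Counter()
        for word, freq in word_freqs.items():
            symbols = splits[word]
            for left, right in zip(symbols, symbols[1:]):
                pair_counts[(left, right)] += freq
        if not pair_counts:
            break  # every word is already a single symbol

        # The most frequent pair becomes the next merge rule.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)

        # Apply the new rule to every word, scanning left to right.
        for word, symbols in splits.items():
            merged = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            splits[word] = tuple(merged)

    return merges, splits


# Toy corpus (assumed for the example): word -> frequency.
corpus = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}
merges, splits = train_bpe(corpus, num_merges=3)
print(merges)
print(splits["hugs"])
```

Each pass merges the single most frequent adjacent pair, so frequent character sequences become single tokens first while rare words stay split into smaller pieces.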

Syllabus

Why are fast tokenizers called fast?
Fast tokenizer superpowers
Inside the Token classification pipeline (PyTorch)
Inside the Token classification pipeline (TensorFlow)
Inside the Question answering pipeline (PyTorch)
Inside the Question answering pipeline (TensorFlow)
Training a new tokenizer
What is normalization?
What is pre-tokenization?
Byte Pair Encoding Tokenization
WordPiece Tokenization
Unigram Tokenization
Building a new tokenizer

Taught by

Hugging Face
