Overview
Learn how to build a BERT WordPiece tokenizer from scratch using Python and HuggingFace in this tutorial video. The video explains when a custom tokenizer is worth building, particularly for uncommon languages or niche domains that existing pretrained tokenizers handle poorly. It covers how WordPiece, the subword tokenizer used by BERT (a popular transformer model for a wide range of language tasks), splits text into tokens. Follow along as the instructor downloads a dataset, uses HuggingFace's tools, and implements the tokenizer code, then walks through the resulting tokenizer to show how this first step of the pipeline supports your natural language processing projects.
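To make the WordPiece idea concrete, here is a minimal pure-Python sketch of how a trained WordPiece vocabulary is applied to text: greedy longest-match-first splitting, with `##` marking pieces that continue a word and `[UNK]` for words that cannot be matched. This is an illustration of the algorithm only, not HuggingFace's API or the instructor's code. The function names, the toy vocabulary, and the plain whitespace pre-tokenization are assumptions made for this example.

```python
# Minimal, illustrative sketch of WordPiece *encoding* (greedy longest-match-first).
# Assumptions: the toy vocabulary is hypothetical (a real one is learned from a
# corpus), and pre-tokenization is plain whitespace splitting (BERT also splits
# punctuation and may lowercase).
from typing import List, Set


def wordpiece_encode(
    word: str,
    vocab: Set[str],
    unk_token: str = "[UNK]",
    prefix: str = "##",
    max_chars: int = 100,
) -> List[str]:
    """Split one pre-tokenized word into WordPiece tokens.

    Pieces after the first carry the `prefix` marker. If some position
    cannot be matched, the whole word maps to `unk_token`.
    """
    if len(word) > max_chars:
        return [unk_token]

    tokens: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking from the right.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]
        tokens.append(match)
        start = end
    return tokens


def tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Whitespace pre-tokenization followed by WordPiece on each word."""
    out: List[str] = []
    for word in text.split():
        out.extend(wordpiece_encode(word, vocab))
    return out


if __name__ == "__main__":
    toy_vocab = {"token", "##izer", "##ize", "un", "##known", "play", "##ing"}
    print(tokenize("tokenizer playing unknown xyz", toy_vocab))
    # → ['token', '##izer', 'play', '##ing', 'un', '##known', '[UNK]']
```

Because matching is greedy, `tokenizer` becomes `token` + `##izer` even though `##ize` is also in the vocabulary. Learning the vocabulary itself is a separate training step, which the course handles with HuggingFace's tooling; this sketch does not reproduce that API.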
Syllabus
Intro
WordPiece Tokenizer
Download Data Sets
HuggingFace
Dataset
Tokenizer
Tokenizer Walkthrough
Tokenizer Code
Taught by
James Briggs