Curating Text Data for Pre-training LLMs using GPU-accelerated Modules from NVIDIA NeMo Curator
- 1 00:00 - Introduction
- 2 01:02 - Understanding All the Different Components
- 3 01:38 - Download and Conversion
- 4 02:47 - Downloading the Dataset
- 5 03:38 - Implementing the Document Extractor
- 6 05:32 - Clean and Unify the Dataset
- 7 06:26 - Quotation Unifier
- 8 07:06 - Unicode Reformatter
- 9 11:06 - Redact PII