Classroom Contents
Inside OpenCoder - Data Processing and Training Pipeline for Code LLMs
- 1 Intro
- 2 OpenCoder
- 3 OpenCoder Goals
- 4 Pre-Training Data
- 5 RefineCode
- 6 Raw Code for Pre-Training
- 7 Data Preprocessing
- 8 Data Deduplication
- 9 How Data Deduplication Improved OpenCoder
- 10 Data Transformation
- 11 Data Filtering
- 12 Sampling
- 13 Code-Related Data
- 14 Post Training
- 15 The Two Stages of Instruct Tuning
- 16 Evaluation
- 17 Conclusion & Future Work