Inside OpenCoder - Data Processing and Training Pipeline for Code LLMs

Oxen via YouTube

Classroom Contents

  1. Intro
  2. OpenCoder
  3. OpenCoder Goals
  4. Pre-Training Data
  5. RefineCode
  6. Raw Code for Pre-Training
  7. Data Preprocessing
  8. Data Deduplication
  9. How Data Deduplication Improved OpenCoder
  10. Data Transformation
  11. Data Filtering
  12. Sampling
  13. Code-Related Data
  14. Post Training
  15. The Two Stages of Instruct Tuning
  16. Evaluation
  17. Conclusion & Future Work