Overview
Syllabus
0:00 Advanced Data Preparation Techniques
0:33 Video Overview
1:52 Synthetic Dataset Generation Goals
3:48 Synthetic Data Generation Pipeline
5:34 Document Ingestion Approaches, e.g. PDF to Markdown: Comparing MarkItDown, Marker, and Gemini
13:44 Chunking Approaches and Trade-offs
22:45 Question-Answer Pair Generation Approaches
31:56 Q-A Pair Visualization with Embeddings or Tags, and How to Choose a Model for Synthetic Data Generation
44:29 How to Create an Evaluation Dataset: Best Practices
54:41 Preview of the upcoming fine-tuning video
Taught by
Trelis Research