Advanced Data Prep and Visualization Techniques for Fine-tuning LLMs

Trelis Research via YouTube

Overview

This video tutorial covers advanced techniques for preparing and visualizing data when fine-tuning Large Language Models (LLMs). It walks through the full synthetic data generation pipeline, from setting clear goals to ingesting documents with tools such as markitdown, marker, and Gemini. It then compares chunking strategies and their trade-offs, and covers methods for generating question-answer pairs. Later sections show how to visualize the dataset using embeddings or tags to improve its quality, how to select a model for synthetic data generation, and best practices for creating an evaluation dataset to measure a fine-tuned model's performance. The tutorial closes with a preview of upcoming fine-tuning content.
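To give a sense of what a chunking strategy looks like in practice, here is a minimal sketch of one common approach: fixed-size character windows with overlap. This listing does not say which chunking methods the video uses, so the function name `chunk_text` and its parameters are hypothetical and chosen only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; not taken from the video.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

The trade-off in this sketch is typical: a larger overlap makes it less likely that a fact is split across a chunk boundary, but it repeats more text across chunks and so increases the cost of generating question-answer pairs.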

Syllabus

0:00 Advanced Data Preparation Techniques
0:33 Video Overview
1:52 Synthetic Dataset Generation Goals
3:48 Synthetic Data Generation Pipeline
5:34 Document Ingestion Approaches, e.g. PDF to Markdown: comparing markitdown, marker, and Gemini
13:44 Chunking Approaches and Trade-offs
22:45 Question-Answer Pair Generation Approaches
31:56 Q-A Pair Visualization with Embeddings or Tags, and How to Choose a Model for Synthetic Data Generation
44:29 Best Practices for Creating an Evaluation Dataset
54:41 Preview of the upcoming fine-tuning video

Taught by

Trelis Research
