How We Fine-tuned Qwen3-0.6B to Beat GPT-4o on Text2SQL

This 51-minute tutorial demonstrates how to fine-tune the Qwen3-0.6B model to outperform GPT-4o on Text2SQL tasks. Learn the complete workflow for improving a language model's SQL generation, starting with dataset selection and preparation. Follow along as the Oxen team evaluates the baseline performance of both models, takes a quick look at Marimo notebooks for interactive development, and fine-tunes the smaller model. The presentation explains each step in the process, from initial model evaluation to the final performance comparison, and shows how a small model can be optimized to compete with a much larger one. Discover practical strategies for natural language to SQL conversion that can be applied to your own projects.

Syllabus

0:00 Introducing Fine-tuning Fridays
1:32 How Fine-tuning Fridays Works
2:07 Preview of Qwen3 vs GPT-4o
3:20 The Workflow
4:15 The Task: Natural Language to SQL
7:47 First Step: The Datasets
11:33 Second Step: Eval the Strong Model
21:17 A Quick Peek into Marimo Notebooks
26:12 Questions
28:44 Third Step: Evaluating Qwen/Qwen3-0.6B
37:52 Fourth Step: Fine-Tuning
45:16 Results
48:49 Baseten x Oxen.ai