Overview
This podcast episode features Prithviraj Ammanabrolu, Research Scientist at Databricks, discussing fine-tuning techniques for language models. Dive into TAO fine-tuning, an approach that removes the need for labeled data by using reinforcement learning and synthetic data to help models evaluate and improve themselves. Learn how this technique enables smaller models to perform significantly better, making model deployment more efficient. Ammanabrolu, who also serves as an Assistant Professor at UC San Diego, where he leads the PEARLS Lab, shares insights on reward model fine-tuning, the balance between training and inference compute, strategies for handling model drift, and the differences between prompt tuning and traditional fine-tuning. The 54-minute discussion covers optimization strategies for small models, their untapped potential, differences in fine-tuning across model architectures, and the freedom afforded by open-weight models like Mistral.
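The label-free loop described in the episode can be sketched roughly as follows: sample several candidate responses per prompt, score them with a reward model, and train on the preferred ones, so the reward model's judgments replace human labels. This is a minimal illustration of the general idea, not Databricks' TAO implementation; `generate`, `score`, and `fine_tune` are hypothetical stand-ins for an LLM sampler, a learned reward model, and an RL-style trainer.

```python
import random

# --- Hypothetical stand-ins. A real system would call an LLM for
# --- sampling, a learned reward model for scoring, and an RL-style
# --- trainer for the update step.

def generate(model, prompt, n=8):
    """Sample n candidate responses from the model for one prompt."""
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def score(reward_model, prompt, response):
    """Score a (prompt, response) pair; higher is better."""
    return random.random()  # placeholder for a learned reward model

def fine_tune(model, examples):
    """Update the model on the selected synthetic examples."""
    print(f"fine-tuning on {len(examples)} synthetic examples")
    return model

def label_free_round(model, reward_model, prompts, n=8):
    """One round of label-free tuning: no human labels are used;
    the reward model's preferences create the training signal."""
    synthetic = []
    for prompt in prompts:
        candidates = generate(model, prompt, n)
        best = max(candidates, key=lambda r: score(reward_model, prompt, r))
        synthetic.append((prompt, best))
    return fine_tune(model, synthetic)

if __name__ == "__main__":
    model = object()         # placeholder for a small base model
    reward_model = object()  # placeholder for a tuned reward model
    prompts = ["Summarize this support ticket", "Draft a SQL query"]
    model = label_free_round(model, reward_model, prompts)
```

The trade-off the episode returns to is visible in the sketch: extra compute is spent at data-generation time (sampling and scoring `n` candidates per prompt) so that a smaller model can be improved without any labeled dataset.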
Syllabus
[00:00] Raj's preferred coffee
[00:36] Takeaways
[01:02] Tao Naming Decision
[04:19] No Labels Machine Learning
[08:09] Tao and TAO breakdown
[13:20] Reward Model Fine-Tuning
[18:15] Training vs Inference Compute
[22:32] Retraining and Model Drift
[29:06] Prompt Tuning vs Fine-Tuning
[34:32] Small Model Optimization Strategies
[37:10] Small Model Potential
[43:08] Fine-tuning Model Differences
[46:02] Mistral Model Freedom
[53:46] Wrap up
Taught by
MLOps.community