Overview
Learn about an approach to Large Language Model training in this 24-minute technical presentation that introduces ORPO (Odds Ratio Preference Optimization), a "reference model-free" monolithic optimization algorithm. Explore the theoretical physics perspective behind this preference-aligned Supervised Fine-Tuning (SFT) method, examining parallels between regularization terms and Lagrange multipliers. Delve into how ORPO eliminates the need for a separate preference alignment phase, and see its performance metrics compared against Llama 2 and Mistral 7B models. Based on research from the paper "ORPO: Monolithic Preference Optimization without Reference Model," gain insights into this streamlined approach, which folds preference alignment directly into the training process.
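To make the "monolithic" idea concrete, here is a minimal sketch of an ORPO-style objective for a single (prompt, chosen, rejected) triple: the usual SFT negative log-likelihood on the chosen response plus a weighted odds-ratio penalty, with no reference model involved. The function names, the scalar inputs, and the default weight `lam` are illustrative assumptions and are not taken from the presentation; a real training loop would compute the per-token average log-likelihoods from model logits over a batch.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) given log p, for p strictly between 0 and 1."""
    # log1p(-exp(log p)) equals log(1 - p)
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Sketch of an ORPO-style loss for one preference pair.

    logp_chosen / logp_rejected: length-normalized log-likelihoods (< 0)
    of the preferred and dispreferred responses under the model being trained.
    lam: weight on the odds-ratio term (illustrative default, an assumption).
    """
    # Standard SFT term: negative log-likelihood of the chosen response.
    sft_term = -logp_chosen

    # Log odds ratio between the chosen and rejected responses.
    z = log_odds(logp_chosen) - log_odds(logp_rejected)

    # -log(sigmoid(z)) written as a numerically stable softplus(-z).
    or_term = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

    # The penalty is added with a fixed weight, which is where the analogy
    # to a regularization term or Lagrange multiplier comes from.
    return sft_term + lam * or_term


# Preferring the chosen answer gives a lower loss than preferring the rejected one.
good = orpo_loss(math.log(0.8), math.log(0.2))
bad = orpo_loss(math.log(0.2), math.log(0.8))
print(good < bad)
```

Because both terms depend only on the model being trained, a single pass over preference data can do the work that would otherwise be split between an SFT stage and a separate alignment stage.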
Syllabus
ORPO: NEW DPO Alignment and SFT Method for LLM
Taught by
Discover AI