Reinforcement Learning from Human Feedback - From Zero to ChatGPT
HuggingFace via YouTube
Overview
Syllabus
Introduction
Recent breakthroughs
What is RL
History of RL
Example of RL
ChatGPT
Technical details
Three conceptual parts
NLP Pretraining
Supervised Finetuning
Reward Model Training
Input and Output Pairs
Reward Model
KL Divergence
Scaling Factor
RL Optimizer
PPO
Conceptual Questions
Prompts and Responses
Anthropic
BlenderBot
Thumbs up and thumbs down
ChatGPT example
ChatGPT vs. Anthropic
Open areas of investigation
Wrap up
Q&A
Open Source Community
Reinforcement Learning from Email
Paper Release
Taught by
Hugging Face