Reinforcement Learning for LLMs to Enhance Safety
MLOps World: Machine Learning in Production via YouTube
Overview
Learn practical reinforcement learning techniques for aligning large language models to produce safer, more ethical responses in this workshop. Explore the fundamentals of reinforcement learning from human feedback (RLHF) and discover how to implement safety measures that mitigate risks such as harmful outputs and bias in AI systems.
The session begins with a 25-minute introduction to RLHF concepts as applied to LLMs, covering both theoretical foundations and practical applications. A 35-minute hands-on segment follows, in which you fine-tune language models with reinforcement learning techniques using open-source tools and the public UltraFeedback dataset. Collaborative brainstorming then turns to safety strategies, such as producing child-friendly outputs and implementing fairness checks, through structured group discussions, after which groups present their findings and take part in Q&A to solidify understanding and share insights with fellow participants.
By the end, you will have practical skills in LLM fine-tuning with RLHF methodologies, a sharper awareness of safety considerations in AI deployment, and connections with peers facing real-world alignment challenges. The workshop is led by applied research scientists from CIBC with expertise in machine learning, mathematics, quantum computing, and astrophysics, placing ethical AI development at the center of modern machine learning practice.
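To give a flavor of the hands-on segment, here is a minimal sketch of preference-based fine-tuning on UltraFeedback using Direct Preference Optimization (DPO), one widely used RLHF-family technique. The listing does not name the workshop's exact tools, so the TRL library, the binarized dataset variant, and the base model below are assumptions, and argument names differ across TRL versions.
```python
# A minimal sketch, not the workshop's exact code: preference fine-tuning
# on UltraFeedback with DPO via Hugging Face TRL (assumed tooling).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumption: any small instruction-tuned model works for the demo.
model_name = "Qwen/Qwen2-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# UltraFeedback binarized into (prompt, chosen, rejected) preference pairs.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

# beta controls how far the tuned policy may drift from the reference model.
config = DPOConfig(output_dir="llm-safety-dpo", beta=0.1, per_device_train_batch_size=2)

trainer = DPOTrainer(
    model=model,                 # a frozen copy serves as the implicit reference
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
)
trainer.train()
```
DPO optimizes the same KL-regularized preference objective that motivates PPO-based RLHF, but without training an explicit reward model, which is why it is a common entry point for alignment exercises like this one.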
Syllabus
Reinforcement Learning for LLMs to Enhance Safety
Taught by
MLOps World: Machine Learning in Production