Reinforcement Learning for LLMs to Enhance Safety
MLOps World: Machine Learning in Production via YouTube
Overview
Learn practical reinforcement learning techniques for aligning large language models to produce safer, more ethical responses in this workshop. Explore the fundamentals of reinforcement learning from human feedback (RLHF) and discover how to implement safety measures that mitigate risks such as harmful outputs and bias in AI systems.
The session begins with a 25-minute introduction to RLHF concepts as applied to LLMs, covering both theoretical foundations and practical applications. A 35-minute hands-on segment follows, in which you fine-tune language models with reinforcement learning techniques using open-source tools and the public UltraFeedback dataset. Collaborative brainstorming then turns to safety strategies, such as producing child-friendly outputs and implementing fairness checks, through structured group discussions, after which groups present their findings and take part in Q&A to solidify understanding and share insights with fellow participants.
By the end, you will have practical skills in LLM fine-tuning with RLHF methodologies, a sharper awareness of safety considerations in AI deployment, and connections with peers facing real-world alignment challenges. The workshop is led by applied research scientists from CIBC with expertise in machine learning, mathematics, quantum computing, and astrophysics, placing ethical AI development at the center of modern machine learning practice.
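To give a flavor of the hands-on segment, here is a minimal sketch of preference-based fine-tuning on UltraFeedback using Direct Preference Optimization (DPO), one widely used RLHF-family technique. The listing does not name the workshop's exact tools, so the TRL library, the binarized dataset variant, and the base model below are assumptions, and argument names differ across TRL versions.
```python
# A minimal sketch, not the workshop's exact code: preference fine-tuning
# on UltraFeedback with DPO via Hugging Face TRL (assumed tooling).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumption: any small instruction-tuned model works for the demo.
model_name = "Qwen/Qwen2-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# UltraFeedback binarized into (prompt, chosen, rejected) preference pairs.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

# beta controls how far the tuned policy may drift from the reference model.
config = DPOConfig(output_dir="llm-safety-dpo", beta=0.1, per_device_train_batch_size=2)

trainer = DPOTrainer(
    model=model,                 # a frozen copy serves as the implicit reference
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
)
trainer.train()
```
DPO optimizes the same KL-regularized preference objective that motivates PPO-based RLHF, but without training an explicit reward model, which is why it is a common entry point for alignment exercises like this one.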
Syllabus
Reinforcement Learning for LLMs to Enhance Safety
Taught by
MLOps World: Machine Learning in Production