Overview
Learn about RLHFuse, an innovative training system that optimizes Reinforcement Learning from Human Feedback (RLHF) for large language models through stage fusion techniques in this 13-minute conference presentation from NSDI '25. Discover how researchers from Peking University and StepFun address the critical challenges of low GPU utilization in existing RLHF systems caused by data skewness in generation stages and pipeline bubbles in training stages.

Explore the system's approach, which breaks away from traditional RLHF workflows by splitting tasks into finer-grained subtasks and implementing stage fusion to improve performance. Understand the two key innovations: inter-stage fusion, which overlaps the generation and inference stages through sample-level subtasks to eliminate bottlenecks from long-tailed samples, and intra-stage fusion, which concurrently executes micro-batch subtasks with a fused pipeline schedule to reduce pipeline bubbles.

Examine experimental results demonstrating up to 3.7× improvement in training throughput compared to existing systems, making this essential viewing for researchers and practitioners working on large language model training optimization and distributed machine learning systems.
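To build intuition for why inter-stage fusion helps, here is a minimal sketch (not RLHFuse's actual code; the function names and timing model are illustrative assumptions): in a staged schedule, inference for the whole batch waits for the slowest long-tail generation, while a sample-level fused schedule lets each sample enter inference as soon as its own generation finishes.

```python
# Hypothetical timing model: one inference worker, fixed per-sample
# inference cost. Not the actual RLHFuse scheduler.

def staged_schedule(gen_times, infer_time):
    """Baseline: inference starts only after the longest generation
    (the long-tail sample) in the batch completes."""
    return max(gen_times) + infer_time * len(gen_times)

def fused_schedule(gen_times, infer_time):
    """Sample-level inter-stage fusion: each sample is forwarded to
    inference as soon as its generation finishes, so inference of
    short samples overlaps with generation of the long-tail sample."""
    finish = 0.0
    for ready in sorted(gen_times):   # samples become ready in finish order
        start = max(ready, finish)    # wait for the sample or the worker
        finish = start + infer_time
    return finish

# Long-tailed generation times: one straggler dominates the batch.
gen = [1.0, 1.0, 1.0, 10.0]
print(staged_schedule(gen, 2.0))  # → 18.0
print(fused_schedule(gen, 2.0))   # → 12.0
```

In this toy example, fusion hides three of the four inference steps behind the straggler's generation, cutting the critical path from 18 to 12 time units; the talk's reported gains come from applying this overlap (plus intra-stage pipeline fusion) at real training scale.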
Syllabus
NSDI '25 - Optimizing RLHF Training for Large Language Models with Stage Fusion
Taught by
USENIX