Overview
Learn about Optimus, a distributed training system designed to accelerate large-scale multi-modal large language model (MLLM) training through innovative bubble exploitation techniques. Discover how researchers from Harvard University, ByteDance, and University of Southern California address the inefficiencies in current MLLM training systems that suffer from substantial GPU bubbles caused by heterogeneous modality models and complex data dependencies in 3D parallelism. Explore the principled analysis demonstrating how scheduling encoder computation within LLM bubbles can significantly reduce training bottlenecks, and understand the system's approach to searching for separate parallel plans for encoders and LLMs while maintaining original data dependencies. Examine the bubble scheduling algorithm that exploits LLM bubbles without breaking model architecture constraints, and delve into the decomposition of encoder layer computation into optimized kernel series for sub-millisecond bubble scheduling. Review experimental results from production cluster testing showing 20.5%-21.3% acceleration in MLLM training performance using ViT-22B and GPT-175B models across 3072 GPUs compared to baseline systems, demonstrating practical improvements for large-scale multimodal AI training infrastructure.
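The central idea described above, fitting decomposed encoder kernels into the idle bubbles of the LLM's pipeline schedule while preserving data dependencies, can be pictured as a simple ordered packing problem. The sketch below is illustrative only: the Bubble and Kernel classes, the durations, and the greedy placement policy are assumptions made for exposition, not the authors' actual algorithm or code.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: pack decomposed encoder kernels, in dependency order,
# into idle GPU "bubbles" left by the LLM's 3D-parallel pipeline schedule.
# All names and costs are illustrative, not taken from the Optimus paper.

@dataclass
class Bubble:
    start_ms: float                      # when the bubble opens in the pipeline schedule
    duration_ms: float                   # idle time available on this GPU
    kernels: list = field(default_factory=list)
    used_ms: float = 0.0

@dataclass
class Kernel:
    name: str
    cost_ms: float                       # estimated runtime of one decomposed encoder kernel

def schedule_kernels(bubbles: list[Bubble], kernels: list[Kernel]) -> list[Kernel]:
    """Greedily place kernels into bubbles in order; return kernels that did not fit.

    Kernel order is preserved so the encoder's data dependencies still hold.
    A kernel that fits no remaining bubble simply stays on the critical path.
    """
    bubbles = sorted(bubbles, key=lambda b: b.start_ms)
    leftovers: list[Kernel] = []
    bi = 0                               # earliest bubble the next kernel may use
    for k in kernels:
        j = bi
        while j < len(bubbles) and bubbles[j].duration_ms - bubbles[j].used_ms < k.cost_ms:
            j += 1
        if j == len(bubbles):
            leftovers.append(k)          # no bubble fits; run it outside the bubbles
        else:
            bubbles[j].kernels.append(k)
            bubbles[j].used_ms += k.cost_ms
            bi = j                       # later kernels may only use this or later bubbles
    return leftovers

if __name__ == "__main__":
    # Sub-millisecond kernels from one decomposed encoder layer (illustrative costs).
    kernels = [Kernel(f"encoder_layer0_k{i}", 0.4) for i in range(8)]
    # Idle bubbles observed in the LLM pipeline schedule (illustrative sizes).
    bubbles = [Bubble(start_ms=10.0, duration_ms=1.5), Bubble(start_ms=25.0, duration_ms=2.0)]
    spill = schedule_kernels(bubbles, kernels)
    for b in bubbles:
        print(f"bubble@{b.start_ms}ms packs {[k.name for k in b.kernels]}")
    print("unscheduled:", [k.name for k in spill])
```

Running the sketch places the eight illustrative kernels into the two bubbles in order; any kernel that does not fit stays on the critical path, mirroring the constraint that the original data dependencies and model architecture are left intact.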
Syllabus
USENIX ATC '25 - Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation
Taught by
USENIX