Scaling Down, Powering Up - Can Efficient Training Beat Scaling Laws?
MLOps World: Machine Learning in Production via YouTube
Overview
Explore strategies for training efficient language models that challenge traditional scaling paradigms in this conference presentation. Discover how data-centric and model-centric approaches can achieve strong AI performance without massive computational cost, with DeepSeek's success serving as a prime example of thoughtful engineering over brute-force scaling.

Learn about the rise of small language models (SLMs) as cost-effective alternatives to dense large language models, and study data enhancement techniques, including mixing, filtering, and deduplication, that improve dataset quality. Dive into model optimization methods such as pruning, distillation, parameter-efficient fine-tuning, quantization, and model merging, which streamline architectures while maintaining performance.

Come away understanding how strategic data preparation and intelligent model design can produce capable language models without the prohibitive financial investment traditionally associated with scaling AI systems, demonstrating that efficiency and careful engineering can outperform raw computational power in modern machine learning applications.
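To give a concrete flavor of the data-centric techniques the talk covers, here is a minimal, illustrative Python sketch of two of them, filtering and exact-match deduplication, applied to a toy corpus. This is not code from the presentation; the function name, threshold, and normalization choices are assumptions for illustration only.

```python
import hashlib

def dedupe_and_filter(docs, min_length=20):
    """Drop exact duplicates (by content hash) and very short documents.

    Illustrative only: real pipelines also use fuzzy dedup (e.g. MinHash)
    and richer quality filters than a length cutoff.
    """
    seen = set()
    kept = []
    for doc in docs:
        text = " ".join(doc.split())  # normalize whitespace before hashing
        if len(text) < min_length:
            continue  # filter: too short to be useful training data
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # deduplicate: identical content already kept
        seen.add(digest)
        kept.append(text)
    return kept

corpus = [
    "Small language models can rival larger ones on curated data.",
    "Small  language models can rival larger ones on curated data.",  # dup after normalization
    "too short",
    "Quantization shrinks model weights to lower-precision formats.",
]
print(dedupe_and_filter(corpus))  # keeps 2 of the 4 documents
```

In practice, production pipelines extend this idea with near-duplicate detection and learned quality classifiers, but the core loop, normalize, filter, hash, keep first occurrence, is the same.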
Syllabus
Scaling Down, Powering Up: Can Efficient Training Beat Scaling Laws?
Taught by
MLOps World: Machine Learning in Production