Overview
Explore the development and evolution of large language model evaluation through Weights & Biases' Nejumi LLM Leaderboard in this 20-minute conference talk. Discover how W&B has conducted systematic performance evaluations of LLMs since 2023, continuously publishing results on what has become Japan's largest evaluation platform and a key reference for researchers and companies. Learn about the iterative development process from the initial version through the latest version 4, and how the leaderboard has adapted to advances in evaluation techniques and model design. Gain insights from actual operational experience and explore future prospects for LLM evaluation methodologies and benchmarking standards.
Syllabus
The evolution of LLM evaluation and Japan’s cutting-edge benchmarks on the Nejumi leaderboard
Taught by
Weights & Biases