Overview
Explore the development and evolution of large language model evaluation through Weights & Biases' Nejumi LLM Leaderboard in this 20-minute conference talk. Discover how W&B has conducted systematic performance evaluations of LLMs since 2023, continuously publishing results on what has become Japan's largest LLM evaluation platform and a key reference for researchers and companies. Learn about the iterative development process from the initial version through the latest version 4, and how the leaderboard has adapted to advances in evaluation techniques and model design. Gain insights from actual operational experience and explore future prospects for LLM evaluation methodologies and benchmarking standards in the rapidly evolving field of artificial intelligence.
Syllabus
The evolution of LLM evaluation and Japan’s cutting-edge benchmarks on the Nejumi leaderboard
Taught by
Weights & Biases