Overview
Explore the development and evolution of large language model (LLM) evaluation through Weights & Biases' comprehensive Nejumi LLM Leaderboard in this 20-minute conference talk. Discover how W&B has conducted systematic performance evaluations of LLMs since 2023, continuously publishing results on what has become Japan's largest evaluation platform and a key reference for researchers and companies. Learn about the iterative development process from the initial version through the latest version 4, and how the leaderboard has adapted to advances in evaluation techniques and model design. Gain insights from actual operational experience and explore future prospects for LLM evaluation methodologies and benchmarking standards in the rapidly evolving field of artificial intelligence.
Syllabus
The evolution of LLM evaluation and Japan’s cutting-edge benchmarks on the Nejumi leaderboard
Taught by
Weights & Biases