Overview
Explore advanced quantization techniques for efficient Large Language Model (LLM) inference in this technical seminar presented by Assistant Professor Jungwook Choi of Hanyang University. Trace the evolution of Transformer models from their 2017 introduction to their current role as the foundation of Neural Machine Translation and Natural Language Processing. Learn how the Multi-Head Attention mechanism supports representation learning and how pre-trained language models have scaled to hundreds of billions of parameters. Examine the challenges of deploying such massive Transformer models, including computational demands and power consumption, and discover practical solutions through weight and activation quantization techniques designed for edge-device deployment. Gain insights into optimizing model efficiency while maintaining performance across applications in computer vision and voice recognition.
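To make the core idea concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization, the basic building block behind the techniques the seminar covers. This is an illustrative example, not code from the talk; the function names and the use of NumPy are assumptions.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127].

    The scale is chosen so the largest-magnitude weight lands at +/-127;
    each stored value then needs only 8 bits instead of 32.
    """
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

# Quantize a small random weight matrix and bound the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.max(np.abs(w - w_hat)))
# Rounding error is at most half a quantization step.
assert max_err <= scale / 2 + 1e-6
```

Per-tensor scaling is the simplest variant; practical LLM schemes typically use per-channel or per-group scales and treat activations separately, since activation outliers are a key difficulty discussed in this line of work.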
Syllabus
tinyML Asia - Jungwook Choi: Quantization Techniques for Efficient Large Language Model Inference
Taught by
EDGE AI FOUNDATION