Implementing Large Language Models Inference in Pure C++ - A Llama 2 Case Study
code::dive conference via YouTube
Overview
Explore a comprehensive conference talk that demonstrates how to implement Llama 2 model inference using pure C++, delivered by GPU modeling engineer Filipe Mulonde at code::dive conference. Learn the fundamentals of Llama 2, a state-of-the-art language model, and discover practical techniques for implementing inference solutions without external dependencies. Understand the model's architecture through a streamlined, educational approach inspired by llama.cpp and llama2.c projects, starting with a PyTorch-trained model and progressing to a dependency-free C++ implementation. Gain insights into optimization techniques for fast model inference and practical applications ranging from chatbots to content creation. Benefit from Mulonde's extensive experience as an ARM Holdings engineer, his academic background in Software Engineering and Artificial Intelligence, and his research work at ETH Zurich in computer architecture and bioinformatics. The hour-long presentation draws from his expertise gained through working on autonomous train development and speaking at major C++ conferences like CppCon, Meeting C++, and Embo++.
Syllabus
Filipe Mulonde - Implementing Large Language Model LLMs Inference in Pure C++
Taught by
code::dive conference