

Implementing Large Language Models Inference in Pure C++ - A Llama 2 Case Study

code::dive conference via YouTube

Overview

Explore a conference talk demonstrating how to implement Llama 2 inference in pure C++, delivered by GPU modeling engineer Filipe Mulonde at the code::dive conference. Learn the fundamentals of Llama 2, a state-of-the-art language model, and discover practical techniques for implementing inference without external dependencies. Understand the model's architecture through a streamlined, educational approach inspired by the llama.cpp and llama2.c projects, starting from a PyTorch-trained model and progressing to a dependency-free C++ implementation. Gain insights into optimization techniques for fast inference and into practical applications ranging from chatbots to content creation. Benefit from Mulonde's experience as an ARM Holdings engineer, his academic background in software engineering and artificial intelligence, and his research at ETH Zurich in computer architecture and bioinformatics. The hour-long presentation draws on expertise gained from work on autonomous train development and from speaking at major C++ conferences such as CppCon, Meeting C++, and Embo++.

Syllabus

Filipe Mulonde - Implementing Large Language Model (LLM) Inference in Pure C++

Taught by

code::dive conference

