YouTube

Running Llama 2 with Extended Context Length - Up to 32k Tokens

Trelis Research via YouTube

Overview

Learn how to scale Llama 2 to a 32k context length in this comprehensive 22-minute tutorial video. Discover techniques for reaching up to 16k tokens on a 40 GB Colab GPU and 32k tokens on an 80 GB A100 rented through platforms like RunPod, AWS, or Azure. Explore the use of Flash Attention, BetterTransformer, and GPTQ quantization to optimize performance. Gain insights into running GPTQ models in Colab, streaming Llama 2 13B at various context lengths, and adjusting parameters such as maximum token output and temperature. Access a free Jupyter notebook for implementation, or consider the PRO version for advanced features like conversation saving and document analysis. Delve into the theory behind extending context length, compare different models, and gather practical tips for working with long contexts in language models.
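The techniques described above (GPTQ quantization plus linear RoPE scaling to stretch the context window) can be sketched with the Hugging Face `transformers` API. This is a minimal illustration, not the tutorial's actual notebook: the repo name `TheBloke/Llama-2-13B-GPTQ` and the 4096-token trained context are assumptions, and actually loading the model requires a GPU with the `auto-gptq` package installed.

```python
def rope_factor(target_ctx: int, trained_ctx: int = 4096) -> float:
    """Linear RoPE scaling factor needed to stretch a model trained on
    trained_ctx tokens (4096 for Llama 2) out to target_ctx tokens."""
    return target_ctx / trained_ctx


def load_long_context_llama(target_ctx: int = 32768):
    """Sketch only: needs a CUDA GPU and the transformers + auto-gptq
    packages; the model repo name below is an assumption."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TheBloke/Llama-2-13B-GPTQ"  # assumed GPTQ checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",  # place layers on the available GPU(s)
        # Stretch the RoPE positions linearly, e.g. 8x for 32k context
        rope_scaling={"type": "linear", "factor": rope_factor(target_ctx)},
    )
    return tokenizer, model
```

A 16k run on a 40 GB GPU would use `rope_factor(16384)` (a factor of 4) instead; the quantized weights are what make the 13B model fit in that memory budget at all.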

Syllabus

How to run Llama 2 with longer context length
Run Llama 2 with 16k context in Google Colab
How to run a GPTQ model in Colab
Run Llama 2 7B with 32k context length using RunPod
Run Llama 2 13B with 16k context length for better performance
Streaming Llama 2 13B on 16k context length
Adjusting max token output and temperature
Streaming Llama 2 13B on 16k context length at temperature 0
Streaming Llama 2 13B on 32k context length
PRO notebook: save chats and files, easily adjust context length
Theory bonus: how to get longer context length?
How does GPTQ work?
How does Flash attention work?
What is the best model for long context length?
Which is better: Llama 2, Code Llama, or YaRN?
Tips for long context lengths
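The theory chapter on getting longer context length centers on linear RoPE scaling (position interpolation): positions are divided by a scale factor so that the largest target position produces the same rotary angles the model saw during training. A minimal numeric sketch, using the standard Llama rotary base of 10000 (the head dimension of 128 is illustrative):

```python
def rope_angles(pos, dim=128, base=10000.0, scale=1.0):
    """Rotary angles for each frequency pair at a (possibly scaled) position."""
    return [(pos / scale) * base ** (-2 * i / dim) for i in range(dim // 2)]


train_len, target_len = 4096, 32768
scale = target_len / train_len  # 8.0 for a 4k -> 32k stretch

# With linear interpolation, position 32768 produces exactly the angles
# that position 4096 produced during training, so no rotary angle ever
# exceeds the range the model was trained on.
assert rope_angles(target_len, scale=scale) == rope_angles(train_len)
```

The trade-off, as the video's comparison of Llama 2, Code Llama, and YaRN suggests, is that squeezing more positions into the trained angle range reduces positional resolution, which is why interpolated models are typically fine-tuned briefly on long sequences.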

Taught by

Trelis Research
