Running Llama 2 with Extended Context Length - Up to 32k Tokens

Trelis Research via YouTube


Classroom Contents


  1. How to run Llama 2 with longer context length
  2. Run Llama 2 with 16k context in Google Colab
  3. How to run a GPTQ model in Colab
  4. Run Llama 2 7B with 32k context length using RunPod
  5. Run Llama 2 13B for better performance (16k context length)
  6. Streaming Llama 2 13B on 16k context length
  7. Adjusting max token output and temperature
  8. Streaming Llama 2 13B on 16k context length at temperature 0
  9. Streaming Llama 2 13B on 32k context length
  10. Pro notebook: save chats and files, easily adjust context length
  11. Theory bonus: how to get longer context length
  12. How does GPTQ work?
  13. How does Flash Attention work?
  14. What is the best model for long context lengths?
  15. Which is better: Llama 2, Code Llama, or YaRN?
  16. Tips for long context lengths
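
Several of these lessons revolve around the same few moving parts: a GPTQ-quantized Llama 2 checkpoint, RoPE scaling to stretch the context window, and streamed generation with an adjustable temperature. The sketch below ties those together. It is not the notebook from the videos: the model ID, the transformers-based GPTQ loading path, and the linear RoPE factor of 8 are all illustrative assumptions.

```python
# A minimal sketch: load a GPTQ-quantized Llama 2 with linear RoPE scaling
# and stream a greedy (temperature-0) completion.
# Assumes transformers >= 4.32 with accelerate and auto-gptq installed.
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "TheBloke/Llama-2-13B-chat-GPTQ"  # illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    # Llama 2 is pretrained with a 4k context; linearly scaling RoPE
    # positions by 8 stretches that to ~32k (a factor of 4 gives ~16k).
    rope_scaling={"type": "linear", "factor": 8.0},
)

prompt = "Summarize the following document:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# TextStreamer prints tokens as they are generated. Greedy decoding
# (do_sample=False) is the deterministic "temperature 0" setting; pass
# do_sample=True with a temperature kwarg to sample instead.
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(**inputs, max_new_tokens=512, do_sample=False, streamer=streamer)
```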
