Double Inference Speed with AWQ Quantization

Trelis Research via YouTube

Classroom Contents

  1. Increase inference speed and accuracy with AWQ
  2. Deploy a Llama 2 70B server and API with AWQ
  3. How to set up chat-ui for Llama 2
  4. How to run AWQ in Google Colab
  5. How does AWQ quantization work?
  6. How does quantization work for language models?
  7. How does GPTQ work?
  8. Pro tips
