Serve a Custom LLM for Over 100 Customers - GPU Selection, Quantization, and API Setup

Trelis Research via YouTube

Class Central Classrooms: YouTube videos curated by Class Central.

Classroom Contents

  1. Serving a model for 100 customers
  2. Video Overview
  3. Choosing a server
  4. Choosing software to serve an API
  5. One-click templates
  6. Tips on GPU selection
  7. Using quantisation to fit in a cheaper GPU
  8. Vast.ai setup
  9. Serve Mistral with vLLM and AWQ, incl. concurrent requests
  10. Serving a function calling model
  11. API speed tests, including concurrent
  12. Video Recap
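Chapter 7's point, that quantisation lets a model fit on a cheaper GPU, comes down to simple arithmetic on weight storage. A rough sketch (weights only; it ignores KV cache, activations, and framework overhead, which add several more GB in practice):

```python
def model_vram_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-only VRAM estimate in GB: params * bits / 8 bytes."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

fp16_gb = model_vram_gb(7, 16)  # Mistral-7B weights in fp16
awq_gb = model_vram_gb(7, 4)    # the same weights AWQ-quantised to 4 bits

# prints "fp16: 14.0 GB  vs  AWQ 4-bit: 3.5 GB"
print(f"fp16: {fp16_gb:.1f} GB  vs  AWQ 4-bit: {awq_gb:.1f} GB")
```

This is why a 7B model that needs a 24 GB card in fp16 can run on a much cheaper 8-12 GB GPU once quantised to 4 bits.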

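The concurrent-request and speed-test chapters (9 and 11) can be sketched as a small Python harness. Here `send_request` is a stand-in for a real POST to a vLLM OpenAI-compatible endpoint (the simulated delay and function name are placeholders, not from the video):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def send_request(prompt: str) -> str:
    # Placeholder for a real call to a served model's completions endpoint;
    # a short sleep simulates network + generation latency so the harness
    # runs offline.
    time.sleep(0.01)
    return f"echo: {prompt}"

def speed_test(prompts, concurrency=8):
    """Fire all prompts at the server with `concurrency` parallel workers;
    return the responses and the total wall-clock time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(send_request, prompts))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = speed_test([f"prompt {i}" for i in range(32)], concurrency=8)
print(f"{len(results)} responses in {elapsed:.2f}s "
      f"({len(results) / elapsed:.1f} req/s)")
```

Running the same prompts with `concurrency=1` gives the serial baseline; batching-capable servers such as vLLM are designed so that throughput rises well above that baseline as concurrency grows.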