
Serve a Text to Speech Model with vLLM

Trelis Research via YouTube

Overview

Learn to deploy and serve the Orpheus Text-to-Speech model using vLLM with continuous batching capabilities in this technical tutorial. Set up a demonstration environment using a one-click template from Runpod, then explore running inference on both fine-tuned and default Orpheus models. Discover the technical implementation details of how vLLM integrates with Orpheus, including the process of decoding audio tokens from text input. Compare inference results between different model configurations, including considerations for fp8 precision and fine-tuning quality. Access the accompanying one-click-llms repository to follow along with the practical implementation steps for serving text-to-speech models efficiently.
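Once a vLLM server is up (for example, via the Runpod one-click template mentioned above), it exposes an OpenAI-compatible HTTP API. The sketch below shows how a request payload for such a server might be assembled; the model name, endpoint conventions, and voice-prefixed prompt format are assumptions for illustration, not details confirmed by the video.

```python
# Minimal sketch: building a /v1/completions payload for a vLLM server
# hosting an Orpheus-style TTS model. "orpheus-tts" and the
# "voice: text" prompt format are placeholder assumptions.
import json

def build_tts_request(prompt: str, voice: str = "tara",
                      max_tokens: int = 2048) -> dict:
    """Assemble a completions payload; many Orpheus setups prefix the
    voice name to the text prompt (assumed format here)."""
    return {
        "model": "orpheus-tts",          # placeholder model name
        "prompt": f"{voice}: {prompt}",  # voice-prefixed prompt (assumption)
        "max_tokens": max_tokens,
        "temperature": 0.6,
    }

payload = build_tts_request("Hello from vLLM!")
body = json.dumps(payload)  # ready to POST to the server
```

Because vLLM performs continuous batching on the server side, many such requests can be sent concurrently and the engine interleaves their token generation without the client doing anything special.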

Syllabus

0:00 Serving Orpheus Text-to-Speech model with continuous batching
0:44 Setup Demo with a one-click template from Runpod
4:12 Running inference on a fine-tuned model: poor quality; consider avoiding fp8 and fine-tuning further
5:25 Inference on the default Orpheus model, "tara"
7:37 How vLLM works with Orpheus and how to decode audio tokens
12:38 Conclusion and Resources
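The 7:37 section covers how audio tokens are decoded from the model's text output. A minimal sketch of that post-processing step is below, assuming the model emits `<custom_token_N>` markers and that a fixed ID offset maps them to codec IDs; both the marker format and the offset value are assumptions here, so check the Orpheus repository for the actual scheme.

```python
# Hypothetical sketch: extracting audio-codec IDs from generated text.
# The <custom_token_N> pattern and the offset of 10 are illustrative
# assumptions, not confirmed values.
import re

CUSTOM_TOKEN_RE = re.compile(r"<custom_token_(\d+)>")
AUDIO_TOKEN_OFFSET = 10  # assumed offset; verify against the model's vocab

def extract_codec_ids(generated_text: str) -> list[int]:
    """Pull <custom_token_N> markers out of the model output and map
    them to audio-codec IDs by subtracting the assumed offset."""
    return [int(n) - AUDIO_TOKEN_OFFSET
            for n in CUSTOM_TOKEN_RE.findall(generated_text)]

sample = "<custom_token_138><custom_token_42><custom_token_977>"
ids = extract_codec_ids(sample)  # [128, 32, 967] under these assumptions
```

The resulting ID sequence would then be fed to a neural audio codec's decoder to produce the waveform.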

Taught by

Trelis Research

