Overview
Learn to deploy and serve the Orpheus Text-to-Speech model using vLLM with continuous batching capabilities in this technical tutorial. Set up a demonstration environment using a one-click template from Runpod, then explore running inference on both fine-tuned and default Orpheus models. Discover the technical implementation details of how vLLM integrates with Orpheus, including the process of decoding audio tokens from text input. Compare inference results between different model configurations, including considerations for fp8 precision and fine-tuning quality. Access the accompanying one-click-llms repository to follow along with the practical implementation steps for serving text-to-speech models efficiently.
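For orientation before watching, here is a minimal sketch of what the inference side can look like with vLLM's offline Python API. The model ID, the "tara" voice prefix, and the sampling values are assumptions for illustration rather than the exact settings from the video; the one-click-llms repository contains the template actually used.

```python
# Minimal sketch of Orpheus inference through vLLM's offline Python API.
# The model ID, "tara" voice prefix, and sampling values below are
# assumptions for illustration; the exact template lives in the
# one-click-llms repository referenced in the video.
from vllm import LLM, SamplingParams

llm = LLM(model="canopylabs/orpheus-3b-0.1-ft", dtype="bfloat16")

# Orpheus-style prompt: voice name, then the text to speak
# (the checkpoint's special-token wrapping may differ).
prompts = [
    "tara: Hey there, this is a quick continuous-batching test.",
    "tara: And a second prompt so vLLM actually batches requests.",
]

params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=1200)

# vLLM schedules these prompts with continuous batching internally.
outputs = llm.generate(prompts, params)
for out in outputs:
    # The "speech" comes back as a string of <custom_token_N> audio tokens.
    print(out.outputs[0].text[:120])
```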
Syllabus
0:00 Serving Orpheus Text-to-Speech model with continuous batching
0:44 Setup Demo with a one-click template from Runpod
4:12 Running inference on a fine-tuned model (poor quality; consider avoiding fp8 and fine-tuning further)
5:25 Inference on the default Orpheus model with the “tara” voice
7:37 How vLLM works with Orpheus and how to decode audio tokens (a decoding sketch follows after this syllabus)
12:38 Conclusion and Resources
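As a companion to the 7:37 chapter, the sketch below shows one plausible way to decode the generated audio tokens into a waveform with the SNAC codec. The regex, the per-position offset scheme, the 7-codes-per-frame layout, and the codebook split are assumptions made for illustration; the exact mapping used in the tutorial is in the one-click-llms repository.

```python
# Sketch of the audio-token decode path: map the generated
# "<custom_token_N>" strings back to SNAC codec codes and run the SNAC
# decoder to get a 24 kHz waveform. The regex, the "-10 - position*4096"
# offset scheme, the 7-codes-per-frame layout, and the codebook split are
# assumptions; check the one-click-llms repository for the exact mapping.
import re
import torch
from snac import SNAC  # pip install snac

snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

def decode_orpheus_tokens(generated_text: str) -> torch.Tensor:
    ids = [int(m) for m in re.findall(r"<custom_token_(\d+)>", generated_text)]
    ids = ids[: len(ids) - len(ids) % 7]  # drop any trailing partial frame

    layer1, layer2, layer3 = [], [], []
    for start in range(0, len(ids), 7):
        frame = ids[start : start + 7]
        # Each of the 7 positions occupies its own 4096-wide ID range
        # (assumed offset scheme); recover raw SNAC codes in [0, 4095].
        codes = [tok - 10 - pos * 4096 for pos, tok in enumerate(frame)]
        layer1.append(codes[0])
        layer2.extend([codes[1], codes[4]])
        layer3.extend([codes[2], codes[3], codes[5], codes[6]])

    snac_codes = [
        torch.tensor(layer, dtype=torch.long).unsqueeze(0)
        for layer in (layer1, layer2, layer3)
    ]
    with torch.no_grad():
        return snac_model.decode(snac_codes)  # shape (1, 1, num_samples)
```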
Taught by
Trelis Research