Syllabus
0:00 Why vLLM and why it’s so fast
1:22 How vLLM optimizes memory & inference performance
3:29 AWS service quota requirement for GPU instances
4:18 Best AWS instance to use for just getting started
5:03 Ansible + collection prerequisites
6:04 AWS CLI and credential setup
7:11 Creating a Hugging Face access token
7:58 Playbook 1 – aws_helper walkthrough
9:56 Reviewing the generated vars file
9:59 Playbook 2 – vllm_installer deployment
10:40 Instance provisioning & dependency installation
11:45 vLLM server is live
12:03 Testing with curl
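The syllabus closes with a curl smoke test against the freshly deployed server (12:03). As a minimal sketch of the equivalent request in Python: vLLM's OpenAI-compatible server listens on port 8000 by default and exposes a /v1/completions endpoint; the server IP and model ID below are placeholders, not values from the video.

```python
import json
import urllib.request

def completion_request(server_ip, model, prompt, max_tokens=32):
    """Build an HTTP POST for vLLM's OpenAI-compatible /v1/completions endpoint.

    Equivalent to:
      curl http://SERVER_IP:8000/v1/completions \
        -H "Content-Type: application/json" \
        -d '{"model": "...", "prompt": "...", "max_tokens": 32}'
    """
    body = json.dumps(
        {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    ).encode()
    return urllib.request.Request(
        f"http://{server_ip}:8000/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# Placeholder IP and model; substitute the public IP of your EC2 instance
# and the model your vllm_installer playbook deployed.
req = completion_request("203.0.113.10", "facebook/opt-125m", "Ansible is")
# To actually send it once the server is live:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```

A 200 response with a `choices` array confirms the server is up; a connection refusal usually means the vLLM process is still loading model weights or the instance's security group does not allow inbound traffic on port 8000.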
Taught by
Red Hat Ansible Automation