YouTube videos curated by Class Central.
Classroom Contents
Qwen 2.5 Omni - The Most Multi-modal Model for Video, Text and Audio Processing
- 1 0:00 Qwen 2.5 Omni - Video, Text and Audio Inputs, Text and Audio Outputs.
- 2 0:24 Qwen2.5 Architecture, incl. TMRoPE
- 3 6:29 Qwen Omni vs Llama 3.
- 4 7:43 Qwen Omni vs Moshi.
- 5 9:32 Comparison with GPT-4o and Gemini 2.5 Pro.
- 6 13:09 How to run Qwen 2.5 Omni on a GPU?
- 7 18:19 Inference with Audio Inputs and Audio + Text Outputs.
- 8 22:48 Inference with Video Input and Audio Output + Text Output.
- 9 27:22 Qwen 2.5 Model Architecture Print-out
- 10 29:20 When should you use Qwen 2.5 Omni?