Replacing Multiple Actors with Fine-Tuned Qwen and Wan 2.2

Oxen via YouTube

Overview

Learn how to overcome the challenges of multi-person video generation by fine-tuning Qwen and Wan 2.2 models in this 45-minute technical tutorial. The task is to generate realistic video featuring multiple actors, such as Conan O'Brien interviewing Will Smith, and the tutorial shows why base models struggle with multi-person scenes. Techniques covered include image masking with DinoV3, fine-tuning Qwen-Image-Edit to complete masked images, and building custom workflows in ComfyUI. The instructor compares fine-tuning with simply prompting Nano Banana, outlines a higher-quality production pipeline for multi-actor video, then walks through the implementation, adjusts the workflow, shows the final generation, and explains how to integrate the Qwen-Image-Edit LoRA into a ComfyUI workflow.

Syllabus

0:00 The Task: Generating Conan O’Brien interviewing Will Smith
2:04 Base Model Results and Early Fine-Tunes
3:13 The Problem: Video models aren’t good at multi-person generation
6:50 Can we just prompt Nano Banana instead of fine-tuning?
9:43 Why fine-tune?
11:17 What could a higher quality production pipeline look like?
14:50 Step 1: Masking
16:04 Enter DinoV3
21:28 Fine-tuning Qwen-Image-Edit to fill in masked images
26:12 Implementing our Wan 2.2 ComfyUI Workflow
28:13 Questions
31:40 Tweaking our ComfyUI flow
36:05 Moment of truth! Final generation
36:54 Question
38:15 Implementing our Qwen-Image-Edit LoRA in ComfyUI
43:24 Conclusion

Taught by

Oxen
