Overview
Syllabus
0:00 The Task: Generating Conan O’Brien interviewing Will Smith
2:04 Base Model Results and Early Fine-Tunes
3:13 The Problem: Video models aren’t good at multi-person generations
6:50 Can we just prompt Nano Banana instead of fine-tuning?
9:43 Why fine-tune?
11:17 What could a higher quality production pipeline look like?
14:50 Step 1: Masking
16:04 Enter DINOv3
21:28 Fine-tuning Qwen-Image-Edit to fill in masked images
26:12 Implementing our Wan 2.2 ComfyUI Workflow
28:13 Questions
31:40 Tweaking our ComfyUI flow
36:05 Moment of truth! Final generation
36:54 Question
38:15 Implementing our Qwen-Image-Edit LoRA in ComfyUI
43:24 Conclusion
Taught by
Oxen