Overview
Syllabus
0:00 Is it Wan like “Anne” or “won”?
0:55 The Wan suite of models
1:10 Wan 2.1’s model architecture and research paper
3:50 Wan 2.2 video improvements from Wan 2.1
5:35 Our fine-tuning goal: Conan O’Brien interviewing Will Smith who’s wearing a Denver Broncos shirt
7:30 Base model results
8:55 Wan 2.2’s model architecture
12:55 Fine-tuning: How we created our data
17:12 Fine-tuning: How we fine-tuned each Wan model
19:22 Question: How many images do you need?
20:24 Question: Did we use musubi-tuner?
20:40 Question: How to train camera panning
22:45 Fine-tuning: Comparing images as we fine-tune
29:37 Bringing our Will Smith fine-tuned model into ComfyUI
42:00 Configuring ComfyUI to run our fine-tuned model
47:28 Question: Does the image input format matter?
48:40 Loading our Conan O’Brien fine-tuned model in ComfyUI
57:45 Question: How are the LoRAs loaded into the pipeline?
58:40 Final Results: Conan interviewing Will Smith
Taught by
Oxen