Explore cutting-edge research in computer vision through this advanced graduate-level course, which examines the intersection of vision-language models, diffusion models, and AI safety. The course begins with foundational concepts, starting with CLIP and vision-language model architectures, and then moves to in-depth coverage of diffusion models for image and video generation.

Investigate critical safety and bias issues in modern AI systems by analyzing recent research papers on object hallucinations in large vision-language models, bias detection in text-to-image generators, and safety evaluation frameworks for multimodal systems. Study methods for mitigating harmful content generation, including techniques for removing NSFW concepts from vision-language models and training-free safety guards for text-to-image and video generation. Examine privacy concerns through research on private attribute inference from images, and explore counterfactual approaches to probing and mitigating intersectional social bias. Learn about advanced decoding techniques for reducing hallucinations, and discover approaches for selectively forgetting problematic concepts in diffusion models. Gain hands-on experience with state-of-the-art safety benchmarks and evaluation suites for multimodal large language models, preparing you to address the complex challenges of deploying vision AI systems responsibly in real-world applications.

Syllabus

Lecture 1 - Introduction - Part I
Lecture 2 - Introduction - Part II
Lecture 4 - CLIP
Lecture 5 - Visual-Language Models - Introduction, Part I
Lecture 6 - Visual-Language Models - Introduction, Part II
Lecture 7 - Visual-Language Models - Introduction, Part III
Lecture 8 - Visual-Language Models - Introduction, Part IV
Lecture 10 - Diffusion Models - Introduction, Part I
Lecture 11 - Diffusion Models - Part II
Lecture 12 - Diffusion Models - Part III

Paper 1: Mitigating Object Hallucinations in LVLMs through Visual Contrastive Decoding
Paper 2: HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Paper 3: MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Paper 5: MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Paper 6: Can We Talk Models Into Seeing the World Differently?
Paper 7: SocialCounterfactuals: Probing & Mitigating Intersectional Social Bias with Counterfactuals
Paper 8: SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation
Paper 9: Learning to Forget in Text-to-Image Diffusion Models
Paper 10: Open-set Bias Detection in Text-to-Image Generative Models
Paper 11: Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
Paper 12: Private Attribute Inference from Images with Vision-Language Models