Syllabus
Lecture 1 - Introduction
Lecture 2 - Transformers Introduction
Lecture 3 - CLIP
Lecture 4 - Visual-Language Models Introduction Part-I: CoCa, PaLI
Lecture 5 - Visual-Language Models Introduction Part-II: Flamingo, FLAVA, Painter, BLIP-2
Lecture 6 - Visual-Language Models Introduction Part-III: ImageBind, LanguageBind, LLaVA
Lecture 7 - Visual-Language Models Introduction Part-IV: Video ChatGPT, PG-Video LLaVA
Lecture 8 - FILIP: Fine-grained Interactive Language-Image Pre-Training
Lecture 9 - HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Lecture 10 - BLIP: Bootstrapping Language-Image Pre-training for Unified VL Understanding and Generation
Lecture 11 - BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and LLMs
Lecture 12 - MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Lecture 13 - MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound
Lecture 14 - Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Lecture 15 - Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Lecture 16 - PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
Lecture 17 - Evaluating Object Hallucination in Large Vision-Language Models
Lecture 18 - Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Lecture 19 - CM3Leon: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Lecture 20 - OWLv2: Scaling Open-Vocabulary Object Detection
Lecture 21 - Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Lecture 22 - FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Taught by
UCF CRCV