NeoBabel - A Multilingual Open Foundation Model for Visual Generation

SAIConference via YouTube

Explore multilingual text-to-image generation in this 40-minute keynote by Prof. Cees G.M. Snoek of the University of Amsterdam, delivered at the Intelligent Systems Conference 2026. The talk introduces NeoBabel, a multilingual image generation foundation model that challenges English-centric AI systems and promotes global digital inclusivity by natively supporting English, Chinese, Dutch, French, Hindi, and Persian. Follow the design and development of the first natively multilingual text-to-image generation model, including a progressive training strategy that makes efficient use of limited data and compute. Examine how a collection of 124 million captioned images was built across multiple languages from both real-world and synthetic sources, and understand the unified multimodal architecture that connects language directly to image pixels. Review the benchmarking and evaluation methodology, including the multilingual adaptations of the GenEval and DPG metrics used to assess model performance. Investigate practical applications of NeoBabel, including multilingual inpainting, cross-cultural collaboration tools, and code-switching, in which a single prompt mixes languages. Understand the open and inclusive research approach that makes all data, models, and tools available to the global research community for replication and improvement. Prof. Snoek is a leading researcher in AI, computer vision, and multilingual machine learning who heads multiple labs at the University of Amsterdam, including the Video & Image Sense Lab and the Human-Aligned Video AI Lab, and serves as Chief Scientific Officer at Kepler Vision Technologies.