Overview
Learn to evaluate open-source embedding models for specific languages and custom datasets in this conference talk, which addresses the significant gap between multilingual models' English benchmark scores and their actual performance in other languages.

Discover how to build a Streamlit-based evaluation platform for comparing different types of embedding models: language-specific models such as the Japanese Ruri series, multilingual alternatives such as multilingual-E5 and BGE-M3, and general-purpose models, measured on real-world tasks including semantic search. Explore practical methodologies for assessing model performance on custom data that reflects actual use cases, rather than relying solely on standard benchmarks.

Gain insights into building evaluation frameworks that can be adapted to any language or cultural context using entirely open-source tools, with all code and methodologies released as open source for the global developer community.
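The comparison workflow described above can be sketched as a model-agnostic retrieval harness: embed queries and documents with each candidate model, then score semantic search with a metric such as Recall@k. This is a minimal illustration under assumptions, not the talk's actual Streamlit platform; the function names and the choice of Recall@k as the metric are hypothetical.

```python
import numpy as np

def recall_at_k(query_vecs, doc_vecs, relevant, k=5):
    """Fraction of queries whose relevant document appears in the top-k
    results by cosine similarity. `relevant[i]` is the index of the
    correct document for query i."""
    # Normalize rows so the dot product equals cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = q @ d.T                      # (num_queries, num_docs)
    hits = 0
    for i, rel in enumerate(relevant):
        top_k = np.argsort(-sims[i])[:k]  # indices of k most similar docs
        hits += int(rel in top_k)
    return hits / len(relevant)

def compare_models(embedders, queries, docs, relevant, k=5):
    """Run the same custom dataset through each embedding function and
    collect one score per model. `embedders` maps a model name to a
    function: list[str] -> np.ndarray of shape (n, dim)."""
    return {
        name: recall_at_k(embed(queries), embed(docs), relevant, k)
        for name, embed in embedders.items()
    }
```

In practice each entry in `embedders` would wrap a real model (e.g. a Ruri, multilingual-E5, or BGE-M3 checkpoint loaded via a library of your choice), and the scores dictionary is what a Streamlit front end would render as a comparison table.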
Syllabus
Model Evaluation for Custom Datasets With Open Models: Multi-Model Comparison With... Sho Tanaka
Taught by
Linux Foundation