Deconstructing Text Embedding Models - Understanding Tokenizers and Model Selection
EuroPython Conference via YouTube
Overview
Explore how text embedding models work in this 44-minute EuroPython Conference talk. It examines the critical role of tokenizers in model selection, going beyond reliance on leaderboards such as the Massive Text Embedding Benchmark (MTEB). Learn to assess whether a model suits a specific dataset based on how its tokenizer handles that data, and discover strategies for optimizing tokenizers while fine-tuning embedding models. The talk helps you make informed decisions when choosing a text embedding model for data with unusual characteristics.
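As a rough illustration of checking a tokenizer against your own data, the sketch below measures three simple statistics over a word list. It does not claim to be the method presented in the talk. The statistics are fertility (average tokens per word), split rate (share of words broken into several pieces) and unknown rate (share of words mapped to an unknown token). The toy greedy longest-match tokenizer imitates WordPiece style, and its small vocabulary is purely hypothetical. With a real model you would instead pass that model's tokenizer as the `tokenize` callable, for example the `tokenize` method of a Hugging Face `transformers` tokenizer.

```python
from typing import Callable, Iterable

UNK = "[UNK]"


def wordpiece_tokenize(word: str, vocab: set[str], unk: str = UNK) -> list[str]:
    """Greedy longest-match-first split in the style of WordPiece.

    Continuation pieces carry a '##' prefix. If any position cannot be
    matched, the whole word becomes a single unknown token.
    """
    tokens: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


def tokenizer_stats(
    words: Iterable[str],
    tokenize: Callable[[str], list[str]],
    unk: str = UNK,
) -> dict[str, float]:
    """Fertility, split rate and unknown rate of `tokenize` over `words`."""
    words = list(words)
    if not words:
        raise ValueError("need at least one word")
    pieces = [tokenize(w) for w in words]
    n = len(words)
    return {
        "fertility": sum(len(p) for p in pieces) / n,
        "split_rate": sum(len(p) > 1 for p in pieces) / n,
        "unk_rate": sum(unk in p for p in pieces) / n,
    }


if __name__ == "__main__":
    # Hypothetical toy vocabulary, for illustration only.
    vocab = {"the", "model", "embed", "##ding", "##s", "token", "##izer"}
    tok = lambda w: wordpiece_tokenize(w, vocab)
    print("general:", tokenizer_stats(["the", "model", "embeddings"], tok))
    print("domain: ", tokenizer_stats(["tokenizers", "pharmacokinetics", "embedder"], tok))
```

In this toy run, both word lists have the same fertility, but the domain list has a much higher unknown rate. Collapsing an unmatched word into one unknown token lowers fertility, so a single statistic can hide a vocabulary gap. That is why it seems worth looking at several of them together.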
Syllabus
Deconstructing the text embedding models — Kacper Łukawski
Taught by
EuroPython Conference