Overview
Learn how Pinterest integrated Large Language Models into its search infrastructure to improve relevance scoring and user experience in this 18-minute conference talk. Discover Pinterest's approach to combining search queries with rich multimodal content, including visual captions, link-based text, and user-curation signals, through LLM integration. Explore the semi-supervised learning framework that lets the system scale to large multilingual datasets beyond English and beyond the limits of human labeling.

Understand how Pinterest distilled LLM-driven models into efficient architectures for real-time serving while preserving the performance gains. Examine the practical implementation challenges and solutions, including VLM-generated captions and user actions as content annotations, knowledge-distillation techniques for productionizing LLMs, and the development of relevance-tuned LLM embeddings as general-purpose semantic representations.

Gain insights from experimental validation and large-scale deployment results that show substantial improvements in search relevance for Pinterest's global user base, with detailed coverage of the search backend architecture, relevance modeling approaches, and real-world performance metrics.
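The knowledge-distillation idea mentioned above, training a small student model to match a large teacher's softened output distribution, can be sketched in a few lines. This is a minimal illustration of the general technique, not Pinterest's actual pipeline; the temperature value and the toy relevance logits are assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperatures give softer targets."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened
    distribution -- the core objective in knowledge distillation."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -float(np.sum(p_teacher * np.log(p_student + 1e-12)))

# Toy example: teacher logits over three relevance grades (hypothetical).
teacher = [4.0, 1.0, 0.5]
aligned = distillation_loss([4.0, 1.0, 0.5], teacher)      # student agrees
misaligned = distillation_loss([0.5, 1.0, 4.0], teacher)   # student disagrees
print(aligned < misaligned)  # True: agreement yields the lower loss
```

In production the student would be a compact ranking model trained on many such teacher-labeled (query, pin) pairs, which is what makes real-time serving feasible.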
Syllabus
[00:00] Introduction to Pinterest and its search functionality.
[01:52] Overview of the Pinterest search backend architecture.
[02:29] The search relevance model.
[02:55] Key learnings from using LLMs for search relevance.
[05:04] The value of VLM-generated captions and user actions as content annotations.
[07:16] Productionizing LLMs with knowledge distillation.
[12:14] The utility of relevance-tuned LLM embeddings as general-purpose semantic representations.
[13:55] Q&A session.
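The syllabus item on relevance-tuned LLM embeddings as general-purpose semantic representations boils down to ranking candidates by vector similarity. A minimal sketch follows; the 3-dimensional toy vectors and function names are illustrative (real embeddings come from the tuned model and have far more dimensions).

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_by_relevance(query_vec, pin_vecs):
    """Return candidate indices ordered by similarity to the query."""
    scores = [cosine(query_vec, p) for p in pin_vecs]
    return sorted(range(len(pin_vecs)), key=lambda i: scores[i], reverse=True)

# Hypothetical 3-d embeddings for a query and three candidate pins.
query = [1.0, 0.2, 0.0]
pins = [[0.9, 0.3, 0.1],    # points in nearly the same direction
        [0.0, 1.0, 0.0],    # mostly unrelated
        [-1.0, 0.0, 0.2]]   # opposite direction
print(rank_by_relevance(query, pins))  # [0, 1, 2]
```

Because the embeddings are tuned on relevance labels, the same vectors can be reused for related tasks (retrieval, deduplication, recommendations) without task-specific retraining.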
Taught by
AI Engineer