Learn key methods for representing text in RAG systems. Explore why text representation matters, implement a Bag-of-Words (BOW) model, see how embeddings capture deeper meaning, visualize embeddings with t-SNE, and compare BOW with embeddings for document retrieval and semantic search.
Syllabus
- Unit 1: Introduction to Text Representation: Bag-of-Words Model
- Start with a Clean-Up
- Building a Vocabulary Dictionary
- Transform Text into Numeric Vectors
- Building a Bag-of-Words Text Processor
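The Unit 1 pipeline — cleaning text, building a vocabulary dictionary, and transforming text into numeric vectors — can be sketched in plain Python. The function names below are illustrative, not the course's own:

```python
import re
from typing import Dict, List

def clean(text: str) -> List[str]:
    """Lowercase, strip punctuation, and split into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(documents: List[str]) -> Dict[str, int]:
    """Map each unique token to a fixed column index."""
    vocab: Dict[str, int] = {}
    for doc in documents:
        for token in clean(doc):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def to_vector(text: str, vocab: Dict[str, int]) -> List[int]:
    """Count each token's occurrences; out-of-vocabulary tokens are ignored."""
    vector = [0] * len(vocab)
    for token in clean(text):
        if token in vocab:
            vector[vocab[token]] += 1
    return vector

docs = ["The cat sat.", "The cat sat on the mat!"]
vocab = build_vocabulary(docs)
print(to_vector(docs[1], vocab))  # prints [2, 1, 1, 1, 1]
```

Each position in the output vector corresponds to one vocabulary entry ("the" appears twice in the second document, every other word once), which is the core idea behind a Bag-of-Words text processor.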
- Unit 2: Generating and Comparing Sentence Embeddings
- Creating Sentence Embeddings
- Comparing Sentence Embeddings
- Finding Most Similar Sentence Pairs
- Exploring Sentence Similarity Changes
- Ranking Sentences by Similarity
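Comparing and ranking sentence embeddings, as in Unit 2, typically comes down to cosine similarity. The sketch below uses hand-made 3-dimensional vectors as stand-ins so it stays self-contained; a real pipeline would obtain higher-dimensional embeddings from a model (for example via the sentence-transformers library), and the helper names here are assumptions:

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings": similar sentences get nearby vectors by construction.
embeddings: Dict[str, List[float]] = {
    "A cat sleeps on the sofa.": [0.9, 0.1, 0.0],
    "A kitten naps on the couch.": [0.8, 0.2, 0.1],
    "The stock market fell today.": [0.0, 0.1, 0.9],
}

def most_similar_pair(emb: Dict[str, List[float]]) -> Tuple[str, str]:
    """Return the pair of sentences whose embeddings are closest."""
    return max(combinations(emb, 2),
               key=lambda p: cosine_similarity(emb[p[0]], emb[p[1]]))

def rank_by_similarity(query: str, emb: Dict[str, List[float]]) -> List[str]:
    """Rank all sentences by similarity to the query sentence."""
    return sorted(emb, key=lambda s: cosine_similarity(emb[query], emb[s]),
                  reverse=True)

print(most_similar_pair(embeddings))
```

The cat and kitten sentences form the most similar pair despite sharing no content words, which is exactly what BOW alone cannot capture.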
- Unit 3: Visualizing Sentence Embeddings with t-SNE
- Visualizing Sentence Embeddings
- Optimizing t-SNE Parameters
- Adding a New Category
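A minimal sketch of the Unit 3 workflow, assuming scikit-learn and NumPy are available. Synthetic clusters stand in for sentence embeddings from three categories; the key tuning point the unit covers is perplexity, which must stay below the number of samples:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Three synthetic "categories": clusters of 10-d points standing in
# for sentence embeddings from three topics (10 sentences each).
clusters = [rng.normal(loc=center, scale=0.3, size=(10, 10))
            for center in (0.0, 3.0, 6.0)]
X = np.vstack(clusters)  # 30 points, 10 dimensions

# Perplexity must be smaller than the sample count; small datasets
# need small values (roughly 5-50 is the usual range).
tsne = TSNE(n_components=2, perplexity=5.0, random_state=0)
coords = tsne.fit_transform(X)  # 2-d coordinates, one row per sentence
print(coords.shape)  # (30, 2)
```

The resulting 2-d coordinates can be scattered with matplotlib, colored by category; adding a new category means appending its embeddings to `X` and re-running `fit_transform`, since t-SNE does not support transforming new points into an existing layout.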
- Unit 4: Comparing Bag-of-Words and Embeddings-Based Semantic Search
- Building a Bag of Words
- Enhance Bag-of-Words with Bigrams
- Bag of Words Search Task
- Semantic Search with Embeddings
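The Unit 4 comparison can be illustrated with a BOW search enhanced with bigrams, sketched here in plain Python with illustrative names. It ranks documents by how many unigram and bigram features they share with the query; an embedding-based semantic search would instead rank by cosine similarity of model-produced vectors, letting it match synonyms that share no surface words:

```python
import re
from typing import List, Set

def tokens(text: str) -> List[str]:
    """Lowercase word tokens with punctuation removed."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bow_features(text: str) -> Set[str]:
    """Unigrams plus adjacent-word bigrams, as a feature set."""
    toks = tokens(text)
    bigrams = [" ".join(pair) for pair in zip(toks, toks[1:])]
    return set(toks) | set(bigrams)

def bow_search(query: str, docs: List[str]) -> List[str]:
    """Rank documents by the number of shared query features."""
    q = bow_features(query)
    return sorted(docs, key=lambda d: len(q & bow_features(d)), reverse=True)

docs = [
    "Introduction to machine learning models",
    "Cooking pasta at home",
    "Advanced machine learning techniques",
]
print(bow_search("machine learning", docs))
```

Bigrams let the phrase "machine learning" count as a feature in its own right, so word order matters, but the search still requires exact word overlap; that limitation is what motivates the unit's final lesson on semantic search with embeddings.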