Learn how to represent text effectively for Retrieval-Augmented Generation (RAG). Explore the importance of text representation, compare Bag-of-Words and embeddings, visualize embeddings with t-SNE, and assess their performance in document retrieval and semantic search.
Syllabus
- Unit 1: Text Representation with Java: Bag-of-Words Model
  - Text Preprocessing in Java
  - Creating a Vocabulary Dictionary in Java
  - Transform Text into Numeric Vectors
  - Bag-of-Words Vectorization in Java
- Unit 2: Generating and Comparing Sentence Embeddings with Java
  - Generating and Exploring Sentence Embeddings in Java
  - Cosine Similarity of Sentence Embeddings in Java
  - Finding the Most Similar Sentence Pair Using Cosine Similarity
  - Adding and Comparing Sentences with Cosine Similarity in Java
  - Sentence Similarity Ranking in Java
- Unit 3: Visualizing Sentence Embeddings with t-SNE in Java
  - Visualizing Sentence Embeddings with t-SNE in Java
  - Troubleshooting and Refining t-SNE Embeddings in Java
  - Adding a New Category to Sentence Embeddings Visualization in Java
- Unit 4: Comparing Bag-of-Words and Embedding-Based Search Techniques in Java
  - Bag-of-Words Vectorization in Java
  - Incorporating Bigrams into Bag-of-Words Vectorization
  - BOW Search Implementation in Java
  - Implementing Semantic Search with Cosine Similarity in Java
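The Unit 1 steps (preprocessing text, building a vocabulary dictionary, and turning text into count vectors) can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the course's own code: the class `BagOfWords` and its methods are hypothetical names, and "preprocessing" is assumed to mean only lowercasing, replacing characters other than ASCII letters, digits, and whitespace, and splitting on whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical minimal Bag-of-Words sketch: preprocess, build a vocabulary, vectorize.
final class BagOfWords {

    // Assumed preprocessing: lowercase, replace anything outside [a-z0-9] and whitespace
    // with a space, then split on runs of whitespace and drop empty tokens.
    static List<String> tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9\\s]", " ");
        List<String> tokens = new ArrayList<>();
        for (String t : cleaned.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Map each distinct token in the corpus to a column index.
    // Words are sorted alphabetically so the column order is deterministic.
    static Map<String, Integer> buildVocabulary(List<String> documents) {
        TreeSet<String> words = new TreeSet<>();
        for (String doc : documents) {
            words.addAll(tokenize(doc));
        }
        Map<String, Integer> vocab = new TreeMap<>();
        int index = 0;
        for (String w : words) {
            vocab.put(w, index++);
        }
        return vocab;
    }

    // Count how often each vocabulary word occurs in the text; out-of-vocabulary words are ignored.
    static int[] vectorize(String text, Map<String, Integer> vocab) {
        int[] counts = new int[vocab.size()];
        for (String token : tokenize(text)) {
            Integer column = vocab.get(token);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("The cat sat on the mat.", "The dog chased the cat!");
        Map<String, Integer> vocab = buildVocabulary(docs);
        System.out.println(vocab); // {cat=0, chased=1, dog=2, mat=3, on=4, sat=5, the=6}
        for (String d : docs) {
            System.out.println(Arrays.toString(vectorize(d, vocab)));
        }
        // [1, 0, 0, 1, 1, 1, 2]
        // [1, 1, 1, 0, 0, 0, 2]
    }
}
```

Each vector has one column per vocabulary word and ignores word order, so "the cat chased the dog" and "the dog chased the cat" produce the same vector. That loss of order and meaning is one reason to compare Bag-of-Words with embeddings.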
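Cosine similarity, which the syllabus uses both to compare sentence embeddings and to rank search results, can be sketched as below. The class `CosineSearch` and its methods are hypothetical, and the sketch assumes the vectors (Bag-of-Words counts or sentence embeddings) were already computed elsewhere, since the embedding model the course uses is not specified here. The three-dimensional vectors in `main` are invented toy values, not real embeddings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: cosine similarity, top-k ranking, and most-similar-pair search.
final class CosineSearch {

    // cos(a, b) = (a . b) / (|a| * |b|); returns 0 if either vector is all zeros.
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Rank document indices by descending similarity to the query and keep the first k.
    // List.sort is stable, so tied scores keep their original order.
    static List<Integer> topK(double[] query, List<double[]> docs, int k) {
        double[] scores = new double[docs.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = cosineSimilarity(query, docs.get(i));
            order.add(i);
        }
        order.sort((x, y) -> Double.compare(scores[y], scores[x]));
        return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
    }

    // Check every unordered pair and return the indices {i, j} with the highest similarity.
    static int[] mostSimilarPair(List<double[]> vectors) {
        int[] best = {-1, -1};
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < vectors.size(); i++) {
            for (int j = i + 1; j < vectors.size(); j++) {
                double s = cosineSimilarity(vectors.get(i), vectors.get(j));
                if (s > bestScore) {
                    bestScore = s;
                    best = new int[]{i, j};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy vectors invented for illustration only.
        List<double[]> docs = List.of(
                new double[]{1, 0, 0},
                new double[]{0.9, 0.1, 0},
                new double[]{0, 0, 1});
        double[] query = {0.8, 0.2, 0};
        System.out.println(topK(query, docs, 2));                    // [1, 0]
        System.out.println(Arrays.toString(mostSimilarPair(docs)));  // [0, 1]
    }
}
```

Because cosine similarity divides by both vector lengths, it compares direction rather than magnitude: a long document whose word counts are proportional to a short one's still scores 1.0.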