This course teaches learners how to split large texts into chunks efficiently and store those chunks, both as JSONL records and as vector embeddings in a database (ChromaDB), for structured retrieval. These techniques are essential for processing long documents in LLM applications such as search, retrieval, and knowledge management.
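As a rough illustration of the chunk-and-store idea, the following Python sketch splits text into fixed-length character windows with optional overlap and serializes each chunk as one JSON Lines record. The function names `chunk_fixed` and `to_jsonl`, and the record fields `id`, `source`, and `text`, are hypothetical choices for this example rather than the course's own code. Real pipelines often count tokens instead of characters.

```python
import json


# Hypothetical helper for illustration; not the course's own implementation.
def chunk_fixed(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of at most `size` characters.

    Consecutive windows share `overlap` characters of context.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical record layout: one JSON object per line (JSON Lines).
def to_jsonl(chunks: list[str], source: str) -> str:
    return "".join(
        json.dumps({"id": f"{source}-{i}", "source": source, "text": chunk},
                   ensure_ascii=False) + "\n"
        for i, chunk in enumerate(chunks)
    )


chunks = chunk_fixed("abcdefghij", size=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
print(to_jsonl(chunks, source="demo"), end="")
```

Because each window advances by `size - overlap` characters, a size of 4 with an overlap of 1 repeats the last character of each chunk at the start of the next one, which keeps a little context across chunk boundaries.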
Syllabus
- Unit 1: Chunking and Storing Text for Efficient LLM Processing
  - Implementing Fixed-Length Text Chunking
  - Sentence Boundaries for Smarter Chunking
  - Chunking Methods Head to Head
  - Preserving Document Structure with Paragraph Chunking
- Unit 2: Advanced Chunking Techniques for LLMs
  - Exploring Separator Configurations
  - Exploring Overlap in Text Chunking
  - Token-Based Chunking Implementation
- Unit 3: Converting and Storing Text Chunks in JSONL Format
  - Convert Text Chunks to JSONL
  - Filter Text Chunks with JSONL
  - Text Processing Pipeline with JSONL
- Unit 4: Chunking and Storing Text for Efficient LLM Processing with ChromaDB
  - Converting Text Chunks to Vector Embeddings
  - Initializing ChromaDB for Vector Storage
  - Storing Embeddings in ChromaDB
  - Persisting Vector Databases for Production
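The Unit 1 lessons compare sentence-based and paragraph-based chunking. A minimal sketch of both might look like the following, assuming plain-text input where sentences end in `.`, `!`, or `?` and paragraphs are separated by blank lines. The helpers `split_sentences`, `chunk_by_sentences`, and `chunk_by_paragraphs` are hypothetical, and the regex-based sentence splitter is a simplification that mishandles abbreviations such as "e.g.".

```python
import re


# Hypothetical helpers for illustration; not the course's own implementation.
def split_sentences(text: str) -> list[str]:
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_by_sentences(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own oversized
    chunk instead of being cut mid-sentence.
    """
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)  # flush the full chunk
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def chunk_by_paragraphs(text: str) -> list[str]:
    # Paragraphs are separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


doc = "Chunking splits text. It keeps sentences whole! Does it work?\n\nA second paragraph."
print(chunk_by_sentences(doc, max_chars=50))
print(chunk_by_paragraphs(doc))
```

With `max_chars=50`, the sentence chunker packs whole sentences across the paragraph break, while the paragraph chunker keeps each paragraph intact regardless of its length. This is the basic trade-off between controlling chunk size and preserving document structure.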