Chunking and Storing Text for Efficient LLM Processing

via CodeSignal

Overview

This course teaches learners how to chunk large text efficiently and store it in a database for structured retrieval. These techniques are essential for processing long documents in LLM applications such as search, retrieval, and knowledge management.
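As a rough illustration of what chunking means here, the sketch below splits a string into fixed-length character chunks with an optional overlap between neighbours. The function name `chunk_fixed` and its parameters are assumptions made for this example, not code taken from the course.

```python
def chunk_fixed(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that content cut at a
    boundary still appears intact in one of the two neighbouring chunks.
    (Hypothetical helper, written for illustration only.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window has reached the end of the text, so the
        # final chunk is never just a repeat of the previous chunk's tail.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

Character counts are only a stand-in for model limits; a token-based variant would measure length with the tokenizer of the target model instead.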

Syllabus

  • Unit 1: Chunking and Storing Text for Efficient LLM Processing
    • Implementing Fixed Length Text Chunking
    • Sentence Boundaries for Smarter Chunking
    • Chunking Methods Head to Head
    • Preserving Document Structure with Paragraph Chunking
  • Unit 2: Advanced Chunking Techniques for LLMs
    • Exploring Separator Configurations
    • Exploring Overlap in Text Chunking
    • Token-Based Chunking Implementation
  • Unit 3: Converting and Storing Text Chunks in JSONL Format
    • Convert Text Chunks to JSONL
    • Filter Text Chunks with JSONL
    • Text Processing Pipeline with JSONL
  • Unit 4: Chunking and Storing Text for Efficient LLM Processing with Chroma DB
    • Converting Text Chunks to Vector Embeddings
    • Initializing ChromaDB for Vector Storage
    • Storing Embeddings in ChromaDB
    • Persisting Vector Databases for Production
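
The sentence-based chunking and JSONL topics listed above could look roughly like the following sketch. It uses only the Python standard library; the sentence splitter is a deliberately naive regular expression, and the record fields (`doc_id`, `chunk_id`, `text`) and all function names are assumptions for illustration, not the course's own schema.

```python
import json
import re


def split_sentences(text: str) -> list[str]:
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    # Abbreviations such as "e.g." will be split incorrectly.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_by_sentences(text: str, max_chars: int = 100) -> list[str]:
    """Pack whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def to_jsonl(chunks: list[str], doc_id: str) -> str:
    # One JSON object per line; field names are hypothetical.
    return "\n".join(
        json.dumps({"doc_id": doc_id, "chunk_id": i, "text": chunk})
        for i, chunk in enumerate(chunks)
    )


def filter_jsonl(jsonl: str, min_chars: int) -> list[dict]:
    # Parse each non-empty line and keep records with enough text.
    records = (json.loads(line) for line in jsonl.splitlines() if line.strip())
    return [r for r in records if len(r["text"]) >= min_chars]
```

Records in this shape can later be embedded and loaded into a vector store such as ChromaDB; that step is omitted here because it depends on the embedding model and client configuration used.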
