Overview

AI, Data Science & Cloud Certificates from Google, IBM & Meta — 40% Off

One plan covers every Professional Certificate on Coursera. 40% off Coursera Plus Annual.

Build your own AI assistant that answers questions from your documents – entirely on your local machine. Assuming a basic acquaintance with Python, this course will teach you how to run a local LLM, turn PDFs into searchable chunks, generate embeddings, store them in a vector database, and connect retrieval and generation into a complete RAG (Retrieval-Augmented Generation) pipeline. You’ll create OpenAI-compatible and RAG endpoints with FastAPI, work with Ollama and Qdrant, and finish by building a browser-based interface for asking questions and reviewing sources. This course stands out because everything is built locally, from end to end. Instead of relying on black-box cloud services, you will master every step of the system you build – from document processing and vector search to prompt construction and answer generation. You’ll learn by actually building, adding one piece at a time. With each module, you’ll unlock a new feature in your project., By the end, you will have a production-ready AI project you can run, customize, and share.

Syllabus

Week 1

In this module, you’ll set up your local environment and make your first call to a large language model. You’ll learn what LLMs are, why local inference is crucial for privacy and experimentation, and how Docker helps you run the system reliably across different machines. You’ll explore the core architecture of the course project, understand the API layer and the role of HTTP and JSON, and write your first Python function that talks to the model. By the end of this module, you’ll have a working local LLM request flowing through your own API.

Week 2

In this module, you’ll build and configure the API layer that powers the project. You’ll learn how FastAPI structures endpoints, request models, and validation, how the server communicates with Ollama, and why the OpenAI chat format has become the industry standard. You’ll also work with async requests, system prompts, multi-turn conversations, and automated testing using pytest and FastAPI’s TestClient. By the end of this module, you’ll understand how the chat API works from request to response and how to verify it with tests.

Week 3

In this module, you’ll move from model calls to document processing. You’ll extract text from PDFs, see why PDF parsing is harder than it looks, and learn how to split long documents into retrieval-friendly chunks. You’ll compare chunk sizes, master overlap strategies, and build a document pipeline that attaches critical metadata like source file names and chunk positions. By the end of this module, you’ll be able to turn any PDF into structured chunks that are ready for indexing.

Week 4

In this module, you’ll make text searchable by meaning. You’ll learn what embeddings are, how cosine similarity measures semantic closeness, and why vector databases differ from traditional databases. Then, you’ll connect these ideas in code by generating vectors with an embedding model, storing them in Qdrant, and implementing the indexing endpoint that ties document chunks to vector storage. By the end of this module, your documents will be embedded, stored, and ready for retrieval.

Week 5

In this module, you’ll close the RAG loop. You’ll explore the two core phases of RAG – retrieval and generation – and see how a user’s question becomes a vector, how relevant chunks are selected, and how prompt structure guides the model to answer strictly from context. Then, you’ll implement the query endpoint, tune parameters like Top-K and score thresholds, and return answers with clear sources. By the end of this module, your system will answer questions accurately, grounded entirely in your indexed documents.

Week 6

In this module, you’ll give your project a browser interface and make it demo-ready. You’ll learn how to add a lightweight web UI with HTMX and FastAPI, render templates and HTML fragments, and connect the form-based interface to the same RAG logic you built earlier. You’ll also see how to test the full user flow in the browser and turn your backend project into something easy to share with others. By the end of this module, you’ll have a complete local AI assistant with a usable web interface.