RAG Knowledge System
Retrieval-augmented generation system with semantic search, source attribution, and a REST API for document-grounded Q&A.
Semantic search with source attribution
Local LLM inference via Ollama
REST API with multi-turn conversation support
Foundation for production RAG implementations
This is a retrieval-augmented generation (RAG) system for semantic question answering over document collections, with full source attribution. It began as a learning project and evolved into a reusable architecture I've since applied in production systems.
The system ingests PDF documents, splits them into chunks, generates embeddings with the BAAI/bge-small-en-v1.5 sentence-transformer model from Hugging Face, and stores them in ChromaDB for vector similarity search. When a user asks a question through the REST API, the system retrieves the most relevant document chunks, adds them to the LLM prompt as context, and generates an answer that cites its sources.
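The retrieve-then-augment flow described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the project's actual code: the bag-of-words `embed` function is a toy stand-in for the bge embedding model, the plain Python list stands in for ChromaDB, and all function names, the chunk sizes, and the prompt wording are assumptions made for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text, source, size=40, overlap=10):
    """Split text into overlapping word windows, tagging each with its source."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append({"source": source, "text": " ".join(words[start:start + size])})
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c["text"])), reverse=True)
    return ranked[:k]


def build_prompt(question, retrieved):
    """Number each retrieved chunk so the model can cite it as [n]."""
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(retrieved, 1)
    )
    return (
        "Answer using only the context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical documents, used only to exercise the sketch.
chunks = (
    chunk_text("ChromaDB stores embeddings and supports vector similarity search.", "storage.pdf")
    + chunk_text("Flask exposes the REST API used to ask questions.", "api.pdf")
)
top = retrieve("Where are embeddings stored?", chunks, k=1)
prompt = build_prompt("Where are embeddings stored?", top)
```

Carrying the source name alongside each chunk is what makes attribution possible: whatever the model cites as [n] can be mapped back to a specific document.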
The API is built with Flask and supports conversation management, allowing multi-turn interactions where context accumulates across questions.
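One way to accumulate context across turns is to keep a per-conversation history and prepend it to each new question. The sketch below is framework-agnostic and purely illustrative: `ConversationStore`, its method names, and the `max_turns` cap are hypothetical, and the real Flask routes may manage state differently.

```python
import uuid


class ConversationStore:
    """In-memory conversation history keyed by conversation ID (hypothetical helper)."""

    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self._turns = {}

    def start(self):
        """Create a new conversation and return its ID."""
        conversation_id = uuid.uuid4().hex
        self._turns[conversation_id] = []
        return conversation_id

    def add_turn(self, conversation_id, question, answer):
        """Record a completed question/answer pair."""
        turns = self._turns[conversation_id]
        turns.append((question, answer))
        # Assumption: drop the oldest turns so the prompt stays bounded.
        if len(turns) > self.max_turns:
            del turns[:len(turns) - self.max_turns]

    def history_prompt(self, conversation_id, new_question):
        """Render prior turns plus the new question as plain text."""
        lines = []
        for question, answer in self._turns[conversation_id]:
            lines.append(f"User: {question}")
            lines.append(f"Assistant: {answer}")
        lines.append(f"User: {new_question}")
        return "\n".join(lines)


store = ConversationStore(max_turns=2)
cid = store.start()
store.add_turn(cid, "What is RAG?", "Retrieval-augmented generation.")
followup = store.history_prompt(cid, "How does it cite sources?")
```

A route handler would look up the conversation by the ID the client sends, build the prompt from `history_prompt`, and call `add_turn` once the answer is generated.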
LLM inference runs locally through Ollama, so documents and queries never leave the machine and there are no per-request API costs. The system supports multiple languages in both documents and queries.
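For reference, a local Ollama server can be called over HTTP using only the standard library. This sketch assumes Ollama is listening on its default port and uses its `/api/generate` endpoint with streaming disabled; the model name `llama3` is a placeholder rather than the model this project uses, and the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_generate_request(prompt, model="llama3"):
    """Build a non-streaming generate request (model name is a placeholder)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3"):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as response:
        return json.loads(response.read())["response"]


# Building the request does not contact the server; generate() does.
request = build_generate_request("Say hello.")
```

Calling `generate(prompt)` requires a running Ollama instance with the chosen model already pulled.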
This project was foundational for understanding the RAG patterns I later implemented at production scale in the B2B platform: chunking strategies, embedding model selection, retrieval tuning, and the critical challenge of reducing hallucination in grounded generation.