RAG Knowledge System

Retrieval-augmented generation system with semantic search, source attribution, and a REST API for document-grounded Q&A.

Semantic search with source attribution

Local LLM inference via Ollama

REST API with multi-turn conversation support

Foundation for production RAG implementations

A retrieval-augmented generation (RAG) system that enables semantic question-answering over document collections with full source attribution. Built as a learning project that evolved into a reusable architecture I've since applied in production systems.

The system ingests PDF documents, chunks them intelligently, generates embeddings using HuggingFace sentence transformers (BAAI/bge-small-en-v1.5), and stores them in ChromaDB for efficient vector similarity search. When a user asks a question through the REST API, the system retrieves the most relevant document chunks, augments the LLM prompt with this context, and generates an answer that cites its sources.

The API is built with Flask and supports conversation management, allowing multi-turn interactions where context accumulates across questions.

The LLM inference runs locally through Ollama, keeping everything private and cost-free. The system supports multiple languages in both documents and queries.

This project was foundational for understanding RAG patterns that I later implemented at production scale in the B2B platform — including chunking strategies, embedding model selection, retrieval tuning, and the critical challenge of reducing hallucination in grounded generation.

Need something similar?

Let's talk about what I can build for your business.

Get in touch