(100% Free, Private (No Internet), and Local PC Installation)
🔥 DeepSeek + NOMIC + FAISS + Neural Reranking + HyDE + GraphRAG + Chat Memory = The Ultimate RAG Stack!
This chatbot enables fast, accurate, and explainable retrieval of information from PDFs, DOCX, and TXT files using DeepSeek-7B, BM25, FAISS, Neural Reranking (Cross-Encoder), GraphRAG, and Chat History Integration.
- GraphRAG Integration: Builds a Knowledge Graph from your documents for more contextual and relational understanding.
- Chat History Awareness: Maintains context by referencing chat history, enabling more coherent and contextually relevant responses.
- Improved Error Handling: Resolved issues related to chat history clearing and other minor bugs for a smoother user experience.
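As a toy illustration of the knowledge-graph idea (deliberately simplified, and not the project's actual GraphRAG implementation), document chunks can be linked by the entities they share; here capitalized words stand in for a real entity extractor:

```python
# Toy knowledge-graph construction: connect entities that co-occur
# in the same chunk. Illustrative only, not the repo's GraphRAG code.
import itertools
import networkx as nx

def extract_entities(chunk: str) -> set[str]:
    # Stand-in extractor: treat capitalized words as "entities".
    return {w.strip(".,;") for w in chunk.split() if w[:1].isupper()}

def build_graph(chunks: list[str]) -> nx.Graph:
    graph = nx.Graph()
    for chunk in chunks:
        for a, b in itertools.combinations(sorted(extract_entities(chunk)), 2):
            graph.add_edge(a, b)  # undirected co-occurrence edge
    return graph

g = build_graph([
    "DeepSeek generates answers from retrieved chunks.",
    "FAISS and BM25 hand candidate chunks to DeepSeek.",
])
print(list(g.edges))  # e.g. [('BM25', 'DeepSeek'), ('BM25', 'FAISS'), ...]
```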
You can install and run the DeepSeek RAG Chatbot in one of two ways:
- Traditional (Python/venv) Installation
- Docker Installation (ideal for containerized deployments)
```bash
git clone https://github.com/SaiAkhil066/DeepSeek-RAG-Chatbot.git
cd DeepSeek-RAG-Chatbot

# Create a virtual environment
python -m venv venv

# Activate the environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Upgrade pip (optional, but recommended)
pip install --upgrade pip

# Install project dependencies
pip install -r requirements.txt
```
- Download Ollama → https://ollama.com/
- Pull the required models (a configuration sketch follows this item):

  ```bash
  ollama pull deepseek-r1:7b
  ollama pull nomic-embed-text
  ```

  Note: If you want to use a different model, update `MODEL` or `EMBEDDINGS_MODEL` in your environment variables or `.env` file accordingly.
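  As a minimal sketch of how such settings are typically resolved (assuming `os.getenv`-style lookups; the variable names and defaults match the Docker Compose example later in this README):

  ```python
  # Illustrative only: resolve model settings from the environment,
  # falling back to the defaults used in this README.
  import os

  OLLAMA_API_URL = os.getenv("OLLAMA_API_URL", "http://localhost:11434")
  MODEL = os.getenv("MODEL", "deepseek-r1:7b")
  EMBEDDINGS_MODEL = os.getenv("EMBEDDINGS_MODEL", "nomic-embed-text:latest")

  print(f"Using {MODEL} via {OLLAMA_API_URL} (embeddings: {EMBEDDINGS_MODEL})")
  ```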
- Make sure Ollama is running on your system (a quick health-check sketch follows these steps):

  ```bash
  ollama serve
  ```

- Launch the Streamlit app:

  ```bash
  streamlit run app.py
  ```
- Open your browser at http://localhost:8501 to access the chatbot UI.
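To sanity-check that Ollama is reachable before opening the UI, you can hit its REST API (a hedged sketch; `/api/tags` is Ollama's endpoint for listing locally pulled models, served on port 11434 by default):

```python
# Quick health check: confirm Ollama is up and the models are pulled.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
names = [m["name"] for m in resp.json().get("models", [])]
print("Locally available models:", names)
assert any(n.startswith("deepseek-r1") for n in names), "run: ollama pull deepseek-r1:7b"
```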
If Ollama is already installed on your host machine and listening at `localhost:11434`, do the following:
- Build & Run:

  ```bash
  docker-compose build
  docker-compose up
  ```
- The app is now served at http://localhost:8501. Ollama keeps running on your host, and the container reaches it through the URL configured in `OLLAMA_API_URL`.
If you prefer everything in Docker:
version: "3.8"
services:
ollama:
image: ghcr.io/jmorganca/ollama:latest
container_name: ollama
ports:
- "11434:11434"
deepgraph-rag-service:
container_name: deepgraph-rag-service
build: .
ports:
- "8501:8501"
environment:
- OLLAMA_API_URL=http://ollama:11434
- MODEL=deepseek-r1:7b
- EMBEDDINGS_MODEL=nomic-embed-text:latest
- CROSS_ENCODER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
depends_on:
- ollama
Then:

```bash
docker-compose build
docker-compose up
```
Both Ollama and the chatbot run in Docker. Access the chatbot at http://localhost:8501.
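Once the stack is up, a quick smoke test against the published Ollama port works too (a sketch using Ollama's documented `/api/generate` endpoint; the `11434:11434` mapping above exposes it on the host at `localhost`):

```python
# Smoke test: request a completion from the model running in the stack.
import requests

payload = {"model": "deepseek-r1:7b", "prompt": "Say hello.", "stream": False}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])
```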
- Upload Documents: Add PDFs, DOCX, or TXT files via the sidebar.
- Hybrid Retrieval: Combines BM25 and FAISS to fetch the most relevant text chunks.
- GraphRAG Processing: Builds a Knowledge Graph from your documents to understand relationships and context.
- Neural Reranking: Uses a Cross-Encoder model to reorder the retrieved chunks by relevance (a retrieval-and-reranking sketch follows this list).
- Query Expansion (HyDE): Generates hypothetical answers to expand your query for better recall.
- Chat History Integration: Maintains context by referencing previous user messages.
- DeepSeek-7B Generation: Produces the final answer based on top-ranked chunks.
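To make the retrieval steps above concrete, here is a minimal, illustrative sketch (not the repo's actual code) of hybrid BM25 + FAISS retrieval followed by cross-encoder reranking; it assumes the `rank_bm25`, `faiss`, and `sentence-transformers` packages, and `embed()` is a hypothetical stand-in for the `nomic-embed-text` embeddings:

```python
# Illustrative hybrid retrieval + neural reranking (not the repo's code).
import numpy as np
import faiss
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

chunks = [
    "GraphRAG builds a knowledge graph from the documents.",
    "FAISS performs dense vector search over chunk embeddings.",
    "BM25 scores chunks by keyword overlap with the query.",
]

def embed(texts: list[str]) -> np.ndarray:
    # Hypothetical stand-in for nomic-embed-text; returns float32 vectors.
    rng = np.random.default_rng(0)
    return rng.random((len(texts), 768), dtype=np.float32)

query = "How are document chunks retrieved?"

# 1) Sparse retrieval: BM25 over whitespace-tokenized chunks.
bm25 = BM25Okapi([c.lower().split() for c in chunks])
sparse_top = np.argsort(bm25.get_scores(query.lower().split()))[-2:]

# 2) Dense retrieval: FAISS inner-product search over chunk embeddings.
vecs = embed(chunks)
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)
_, dense_top = index.search(embed([query]), 2)

# 3) Merge both candidate sets, then rerank with the cross-encoder
#    named in the Docker Compose example above.
candidates = sorted({int(i) for i in sparse_top} | {int(i) for i in dense_top[0]})
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, chunks[i]) for i in candidates])
best_first = [chunks[i] for _, i in sorted(zip(scores, candidates), reverse=True)]
print(best_first[0])  # top-ranked chunk feeds the LLM prompt
```

HyDE would slot in before step 1: the LLM drafts a hypothetical answer, which is then used as an additional query to improve recall.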
| Feature | Previous Version | New Version |
|---|---|---|
| Retrieval Method | Hybrid (BM25 + FAISS) | Hybrid + GraphRAG |
| Contextual Understanding | Limited | Enhanced with Knowledge Graphs |
| User Interface | Standard | Customizable + Themed Sidebar |
| Chat History | Not Utilized | Full Memory Integration |
| Error Handling | Basic | Improved with Bug Fixes |
- Fork this repo, submit pull requests, or open issues for new features or bug fixes.
- We love hearing community suggestions on how to extend or improve the chatbot.
Got feedback or suggestions? Let’s discuss on Reddit! 🚀💡
Enjoy building knowledge graphs, maintaining conversation memory, and harnessing powerful local LLM inference—all from your own machine.
The future of retrieval-augmented AI is here—no internet required!