A Streamlit application for asking questions about PDF documents using LangChain and Ollama LLMs. The application extracts text from the PDF, embeds the text chunks as vectors, and uses retrieval-based question answering so that responses are grounded in the document content.
- PDF document loading and processing
- Text chunking with overlap for better context preservation
- Vector embeddings using Ollama
- Fast similarity search using FAISS vector database
- Interactive Q&A interface using Streamlit
- Retrieval-augmented generation that answers from the retrieved passages
- Python 3.8+
- Ollama installed and running locally
- Clone the repository:
git clone <repository-url>
cd <repository-name>
- Install required dependencies:
pip install langchain-community langchain streamlit faiss-cpu pypdf
- Make sure Ollama is installed and running with the llama3 model:
ollama pull llama3
├── data/
│ └── IJEPA.pdf # Your PDF document
├── app.py # Main application code
└── README.md # This file
- Place your PDF file in the data/ directory
- Run the Streamlit application:
streamlit run app.py
- Enter your questions in the search box to get answers based on the PDF content
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
loader = PyPDFLoader("data/IJEPA.pdf")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
db = FAISS.from_documents(documents[:30], OllamaEmbeddings(model="llama3"))
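To make the similarity search step concrete, here is a minimal sketch of what an exhaustive nearest-neighbour lookup does: compare the query vector against every stored vector and keep the closest ones. It assumes Euclidean (L2) distance, which is, as far as I know, the default for LangChain's FAISS wrapper. The helper names (`l2_distance`, `top_k`), the 3-dimensional vectors, and the sample texts are all hypothetical; real embeddings have far more dimensions and FAISS does this work in optimized native code.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, indexed, k=2):
    """Return the k (text, distance) pairs closest to query_vec.

    `indexed` is a list of (text, vector) pairs standing in for the
    vector store; it is an illustrative structure, not the FAISS API.
    """
    scored = [(text, l2_distance(query_vec, vec)) for text, vec in indexed]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy "embeddings" (made up for illustration only).
index = [
    ("chunk A", [0.9, 0.1, 0.0]),
    ("chunk B", [0.1, 0.8, 0.1]),
    ("chunk C", [0.0, 0.2, 0.9]),
]

nearest = top_k([1.0, 0.0, 0.0], index, k=2)
for text, dist in nearest:
    print(f"{dist:.3f}  {text}")
```

The chunks returned by a search like this are what the retriever later hands to the LLM as context.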
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.llms import Ollama
prompt = ChatPromptTemplate.from_template("""
Answer the following question based only on the provided context.
Think step by step before providing a detailed answer.
I will tip you $1000 if the user finds the answer helpful.
<context>
{context}
</context>
Question: {input}""")
llm = Ollama(model="llama3")
document_chain = create_stuff_documents_chain(llm, prompt)
from langchain.chains import create_retrieval_chain
retriever = db.as_retriever()
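retrieval_chain = create_retrieval_chain(retriever, document_chain)
import streamlit as st
# Named `query` so it does not shadow the ChatPromptTemplate `prompt` above
query = st.text_input("Do Search here")
if query:
    response = retrieval_chain.invoke({"input": query})
    st.write(response["answer"])  # the chain returns a dict; "answer" holds the generated text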
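The "stuff" step in the chain above can be pictured as plain string assembly: every retrieved chunk is joined into one block of text and substituted into the prompt's `{context}` slot, while the user's question fills `{input}`. The sketch below is a simplified stand-in, not LangChain's implementation. `PROMPT_TEMPLATE` is a shortened hypothetical template, `stuff_prompt` is a hypothetical helper, and the blank-line separator between chunks is an assumption about the library default.

```python
# Shortened, hypothetical template with the same two placeholders
# the chain expects: {context} and {input}.
PROMPT_TEMPLATE = (
    "Answer the following question based only on the provided context.\n"
    "Context:\n{context}\n\n"
    "Question: {input}"
)


def stuff_prompt(question, retrieved_chunks):
    """Join all retrieved chunks and substitute them into the template."""
    context = "\n\n".join(retrieved_chunks)  # assumed separator
    return PROMPT_TEMPLATE.format(context=context, input=question)


# Placeholder chunks standing in for retriever output.
chunks = ["First retrieved chunk.", "Second retrieved chunk."]
final_prompt = stuff_prompt("What does the document say?", chunks)
print(final_prompt)
```

Because every retrieved chunk goes into a single prompt, the number and size of chunks must fit within the model's context window.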
The application can be customized by modifying these parameters:
- chunk_size: Size of text chunks in characters (default: 1000)
- chunk_overlap: Overlap between consecutive chunks in characters (default: 200)
- documents[:30]: Number of text chunks to embed (adjust as needed)
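To see how chunk_size and chunk_overlap interact, the following sketch splits a string into fixed-size windows where each window starts `chunk_size - chunk_overlap` characters after the previous one. This is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, so its chunk boundaries will differ. The function name `split_with_overlap` and the placeholder text are hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# 2500 placeholder characters standing in for extracted PDF text.
text = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(text)
print([len(c) for c in chunks])  # prints [1000, 1000, 900]
```

With the defaults, each new chunk repeats the last 200 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk. A larger overlap preserves more context but produces more chunks to embed.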
- Currently embeds only the first 30 text chunks of the PDF (documents[:30]), so content beyond those chunks cannot be retrieved
- Requires Ollama to be running locally
- Performance depends on the size of the PDF and available system resources
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.