DishenMakwana/langchain-gen-ai

Gen AI using LangChain is designed to simplify and enhance the development of applications that use Large Language Models (LLMs) such as OpenAI, Hugging Face, Groq, or DeepSeek.
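
Because LangChain puts each provider behind a common LLM interface, swapping the backing model is typically a one-line change. A minimal sketch, assuming the classic langchain package with the relevant API keys exported as environment variables (the model names are only examples):

    from langchain.llms import OpenAI, HuggingFaceHub

    # OpenAI completion model (requires OPENAI_API_KEY)
    llm = OpenAI(model_name="gpt-3.5-turbo-instruct")

    # Swap in a Hugging Face Hub model instead (requires HUGGINGFACEHUB_API_TOKEN)
    llm = HuggingFaceHub(repo_id="google/flan-t5-large")

Groq and DeepSeek are reachable through their own integration packages (for example langchain-groq) or OpenAI-compatible endpoints; the rest of the chain code stays the same.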

Example Use Case: Building a Q&A system for a PDF research paper.


Ingestion: Preparing the Document

  1. Loading the Document
    Load a PDF research paper into the system.

    from langchain.document_loaders import PyPDFLoader
    
    # Load the PDF
    loader = PyPDFLoader("research_paper.pdf")
    documents = loader.load()
    print(f"Loaded {len(documents)} pages from the document.")
  2. Splitting the Document
    Divide the document into smaller chunks for processing.

    from langchain.text_splitter import RecursiveCharacterTextSplitter
    
    # Split the document into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_documents(documents)
    print(f"Split document into {len(chunks)} chunks.")
  3. Creating Embeddings
    Convert the text of each chunk into a vector embedding. (Step 4's FAISS.from_documents performs this embedding internally, so the explicit call below is only illustrative.)

    from langchain.embeddings import OpenAIEmbeddings
    
    # Generate an embedding vector for the text of each chunk
    embeddings = OpenAIEmbeddings()
    chunk_embeddings = embeddings.embed_documents([chunk.page_content for chunk in chunks])
    print(f"Embeddings created for {len(chunk_embeddings)} chunks.")
  4. Storing in a Vector Store
    Store these embeddings in a vector database for retrieval.

    from langchain.vectorstores import FAISS
    
    # Save embeddings in FAISS
    vector_store = FAISS.from_documents(chunks, embeddings)
    print("Embeddings stored in FAISS vector database.")

Generation: Answering the Question

  1. Accepts a User's Question
    Take a question from the user.

    user_query = "What are the main findings of the research?"
  2. Finds Relevant Content
    Use the vector database to retrieve the most relevant chunks for the query.

    from langchain.chains import RetrievalQA
    from langchain.llms import OpenAI
    
    # Set up retriever
    retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k":3})
    
    # Build a QA chain that retrieves relevant chunks and passes them to the LLM
    llm = OpenAI(model_name="gpt-3.5-turbo-instruct")  # text-davinci-003 has been retired
    qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
  3. Generates an Answer
    Generate the answer based on the retrieved content.

    answer = qa_chain.run(user_query)
    print(f"Answer: {answer}")

Output Walkthrough

  • Loading: The PDF is loaded into the system as documents.
  • Splitting: The PDF is split into manageable chunks with overlapping sections for context.
  • Embedding: The chunks are embedded into high-dimensional vectors.
  • Storage: The embeddings are stored in FAISS, making them searchable.
  • Retrieval: When a user asks a question, the system retrieves the most relevant chunks.
  • Answer Generation: An LLM generates an answer based on the retrieved content.
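
Putting the walkthrough together, the whole pipeline fits in one short function. A minimal end-to-end sketch using the same pieces as above (the answer_pdf_question name is purely illustrative):

    from langchain.chains import RetrievalQA
    from langchain.document_loaders import PyPDFLoader
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.llms import OpenAI
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.vectorstores import FAISS

    def answer_pdf_question(pdf_path: str, question: str) -> str:
        # Ingestion: load, split, embed, and index the PDF
        documents = PyPDFLoader(pdf_path).load()
        splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
        chunks = splitter.split_documents(documents)
        vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings())

        # Generation: retrieve the most relevant chunks and let the LLM answer
        retriever = vector_store.as_retriever(search_kwargs={"k": 3})
        llm = OpenAI(model_name="gpt-3.5-turbo-instruct")
        qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
        return qa_chain.run(question)

    print(answer_pdf_question("research_paper.pdf", "What are the main findings of the research?"))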

Illustrative Result

User Question: "What are the main findings of the research?"
System Answer: "The research highlights that X intervention improves Y outcomes by Z% according to the data in Table 2."

This demonstrates how LangChain simplifies the ingestion and generation phases of building a scalable, efficient Q&A system.
