Multimodal RAG

Build your own multimodal RAG application in fewer than 300 lines of code.

You can talk to any of your documents with an LLM, including Word, PPT, CSV, PDF, email, HTML, Evernote, video, and image files.
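
Under the hood, the non-media file types are loaded with LangChain document loaders. Below is a minimal sketch of the kind of extension-to-loader mapping this implies; the exact loaders and extensions used in m-rag.py may differ:

      # Illustrative extension-to-loader map; the real one in m-rag.py may differ.
      from langchain_community.document_loaders import (
          CSVLoader,
          Docx2txtLoader,
          EverNoteLoader,
          PyPDFLoader,
          UnstructuredEmailLoader,
          UnstructuredHTMLLoader,
          UnstructuredPowerPointLoader,
      )

      LOADER_MAP = {
          ".csv": CSVLoader,
          ".docx": Docx2txtLoader,
          ".enex": EverNoteLoader,
          ".eml": UnstructuredEmailLoader,
          ".html": UnstructuredHTMLLoader,
          ".pdf": PyPDFLoader,
          ".pptx": UnstructuredPowerPointLoader,
      }

      def load_document(path: str):
          """Pick a loader by file extension; images and videos go through LLaVA instead."""
          ext = "." + path.rsplit(".", 1)[-1].lower()
          loader_cls = LOADER_MAP.get(ext)
          if loader_cls is None:
              raise ValueError(f"Unsupported file type: {ext}")
          return loader_cls(path).load()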

  • Features
    • Ingest your videos and pictures with a multimodal LLM
    • Q&A with an LLM about any of your files
    • Run locally without compromising your privacy
    • Locate the relevant source with citations
    • Extremely simple: a single Python file with no more than 300 lines of code
  • Process
    • Parse the videos and pictures in the source folder into text with LLaVA, which runs locally via Ollama, and ingest other file types with LangChain document loaders.
    • Ingest the text into a vector DB.
    • Query it with a local LLM. A minimal sketch of the ingestion and query steps follows this list.
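
      A minimal sketch of the ingest-and-query steps, assuming the parsed text already sits in source_documents. The imports follow the current langchain-community layout, the embedding and model names stand in for the .env values, and build_index/answer are illustrative names, not the actual functions in m-rag.py:

      # Minimal ingest-and-query sketch; names and parameters are illustrative.
      from langchain.chains import RetrievalQA
      from langchain.text_splitter import RecursiveCharacterTextSplitter
      from langchain_community.document_loaders import DirectoryLoader, TextLoader
      from langchain_community.embeddings import HuggingFaceEmbeddings
      from langchain_community.llms import LlamaCpp
      from langchain_community.vectorstores import FAISS

      def build_index(text_dir: str = "source_documents") -> FAISS:
          """Split the parsed text files into chunks and embed them into a FAISS index."""
          docs = DirectoryLoader(
              text_dir, glob="**/*.txt", loader_cls=TextLoader
          ).load()
          splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
          chunks = splitter.split_documents(docs)
          embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
          return FAISS.from_documents(chunks, embeddings)

      def answer(question: str, db: FAISS) -> dict:
          """Answer a question with a local GGUF model and return the cited source chunks."""
          llm = LlamaCpp(model_path="models/mistral-7b-instruct-v0.1.Q4_K_S.gguf", n_ctx=4096)
          qa = RetrievalQA.from_chain_type(
              llm=llm,
              retriever=db.as_retriever(search_kwargs={"k": 4}),
              return_source_documents=True,  # enables the citation feature
          )
          return qa.invoke({"query": question})
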
  • Setup
    • Create and activate virtual environment

      python -m venv m-rag
      source m-rag/bin/activate
    • Clone repo and install dependencies

      git clone https://github.com/13331112522/m-rag.git
      cd m-rag
      python -m pip install -r requirements.txt
      cp example.env .env
    • Prepare the models

      • Put the local LLM weights into the models folder (any GGUF-format model is supported) and set MODEL_PATH in .env to your model path. You can download weights from TheBloke on Hugging Face. We use mistral-7b-instruct-v0.1.Q4_K_S.gguf as the query LLM.
      • We currently use HuggingFaceEmbeddings, but you can switch to a local embedding model such as GPT4AllEmbeddings by changing EMBEDDINGS_MODEL_NAME in .env.
      • Run the multimodal LLM. We use LLaVA 1.6 for image and video parsing:
      ollama run llava
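
      A minimal sketch of captioning one image with LLaVA through Ollama's local REST API; the /api/generate endpoint is Ollama's standard interface, while the prompt and helper name are illustrative, not the exact code in m-rag.py:

      # Caption one image with LLaVA via the local Ollama server (illustrative).
      import base64

      import requests

      def describe_image(image_path: str, model: str = "llava") -> str:
          """Ask the locally running LLaVA model to describe an image."""
          with open(image_path, "rb") as f:
              image_b64 = base64.b64encode(f.read()).decode("utf-8")
          resp = requests.post(
              "http://localhost:11434/api/generate",
              json={
                  "model": model,
                  "prompt": "Describe this image in detail.",
                  "images": [image_b64],
                  "stream": False,
              },
              timeout=300,
          )
          resp.raise_for_status()
          return resp.json()["response"]
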
    • Set the environment variables

      • Adjust the environment variables in .env to your needs. SOURCE_DIRECTORY is the folder containing all the images and videos you want to retrieve, and STRIDE is the frame interval used when parsing videos. For long videos, you can increase STRIDE to speed up processing at the cost of detail.
      • Set os.environ["IMAGEIO_FFMPEG_EXE"] = "/path/to/ffmpeg" to the actual path of your FFmpeg executable so that the FFmpeg backend is used.
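
      A minimal sketch of reading this configuration at startup, assuming python-dotenv is installed; the default values shown here are illustrative:

      # Load configuration from .env; the defaults below are illustrative only.
      import os

      from dotenv import load_dotenv

      load_dotenv()

      MODEL_PATH = os.environ.get("MODEL_PATH", "models/mistral-7b-instruct-v0.1.Q4_K_S.gguf")
      EMBEDDINGS_MODEL_NAME = os.environ.get("EMBEDDINGS_MODEL_NAME", "all-MiniLM-L6-v2")
      SOURCE_DIRECTORY = os.environ.get("SOURCE_DIRECTORY", "source")  # images and videos to retrieve
      STRIDE = int(os.environ.get("STRIDE", "50"))  # sample every STRIDE-th frame when parsing videos

      # Point imageio at your FFmpeg binary so video frames can be extracted.
      os.environ["IMAGEIO_FFMPEG_EXE"] = "/path/to/ffmpeg"
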
    • Run

      Put all the files you want to talk to into the source folder, then run the following command:

      python m-rag.py

      It will generate a source_documents folder to store the parsed text and a faiss_index folder as the vector DB. If both folders already exist, it skips ingestion and starts the query loop directly.
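
      A minimal sketch of that startup check; parse_sources and build_index are hypothetical helpers standing in for the parsing and indexing code, and the FAISS calls follow the current langchain-community API:

      # Reuse previous output if it exists; otherwise parse and build the index (illustrative).
      import os

      from langchain_community.embeddings import HuggingFaceEmbeddings
      from langchain_community.vectorstores import FAISS

      embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

      if os.path.isdir("source_documents") and os.path.isdir("faiss_index"):
          # Previous run found: load the saved vector store and go straight to querying.
          db = FAISS.load_local(
              "faiss_index",
              embeddings,
              allow_dangerous_deserialization=True,  # required by recent langchain-community releases
          )
      else:
          # First run: parse everything in `source`, then build and persist the index.
          parse_sources("source", "source_documents")  # hypothetical helper that writes parsed text
          db = build_index("source_documents")         # hypothetical helper; see the sketch under Process
          db.save_local("faiss_index")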

  • Acknowledgements
    • llava 1.6
    • PrivateGPT
    • ollama
    • langchain
    • Llama.cpp
