SimGRAG

This is the repository for the paper "SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation". SimGRAG is a KG-driven RAG approach that supports various KG-based tasks, such as question answering and fact verification.

Prerequisites

SimGRAG supports plug-and-play use of the following three components:

Large language model: For generation.
Embedding model: For node and relation embedding.
Vector database: For storing the embeddings of the nodes and relations in the knowledge graph and supporting efficient similarity search.

This repository is built on open-source implementations of these components:

Ollama for running the Llama 3 70B large language model
Nomic embedding model for node and relation embedding
Milvus as the vector database

You can replace any of these components with your own choice; all you need to do is prepare the corresponding APIs. The preparation steps for the components we used are given below.
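As a rough illustration of what "preparing the APIs" could look like, the sketch below describes the three components as Python protocols and adds a toy in-memory vector database. All class and method names here (LanguageModel, EmbeddingModel, VectorDatabase, InMemoryVectorDatabase, generate, embed, insert, search) are hypothetical and are not taken from the SimGRAG source code; check the src folder for the interfaces the pipeline actually expects.

```python
import math
from typing import Protocol, Sequence


class LanguageModel(Protocol):
    """Hypothetical interface: turns a prompt into generated text."""
    def generate(self, prompt: str) -> str: ...


class EmbeddingModel(Protocol):
    """Hypothetical interface: maps node or relation names to vectors."""
    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class VectorDatabase(Protocol):
    """Hypothetical interface: stores vectors and finds the most similar ones."""
    def insert(self, keys: Sequence[str], vectors: Sequence[Sequence[float]]) -> None: ...
    def search(self, query: Sequence[float], top_k: int) -> list[str]: ...


class InMemoryVectorDatabase:
    """Toy stand-in for a real vector database, using brute-force cosine similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def insert(self, keys, vectors) -> None:
        for key, vector in zip(keys, vectors):
            self._items.append((key, list(vector)))

    def search(self, query, top_k):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Rank all stored items by similarity to the query, most similar first.
        ranked = sorted(self._items, key=lambda item: cosine(query, item[1]), reverse=True)
        return [key for key, _ in ranked[:top_k]]
```

Any replacement component that offers equivalent operations could, under this assumption, be wrapped in a thin adapter of the same shape.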

Ollama

Please visit the Ollama website to install Ollama in your local environment. After installation, you can use the following command to run the Llama 3 70B model:

ollama run llama3:70b

Then, you can use the following command to start the service needed by SimGRAG:

bash ollama_server.sh
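To confirm that the model answers before running the pipeline, a request like the one below may help. It is a minimal sketch that assumes the server listens on Ollama's default address, http://localhost:11434, and uses Ollama's /api/generate endpoint; ollama_server.sh may configure a different host or port, so adjust OLLAMA_URL accordingly. The helper names build_generate_request and generate are illustrative only.

```python
import json
import urllib.request

# Assumption: Ollama's default address. Change it if ollama_server.sh uses another one.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="llama3:70b", url=OLLAMA_URL):
    """Build a non-streaming generation request for the Ollama REST API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3:70b", url=OLLAMA_URL, timeout=600):
    """Send the prompt to the server and return the generated text."""
    request = build_generate_request(prompt, model, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


# Example (requires a running server):
#     print(generate("Reply with the single word: ready"))
```

The long default timeout is a guess that allows for the time a 70B model can take to load on first use.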

Nomic Embedding Model

You can clone the model from Hugging Face with the following commands:

mkdir -p data/raw
cd data/raw
git clone https://huggingface.co/nomic-ai/nomic-embed-text-v1
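Once cloned, the model can be loaded from the local folder to check that the download is complete. The sketch below assumes the sentence-transformers package is installed and follows the task-prefix convention ("search_document: ", "search_query: ") described on the model card; whether the SimGRAG pipeline itself loads the model this way or uses these prefixes is not stated in this README. The function names with_prefix and embed are illustrative.

```python
from pathlib import Path

# Path created by the clone commands above, relative to the repository root.
MODEL_DIR = Path("data/raw/nomic-embed-text-v1")


def with_prefix(texts, task="search_document"):
    """Prepend the task prefix that the Nomic model card recommends."""
    return [f"{task}: {text}" for text in texts]


def embed(texts, model_dir=MODEL_DIR, task="search_document"):
    """Embed texts with the locally cloned model (third-party dependency)."""
    # Imported lazily so the helper above works without the package installed.
    from sentence_transformers import SentenceTransformer

    # trust_remote_code is needed because the model ships custom modeling code.
    model = SentenceTransformer(str(model_dir), trust_remote_code=True)
    return model.encode(with_prefix(texts, task))


# Example (requires the cloned model and sentence-transformers):
#     vectors = embed(["directed_by", "starred_actors"])
#     print(vectors.shape)
```

If loading fails because the weight files are only a few hundred bytes, the clone probably fetched Git LFS pointer files; installing Git LFS and cloning again usually resolves this.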

Milvus

Please visit the Milvus website to install Milvus in your local environment. After installation, follow its documentation to start the service needed by SimGRAG.

Data preparation

MetaQA

Please download the MetaQA dataset from the URL given in its repository and put it in the data/raw folder.

FactKG

Please download the FactKG dataset from the URL given in its repository and put it in the data/raw folder.

Directory structure

After preparation, the directories should be organized as follows:

SimGraphRAG
├── data
│   └── raw
│       ├── nomic-embed-text-v1
│       ├── MetaQA
│       └── FactKG
├── configs
├── pipeline
├── prompts
└── src
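A quick check like the following can confirm that the layout matches the tree above before indexing. It is a small convenience sketch, not part of the repository; the names EXPECTED_DIRS and missing_dirs are made up for illustration.

```python
from pathlib import Path

# Directories taken from the tree above, relative to the repository root.
EXPECTED_DIRS = [
    "data/raw/nomic-embed-text-v1",
    "data/raw/MetaQA",
    "data/raw/FactKG",
    "configs",
    "pipeline",
    "prompts",
    "src",
]


def missing_dirs(root="."):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


# Example, run from the repository root:
#     print(missing_dirs() or "All expected directories are present.")
```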

Configuration

The configuration files are in the configs folder. You can modify them to fit your needs.

Running the pipeline

For MetaQA, you can run the following commands:

cd pipeline
python metaQA_index.py
python metaQA_query1hop.py
python metaQA_query2hop.py
python metaQA_query3hop.py

For FactKG, you can run the following commands:

cd pipeline
python factKG_index.py
python factKG_query.py

The results are written to the file assigned to "output_filename" in the configuration file, for example "results/FactKG_query.txt". Each line of the result file is a dictionary, in which the key "correct" indicates whether the final answer is correct.
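Overall accuracy can then be computed by counting the "correct" entries. The sketch below is one way to do this and rests on two assumptions that are not confirmed here: each line is either valid JSON or a Python dictionary literal, and the value under "correct" is a boolean (or otherwise truthy when the answer is correct). The function names parse_line and accuracy are illustrative.

```python
import ast
import json


def parse_line(line):
    """Parse one result line, accepting JSON or a Python dict literal."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return ast.literal_eval(line)


def accuracy(lines):
    """Return the fraction of non-empty result lines whose "correct" value is truthy."""
    records = [parse_line(line) for line in lines if line.strip()]
    if not records:
        return 0.0
    return sum(bool(record.get("correct")) for record in records) / len(records)


# Example, using the output path mentioned above:
#     with open("results/FactKG_query.txt") as f:
#         print(f"accuracy = {accuracy(f):.4f}")
```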
