RAGChecker is an advanced automatic evaluation framework designed to assess and diagnose Retrieval-Augmented Generation (RAG) systems. It provides a comprehensive suite of metrics and tools for in-depth analysis of RAG performance.
- **Holistic Evaluation**: RAGChecker offers **Overall Metrics** for assessing the entire RAG pipeline.
- **Diagnostic Metrics**: **Diagnostic Retriever Metrics** for analyzing the retrieval component and **Diagnostic Generator Metrics** for evaluating the generation component. These metrics provide valuable insights for targeted improvements.
- **Fine-grained Evaluation**: Utilizes **claim-level entailment** operations for fine-grained evaluation (see the sketch after this list).
- **Benchmark Dataset**: A comprehensive RAG benchmark dataset with 4k questions covering 10 domains (upcoming).
- **Meta-Evaluation**: A human-annotated preference dataset for evaluating the correlation of RAGChecker's results with human judgments.
RAGChecker empowers developers and researchers to thoroughly evaluate, diagnose, and enhance their RAG systems with precision and depth.
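To make the claim-level idea concrete, here is a minimal sketch of how entailment labels over extracted claims yield the overall precision/recall/F1 reported below. This is not RAGChecker's actual implementation; the function and argument names are illustrative.

```python
def overall_metrics(response_claims_correct, gt_claims_covered):
    """Illustrative sketch: overall metrics from claim-level entailment labels."""
    # Precision: fraction of response claims entailed by the ground-truth answer.
    precision = sum(response_claims_correct) / len(response_claims_correct)
    # Recall: fraction of ground-truth-answer claims entailed by the response.
    recall = sum(gt_claims_covered) / len(gt_claims_covered)
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": round(precision * 100, 1),
            "recall": round(recall * 100, 1),
            "f1": round(f1 * 100, 1)}

# 2 of 3 response claims are entailed by the ground-truth answer,
# and 1 of 2 ground-truth claims is entailed by the response.
print(overall_metrics([True, True, False], [True, False]))
# -> {'precision': 66.7, 'recall': 50.0, 'f1': 57.1}
```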
Install the dependencies:

```bash
pip install refchecker
python -m spacy download en_core_web_sm
pip install -r requirements.txt
```
We provide an example file in the examples folder. Please format your own data following the same structure as examples/checking_inputs.json.
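For reference, the input is a JSON file pairing each query with its ground-truth answer, the system's response, and the retrieved context. The sketch below is illustrative only; consult examples/checking_inputs.json for the authoritative field names and schema.

```json
{
  "results": [
    {
      "query_id": "000",
      "query": "The input question.",
      "gt_answer": "The ground-truth answer to the query.",
      "response": "The response generated by your RAG system.",
      "retrieved_context": [
        {"doc_id": "doc_0", "text": "A passage retrieved for this query."}
      ]
    }
  ]
}
```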
If you are using the AWS Bedrock version of Llama 3 70B for the claim extractor and checker, use the following command to run the checking pipeline. The checking results will be saved to the path given by --output_path:
```bash
python ragchecker/checking.py \
    --input_path=examples/checking_inputs.json \
    --output_path=examples/checking_outputs.json \
    --extractor_name=bedrock/meta.llama3-70b-instruct-v1:0 \
    --checker_name=bedrock/meta.llama3-70b-instruct-v1:0 \
    --batch_size_extractor=64 \
    --batch_size_checker=64 \
    --answer2response \
    --response2answer \
    --retrieved2response \
    --retrieved2answer
```
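The four trailing flags (--answer2response, --response2answer, --retrieved2response, --retrieved2answer) enable claim-level entailment checking in each direction between the ground-truth answer, the response, and the retrieved context; all four sets of checking results are needed to compute the full metric suite in the next step.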
Use the following command to compute the metrics:

```bash
python ragchecker/rag_eval.py --file=examples/checking_outputs.json
```

It will output the metric values as follows:
```
Results for examples/checking_outputs.json:
{
  "overall_metrics": {
    "precision": 73.3,
    "recall": 62.5,
    "f1": 67.5
  },
  "retriever_metrics": {
    "claim_recall": 61.4,
    "context_precision": 87.5
  },
  "generator_metrics": {
    "context_utilization": 87.5,
    "noise_sensitivity_in_relevant": 22.5,
    "noise_sensitivity_in_irrelevant": 0.0,
    "hallucination": 4.2,
    "self_knowledge": 25.0,
    "faithfulness": 70.8,
    "claim_count": 8,
    "response_acc": 0.0
  }
}
```
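As a sanity check on this example: faithfulness, self_knowledge, and hallucination describe disjoint portions of the response claims (claims entailed by the retrieved context, correct claims not in the context, and incorrect claims not in the context, respectively), so they sum to 100% above (70.8 + 25.0 + 4.2).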
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.