ALI-Agent is an evaluation framework that leverages the autonomous abilities of LLM-powered agents to conduct in-depth, adaptive, and comprehensive alignment assessments of LLMs. ALI-Agent operates through two principal stages: Emulation and Refinement. During the Emulation stage, ALI-Agent automates the generation of realistic test scenarios. In the Refinement stage, it iteratively refines the scenarios to probe long-tail risks. Specifically, ALI-Agent incorporates a memory module to guide test scenario generation, a tool-using module to reduce human labor in tasks such as evaluating feedback from target LLMs, and an action module to refine tests.
Set up a virtual environment (e.g., with conda) and install PyTorch manually.
Our experiments have been tested on Python 3.9.17 with PyTorch 2.0.1+cu117.
conda create --name myenv python=3.9.17
conda activate myenv
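For example, a matching PyTorch build can typically be installed with a command like the following (a sketch only; adjust the version and index URL to your own CUDA setup):

pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu117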
After that, install all the dependencies listed in the requirements.txt
file by running the following command:
pip install -r requirements.txt
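You can optionally verify that PyTorch was installed correctly:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"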
You can find the evaluator checkpoints at the following link: (checkpoints)
Directly download the three folders and put them in the main directory (where main.py
can be found).
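After this step, the main directory should look roughly as follows (the actual folder names are the ones provided at the checkpoint link; the names below are placeholders):

main.py
parse.py
requirements.txt
<evaluator checkpoint folder 1>/
<evaluator checkpoint folder 2>/
<evaluator checkpoint folder 3>/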
Make sure you are in the main directory (where main.py
can be found).
Replace "OPENAI_API_KEY" in parse.py with your own OpenAI API key.
To run the agent on a specific dataset, run a command such as:
python main.py --llm_name llama2-13b --dataset ethic_ETHICS --type ethic --start_from 0 --seed 0
Supported values for llm_name, dataset, and type can be found in parse.py.
To run the agent with web browsing, replace "BING_API_KEY" and "OPENAI_API_KEY" in parse.py with your own keys, replace "customer_config_id" with your own configuration ID, and then run:
python main.py --llm_name llama2-13b --web_browsing
To test a locally deployed model, set the local_model_path parameter in parse.py to the path of your local model files and set the model_type parameter to "local".
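A sketch of what this might look like in parse.py (the parameter names local_model_path and model_type come from the instructions above; how they are actually defined in parse.py, e.g., as argparse defaults, may differ):

local_model_path = "/path/to/your/local/model"  # directory containing the local model files
model_type = "local"                            # use the locally deployed model instead of an API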
The results of the simulation will be saved to the database/<dataset>/<llm_name> directory.
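For example, the command shown above would save its results under database/ethic_ETHICS/llama2-13b.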
The ALI-Agent demo evaluates LLM alignment using two key methods:
- Specific Dataset Grading – Users can test LLMs against predefined datasets (e.g., ethic_ETHICS) to assess ethical alignment and compliance.
  - In an ethics evaluation, the system provides test cases and automatically scores the responses.
- Web-Browsing Grading – Users can input queries like "China copyright", and the system will search the web, summarize key information, and generate test cases to evaluate the LLM's accuracy.
  - This is useful for legal checks, news analysis, and policy interpretation.