Game-theoretical Preference Optimization (GPO)

The code for the paper "Toward Optimal LLM Alignments Using Two-Player Games". 👉 [arXiv Link]

🔩 Requirements & Setup

This repository is based on MOSS-RLHF.

This repository works with Python 3.8 and PyTorch 1.13.1.

We recommend running the code in a conda virtual environment.

Step 1: Create a new Python virtual environment

conda update conda -n base -c defaults
conda create -n rlhf python=3.8
conda activate rlhf

Step 2: Install PyTorch and TensorBoard

conda install pytorch==1.13.1 pytorch-cuda=11.7 tensorboard -c pytorch -c nvidia

Step 3: Install the remaining dependencies

conda install datasets accelerate safetensors chardet cchardet -c huggingface -c conda-forge
pip3 install transformers sentencepiece einops triton==1.0.0 rouge jionlp==1.4.14 nltk sacrebleu cpm_kernels

apt install libaio-dev
DS_BUILD_OPS=1 pip install deepspeed

pip3 install -r requirements.txt
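
Once the three steps above have finished, it can be useful to confirm that the pinned versions actually ended up in the environment. The following is a minimal sketch, not part of this repository: the function names and the `EXPECTED` dictionary are illustrative assumptions, and only the version numbers are taken from the commands above. It reads package metadata with the standard library, so it runs even when a package is missing.

```python
import sys
from importlib import metadata

# Version pins copied from the install commands in this README.
# The dictionary itself is an illustrative assumption, not a file in the repo.
EXPECTED = {"torch": "1.13.1", "triton": "1.0.0", "jionlp": "1.4.14"}


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each package to (installed version, whether it matches the pin)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        # Some builds append a local suffix such as "+cu117"; ignore it when comparing.
        base = found.split("+")[0] if found else None
        report[package] = (found, base == wanted)
    return report


def python_matches(major=3, minor=8):
    """Check that the running interpreter is the Python version this README names."""
    return sys.version_info[:2] == (major, minor)


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "OK" if python_matches() else "MISMATCH")
    for package, (found, ok) in check_environment().items():
        print(package, found or "not installed", "OK" if ok else "MISMATCH")
```

Save it under any name (for example a hypothetical `check_env.py`) and run it inside the activated `rlhf` environment. A `MISMATCH` line points at the install step worth repeating.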

✨ Start training your own model!

Training GPO model

Run the command below.

# Currently, you need to provide your own SFT model.
bash train_gpo.sh
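
Because training depends on an SFT model that you supply, a quick check of the checkpoint directory before launching can save a failed run. The sketch below assumes the SFT model is stored in the usual Hugging Face `save_pretrained` layout (a `config.json` plus a weight file); this README does not state the expected format, and the function name and the example path are hypothetical.

```python
from pathlib import Path

# File names written by Hugging Face `save_pretrained`.
# Sharded checkpoints are recognized through their index files.
WEIGHT_FILES = (
    "pytorch_model.bin",
    "model.safetensors",
    "pytorch_model.bin.index.json",
    "model.safetensors.index.json",
)


def missing_checkpoint_parts(model_dir):
    """Return a list of problems found in model_dir; an empty list means it looks usable."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not (root / "config.json").is_file():
        problems.append("config.json is missing")
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        problems.append("no weight file found")
    return problems


if __name__ == "__main__":
    # Hypothetical location; replace with the path to your own SFT model.
    for problem in missing_checkpoint_parts("path/to/your/sft_model"):
        print("SFT checkpoint problem:", problem)
```

This only inspects file names; it does not load the model or verify that the training script accepts it.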

Citation

@article{zheng2024toward,
  title={Toward Optimal LLM Alignments Using Two-Player Games},
  author={Zheng, Rui and Guo, Hongyi and Liu, Zhihan and Zhang, Xiaoying and Yao, Yuanshun and Xu, Xiaojun and Wang, Zhaoran and Xi, Zhiheng and Gui, Tao and Zhang, Qi and others},
  journal={arXiv preprint arXiv:2406.10977},
  year={2024}
}
