SPaR

Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

🤗 Data • 📃 Paper

SPaR focuses on creating interference-free preference pairs for effective self-improvement. Left: an example of interfering factors (story content) in multiple independently sampled responses. Right: refined response pairs exclude these factors, highlight the key difference (the ending sentence), and lead to improved performance for iteratively trained LLaMA3-8B-Instruct.
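To make the idea concrete, a refined preference pair keeps everything except the judged constraint identical. The record below is purely illustrative; the field names and texts are made up and do not reflect the actual dataset schema.

# Hypothetical example of an interference-free preference pair.
# Field names ("instruction", "chosen", "rejected") are assumptions for
# illustration only, not the schema used by the SPaR dataset.
pair = {
    "instruction": "Write a story that ends with the sentence 'It was a perfect day.'",
    "chosen": "... The sun dipped below the hills. It was a perfect day.",
    "rejected": "... The sun dipped below the hills. The day had been perfect.",
}
# Both responses share the same story content, so the only difference the
# preference signal sees is whether the ending-sentence constraint is met.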


Table of Contents

- Data
- Quick Start
- Acknowledgement
- Citation

Data

SPaR dataset

The SPaR dataset can be found on Hugging Face.

We provide a high-quality SFT dataset for instruction-following tasks, along with the data used for iterative self-training.
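As a quick sanity check, the data can be loaded with the Hugging Face datasets library. The snippet below is a minimal sketch; the dataset ID and split name are assumptions based on the organization name, so substitute the actual path from the Data link above.

# Minimal sketch: load the SPaR data with the `datasets` library.
# NOTE: "thu-coai/SPaR" and "train" are assumed identifiers; replace them
# with the dataset path and split shown on the Hugging Face page.
from datasets import load_dataset

dataset = load_dataset("thu-coai/SPaR", split="train")
print(dataset)       # row count and column names
print(dataset[0])    # inspect one instruction-following example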

Quick Start

Throughout the code, we have added #TODO comments to mark the places that need to be modified before running. Please update these parts as noted before executing each file.

Data Construction

To construct the iterative training data yourself, run the following commands:

cd src

# Step 1: sample multiple responses for each instruction
bash infer.sh
python process_data.py

# Step 2: judge the sampled responses
python judge.py
python process_data.py

# Step 3: refine imperfect responses via tree search
# (start the vLLM server first, e.g. in a separate terminal)
vllm serve <your-model-path>
python tree_search.py
python process_data.py
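The tree-search step talks to the vLLM server launched above, which by default exposes an OpenAI-compatible API at http://localhost:8000/v1. The sketch below only illustrates that interaction; it is not the repository's tree_search.py, and the prompt and sampling settings are placeholders.

# Rough sketch of querying the vLLM server started with `vllm serve`.
# This is NOT the repository's tree_search.py; it only shows how a
# refinement prompt could be sent to the OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="<your-model-path>",  # must match the model passed to `vllm serve`
    messages=[{
        "role": "user",
        "content": "Write a short story that ends with the exact sentence: "
                   "'And the lights went out.'",
    }],
    temperature=0.7,
)
print(response.choices[0].message.content)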

Model Training

If you want to train your own model, please run the following commands:

cd src

# dpo
llamafactory-cli train configs/dpo.yaml

# sft
llamafactory-cli train configs/sft.yaml

Acknowledgement

Our training pipeline builds on LLaMA-Factory, and inference is served with vLLM.

Citation

@misc{cheng2024sparselfplaytreesearchrefinement,
      title={SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models}, 
      author={Jiale Cheng and Xiao Liu and Cunxiang Wang and Xiaotao Gu and Yida Lu and Dan Zhang and Yuxiao Dong and Jie Tang and Hongning Wang and Minlie Huang},
      year={2024},
      eprint={2412.11605},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.11605}, 
}
