SPaR

Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

🤗 Data • 📃 Paper

SPaR focuses on creating interference-free preference pairs for effective self-improvement. An example of the interfering factors (story content) in multiple independently sampled responses (Left). Refined response pairs exclude these factors, highlight the key difference (the ending sentence), and lead to improved performance on iteratively trained LLaMA3-8B-Instruct (Right).
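
To make the idea of interference concrete, the sketch below compares two hypothetical preference pairs with Python's standard `difflib`. The example strings and the `pair_similarity` helper are illustrative assumptions and are not part of the SPaR code. The point is only that a refined pair shares almost all of its text, so the remaining difference is the part that matters for the instruction.

```python
from difflib import SequenceMatcher


def pair_similarity(chosen: str, rejected: str) -> float:
    """Return a 0..1 word-level similarity ratio between two responses."""
    return SequenceMatcher(None, chosen.split(), rejected.split()).ratio()


# Hypothetical instruction: "Write a short story that ends with 'The end.'"
# Independently sampled pair: the story content differs, which can act as interference.
sampled_chosen = "A fox found a lantern in the woods and carried it home. The end."
sampled_rejected = "Two sailors argued about the weather until the storm passed."

# Refined pair: same story, only the ending sentence differs.
refined_chosen = "A fox found a lantern in the woods and carried it home. The end."
refined_rejected = "A fox found a lantern in the woods and carried it home."

sampled_score = pair_similarity(sampled_chosen, sampled_rejected)
refined_score = pair_similarity(refined_chosen, refined_rejected)

print(f"independently sampled pair similarity: {sampled_score:.2f}")
print(f"refined pair similarity:               {refined_score:.2f}")
```

In this toy case the refined pair scores much higher than the independently sampled one, which mirrors the contrast described in the figure caption.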

Data

SPaR dataset

The SPaR dataset is available on Hugging Face.

We provide a high-quality SFT dataset for instruction-following tasks and the data for iterative self-training.

Quick Start

All scripts contain #TODO comments marking the places that need to be modified before running. Please update those parts before executing each file.

Data Construction

To construct the iterative training data yourself, run the following commands in order:

cd src

bash infer.sh

python process_data.py

bash judge.sh

python process_data.py

vllm serve <your-model-path>

python tree_search.py

python process_data.py
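
Note that `vllm serve` starts a long-running server and does not return, so `python tree_search.py` has to be launched from a second shell (or after backgrounding the server). The ordering above suggests that the tree-search step queries the served model, although that is an inference from the command list. A minimal sketch of a readiness check that could be run before the tree-search step is given below. It assumes the server listens on vLLM's default address `http://localhost:8000` and exposes a `/health` endpoint; the `wait_for_server` helper is hypothetical and not part of this repository.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout_s: float = 300.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet; retry after a short sleep
        time.sleep(interval_s)
    return False
```

For example, calling `wait_for_server("http://localhost:8000/health")` before starting the tree search returns `True` once the server responds and `False` if it never comes up within the timeout. Adjust the host and port if you start the server with different settings.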

Model Training

To train your own model, run the command for the stage you need:

cd src

# dpo
llamafactory-cli train configs/dpo.yaml

# sft
llamafactory-cli train configs/sft.yaml
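
For readers unfamiliar with the `dpo` stage, the following sketch computes the standard DPO objective for a single preference pair in plain Python. It is meant only to illustrate what the preference data is used for. The function name, the example log-probabilities, and the `beta` value are illustrative assumptions; the actual loss and hyperparameters used in training are whatever `configs/dpo.yaml` and LLaMA-Factory specify.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    where each argument is the summed log-probability of a response.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Illustrative numbers only: the policy favors the chosen response more than the reference does.
loss_good = dpo_loss(-10.0, -14.0, -12.0, -12.0)
# Here the policy favors the rejected response instead, so the loss is larger.
loss_bad = dpo_loss(-14.0, -10.0, -12.0, -12.0)
print(loss_good, loss_bad)
```

The loss shrinks as the policy assigns relatively more probability to the chosen response than the reference model does, which is why pairs that differ only in the instruction-relevant part give a cleaner training signal.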

Acknowledgement

Training code: LLaMA-Factory
Tree-search implementation: Rest-MCTS*

Citation

@misc{cheng2024sparselfplaytreesearchrefinement,
      title={SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models}, 
      author={Jiale Cheng and Xiao Liu and Cunxiang Wang and Xiaotao Gu and Yida Lu and Dan Zhang and Yuxiao Dong and Jie Tang and Hongning Wang and Minlie Huang},
      year={2024},
      eprint={2412.11605},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.11605}, 
}
