Emergence of Thinking

This repository contains the code for the paper "On the Emergence of Thinking in LLMs I: Searching for the Right Intuition".

arXiv link

Environment

bash create_env.sh
pip install -e .

Figure 5(a) of the paper

python -m openrlhf.cli.orm_server_efficient --dataset evaluation/data/math --model_name meta-llama/Llama-3.1-8B-Instruct --log_dir ./logs/openrlhf_train_ppo --length_penalty 0.0 --use_gpt 0 &
mkdir -p /tmp/code &
bash train_ppo_llama_ray_8B_rm_multi.sh 3

Figure 5(b) of the paper

python -m openrlhf.cli.orm_server_efficient --dataset evaluation/data/math --model_name meta-llama/Llama-3.1-8B-Instruct --log_dir ./logs/openrlhf_train_ppo --length_penalty 1000 --use_gpt 0 &
mkdir -p /tmp/code &
bash train_ppo_llama_ray_8B_rm_multi.sh 3
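
The Figure 5(a) and Figure 5(b) runs above differ only in the value passed to --length_penalty (0.0 versus 1000). As a minimal sketch, the reward-server command can be parameterized on that value. The helper name reward_server_cmd is hypothetical and not part of this repository; it only prints the command, so nothing is launched.

```shell
#!/bin/sh
# Sketch only: reward_server_cmd is a hypothetical helper, not part of this repo.
# It prints the reward-server command for a given length penalty ($1);
# every other flag is copied from the commands in this README.
reward_server_cmd() {
  printf '%s ' \
    python -m openrlhf.cli.orm_server_efficient \
    --dataset evaluation/data/math \
    --model_name meta-llama/Llama-3.1-8B-Instruct \
    --log_dir ./logs/openrlhf_train_ppo \
    --length_penalty "$1" \
    --use_gpt 0
  printf '\n'
}

# Print (do not run) both variants so they can be compared side by side.
reward_server_cmd 0.0    # Figure 5(a)
reward_server_cmd 1000   # Figure 5(b)
```

To actually start a server from the printed line, the output would still need to be run in the background as in the commands above.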

Evaluation after PPO training

python -m evaluation.eval_math_data_parallel --config ./eval_config.yaml

Qwen2.5-32B-Instruct experiment

HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download Qwen/Qwen2.5-32B-Instruct
python -m openrlhf.cli.orm_server_efficient --dataset evaluation/data/aime_full_except_24 --model_name Qwen/Qwen2.5-32B-Instruct --log_dir ./logs/openrlhf_train_ppo --length_penalty 1000 --use_gpt 1 &
mkdir -p /tmp/code &
bash train_ppo_qwen_ray_32B_rm_multi.sh 3
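
The commands above start the reward server with & and leave it running after the training script returns. One possible way to stop it automatically is sketched below. The helper name with_background is hypothetical and not part of this repository, and the sketch assumes the server exits cleanly when it receives the default kill signal.

```shell
#!/bin/sh
# Sketch only: with_background is a hypothetical helper, not part of this repo.
# Usage: with_background "<foreground command line>" <background command> [args...]
with_background() {
  foreground="$1"
  shift
  "$@" &                                 # start the background process (e.g. the reward server)
  bg_pid=$!
  status=0
  sh -c "$foreground" || status=$?       # run the foreground job (e.g. the training script)
  kill "$bg_pid" 2>/dev/null || true     # stop the background process once the job ends
  wait "$bg_pid" 2>/dev/null || true     # reap it so no stale process is left behind
  return "$status"
}

# Example invocation (commented out so that sourcing this file launches nothing):
# with_background "bash train_ppo_qwen_ray_32B_rm_multi.sh 3" \
#   python -m openrlhf.cli.orm_server_efficient \
#     --dataset evaluation/data/aime_full_except_24 \
#     --model_name Qwen/Qwen2.5-32B-Instruct \
#     --log_dir ./logs/openrlhf_train_ppo --length_penalty 1000 --use_gpt 1
```

The helper returns the exit status of the foreground job, so a failed training run is still visible to the calling shell.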

Acknowledgement

This repository is based on the OpenRLHF codebase, and the evaluation code is taken from Qwen.

GuanghaoYe/Emergence-of-Thinking
