
Recommendations for multi-node training of a 7B model with RL #487

Open
zhudefa opened this issue Dec 17, 2024 · 1 comment

Comments


zhudefa commented Dec 17, 2024

If I want to run multi-node RL training experiments for a 7B model, what is the recommended configuration? Should actor_num_gpus_per_node be set to multiples of 7?

Is it also necessary to launch in the same way as the 70B model, using the following command?
source configs/beaker_configs/ray_node_setup.sh && python open_instruct/ppo_vllm_thread_ray_gtrl.py

vwxyzjn (Collaborator) commented Dec 18, 2024

Hi @zhudefa,

It doesn't have to be multiples of 7.

7 is fine for a single-node setting: we use 7 GPUs for training and 1 GPU for inference. The flag is used like this:

--actor_num_gpus_per_node 7 8 8 8 means using 7 GPUs on the first node for training and 8 GPUs on each of the next 3 nodes for training.
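
For illustration, a hypothetical 4-node launch combining the two pieces above might look like the following (a minimal sketch; --actor_num_gpus_per_node is the flag discussed in this thread, and the remaining training arguments are omitted):

source configs/beaker_configs/ray_node_setup.sh && python open_instruct/ppo_vllm_thread_ray_gtrl.py \
    --actor_num_gpus_per_node 7 8 8 8 \
    <other training arguments>

With 7 training GPUs on the first node, the remaining GPU on that node is left free for the vLLM inference engine, mirroring the single-node split described above.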

Is it also necessary to launch in the same way as the 70B model, using the following command?
source configs/beaker_configs/ray_node_setup.sh && python open_instruct/ppo_vllm_thread_ray_gtrl.py

Yes. ray_node_setup.sh sets up the multi-node Ray cluster, connecting each worker node to the main Ray head node.
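
As a rough sketch, a multi-node Ray setup typically boils down to the following, using Ray's standard CLI (the actual configs/beaker_configs/ray_node_setup.sh may differ, and HEAD_NODE_IP is a hypothetical variable used only for illustration):

# On the head node: start the Ray head process
ray start --head --port=6379

# On each worker node: connect to the head node's address
ray start --address=$HEAD_NODE_IP:6379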
