If I want to run multi-node 7B RL training experiments, what is the recommended configuration? Should `actor_num_gpus_per_node` be set to multiple 7s?
Is it also necessary to launch in the same way as the 70B model, using the following command: `source configs/beaker_configs/ray_node_setup.sh && python open_instruct/ppo_vllm_thread_ray_gtrl.py`?
7 is fine for a single-node setting: we use 7 GPUs for training and 1 GPU for inference. For multi-node, the usage is like this:
`--actor_num_gpus_per_node 7 8 8 8` means using 7 GPUs on the first node for training and 8 GPUs on each of the next 3 nodes for training.
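For a concrete picture, a multi-node 7B launch might look like the sketch below. Only `--actor_num_gpus_per_node` comes from this thread; `--vllm_num_engines` and the model-path variable are assumptions for illustration, so check the script's own argument list before running.

```bash
# Hypothetical 4-node 7B RL launch sketch:
# node 0 trains on 7 GPUs (leaving 1 GPU for the vLLM inference engine),
# nodes 1-3 train on all 8 GPUs each.
# --vllm_num_engines and MODEL_PATH are illustrative assumptions.
MODEL_PATH=/path/to/your-7b-model   # placeholder
source configs/beaker_configs/ray_node_setup.sh && \
python open_instruct/ppo_vllm_thread_ray_gtrl.py \
    --actor_num_gpus_per_node 7 8 8 8 \
    --vllm_num_engines 1 \
    --model_name_or_path "$MODEL_PATH"
```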
> Is it also necessary to launch in the same way as the 70B model, using the following command: `source configs/beaker_configs/ray_node_setup.sh && python open_instruct/ppo_vllm_thread_ray_gtrl.py`?
Yes. `ray_node_setup.sh` sets up the multi-node Ray cluster so that each node connects to the main Ray head node.
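The contents of that script aren't shown in this thread, but the standard pattern for wiring up a multi-node Ray cluster with the `ray start` CLI looks roughly like this. `RAY_HEAD_IP` is a placeholder, and this is a sketch of the general idea, not the actual script:

```bash
# Assumed multi-node Ray wiring pattern (a sketch, not the real ray_node_setup.sh).
# On the head node, start the Ray head process:
ray start --head --port=6379

# On every worker node, join the cluster by pointing at the head node's address:
ray start --address="$RAY_HEAD_IP:6379"  # RAY_HEAD_IP is a placeholder
```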