This guide walks you through setting up this repository: cloning it, copying model checkpoints into the expected folders, navigating to the training directory, and launching a training script.
a. Ensure that the transformers version is 4.30.2 before proceeding.
b. If the model is of the llama class, include the keyword "llama" in its file path.
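To pin and verify the transformers version, for example:

pip install transformers==4.30.2
python -c "import transformers; print(transformers.__version__)"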
To get started, clone this repository to your local machine using the following command:
git clone https://github.com/sarahpannn/Math_RLHF.git
Next, copy the model checkpoints from their original locations to the new paths listed below:

Original file path: /mnt/shared_home/span/lets-reinforce-step-by-step/training/ONLY_MATH_SFT/four_epochs
New file path: /ONLY_MATH_SFT/four_epochs

Original file path: /mnt/shared_home/span/lets-reinforce-step-by-step/training/model/llama-3b-ORM/hf_directory
New file path: /llama-3b-ORM/hf_directory

Original file path: /mnt/shared_home/span/lets-reinforce-step-by-step/training/model/deberta-v3-large-800k-3
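As a minimal sketch of the copy step, assuming the new paths are relative to the cloned repository root (the destination for the deberta checkpoint is not specified above, so fill it in for your setup):

cd Math_RLHF
# Hypothetical copy commands; adjust source and destination paths to your layout.
mkdir -p ONLY_MATH_SFT llama-3b-ORM
cp -r /mnt/shared_home/span/lets-reinforce-step-by-step/training/ONLY_MATH_SFT/four_epochs ONLY_MATH_SFT/four_epochs
cp -r /mnt/shared_home/span/lets-reinforce-step-by-step/training/model/llama-3b-ORM/hf_directory llama-3b-ORM/hf_directory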
Then navigate to the RLHF fine-tuning directory:

cd DeepSpeed-Chat/training/step3_rlhf_finetuning
To run RLHF with an outcome reward model (ORM):

bash training_scripts/single_node/ORM/ORM_dump.sh

To run RLHF with a process reward model (PRM) using the avg delivery method:

bash training_scripts/single_node/PRM/PRM_avg_dump.sh

To run RLHF with a PRM using the product delivery method:

bash training_scripts/single_node/PRM/PRM_prod_dump.sh

To run RLHF with a PRM using the fine-grained delivery method:

bash training_scripts/single_node/PRM/real_prm.sh
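Optionally, redirect a run's output to a log file so it can be inspected later; a sketch, using the ORM script as an example (the log file name orm_run.log is arbitrary):

bash training_scripts/single_node/ORM/ORM_dump.sh 2>&1 | tee orm_run.log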
I'm not sure how much memory the A100s will have, but if it is more than the 24 GB on Inanna, increasing per_device_batch_size should maximize GPU utilization.
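To check the available GPU memory before raising the batch size, you can run:

nvidia-smi --query-gpu=name,memory.total --format=csv

The exact batch-size flag depends on the chosen script; look for per_device_batch_size (or a similarly named argument) inside it.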