This guide walks you through setting up this repository: cloning it, copying model checkpoints into the expected folders, navigating to the training directory, and launching a training script.
a. Ensure that the transformers version is 4.30.2 before proceeding.
b. If the model is of the llama class, include the keyword "llama" in its file path.
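To pin and verify the transformers version, for example:

pip install transformers==4.30.2
python -c "import transformers; print(transformers.__version__)"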
To get started, clone this repository to your local machine using the following command:
git clone https://github.com/sarahpannn/Math_RLHF.git
Next, copy the model checkpoints from their original locations to the new paths listed below:

Original file path: /mnt/shared_home/span/lets-reinforce-step-by-step/training/ONLY_MATH_SFT/four_epochs
New file path: /ONLY_MATH_SFT/four_epochs

Original file path: /mnt/shared_home/span/lets-reinforce-step-by-step/training/model/llama-3b-ORM/hf_directory
New file path: /llama-3b-ORM/hf_directory

Original file path: /mnt/shared_home/span/lets-reinforce-step-by-step/training/model/deberta-v3-large-800k-3
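As a minimal sketch of the copy step, assuming the new paths are relative to the cloned repository root (the destination for the deberta checkpoint is not specified above, so fill it in for your setup):

cd Math_RLHF
# Hypothetical copy commands; adjust source and destination paths to your layout.
mkdir -p ONLY_MATH_SFT llama-3b-ORM
cp -r /mnt/shared_home/span/lets-reinforce-step-by-step/training/ONLY_MATH_SFT/four_epochs ONLY_MATH_SFT/four_epochs
cp -r /mnt/shared_home/span/lets-reinforce-step-by-step/training/model/llama-3b-ORM/hf_directory llama-3b-ORM/hf_directory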
Then navigate to the RLHF fine-tuning directory:

cd DeepSpeed-Chat/training/step3_rlhf_finetuning
To run RLHF with an outcome reward model (ORM):

bash training_scripts/single_node/ORM/ORM_dump.sh

To run RLHF with a process reward model (PRM) using the avg delivery method:

bash training_scripts/single_node/PRM/PRM_avg_dump.sh

To run RLHF with a PRM using the product delivery method:

bash training_scripts/single_node/PRM/PRM_prod_dump.sh

To run RLHF with a PRM using the fine-grained delivery method:

bash training_scripts/single_node/PRM/real_prm.sh
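Optionally, redirect a run's output to a log file so it can be inspected later; a sketch, using the ORM script as an example (the log file name orm_run.log is arbitrary):

bash training_scripts/single_node/ORM/ORM_dump.sh 2>&1 | tee orm_run.log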
I'm not sure how much memory the A100s will have, but if it is more than the 24 GB on Inanna, increasing per_device_batch_size should maximize GPU utilization.
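To check the available GPU memory before raising the batch size, you can run:

nvidia-smi --query-gpu=name,memory.total --format=csv

The exact batch-size flag depends on the chosen script; look for per_device_batch_size (or a similarly named argument) inside it.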