Name		Name	Last commit message	Last commit date
parent directory ..
data		data
eval		eval
weights		weights
README.md		README.md
collect_eval_results.py		collect_eval_results.py
dpo_train_with_accelerate.sh		dpo_train_with_accelerate.sh
dpo_train_with_accelerate_config.sh		dpo_train_with_accelerate_config.sh
dpo_train_with_qlora.sh		dpo_train_with_qlora.sh
finetune_lora_with_accelerate.sh		finetune_lora_with_accelerate.sh
finetune_qlora_with_accelerate.sh		finetune_qlora_with_accelerate.sh
finetune_with_accelerate.sh		finetune_with_accelerate.sh
finetune_with_accelerate_config.sh		finetune_with_accelerate_config.sh
rejection_sampling_hello_world.bash		rejection_sampling_hello_world.bash
rejection_sampling_tulu.bash		rejection_sampling_tulu.bash
submit_dpo_job.py		submit_dpo_job.py
submit_eval_jobs.py		submit_eval_jobs.py
submit_finetune_job.py		submit_finetune_job.py
submit_finetune_jobs.sh		submit_finetune_jobs.sh

README.md

Scripts Docs

There are many scripts in this repo, serving many different purposes. Here's a breakdown of the most important training scripts and how to use them. Generally, they are split into the following categories:

Instruction training.
Direct Preference Optimization (DPO) training.
Submitting jobs on Ai2 infrastructure (Beaker). **Use this type of script for launching multiple jobs easily)
Data and results management.

This readme covers each category and normal use-cases.

Instruct training scripts

The following scripts are used for fine-tuning. For Ai2 users, these scripts all work best in interactive sessions (not in batch jobs).

finetune_lora_with_acceralate.sh: Script for running open_instruct/finetune.py with LoRA.
finetune_qlora_with_acceralate.sh: Script for running open_instruct/finetune.py with QLoRA.
finetune_with_acceralate_config.sh: Script for running open_instruct/finetune.py with configs found in configs/train_configs/sft/. Good for reproducing results. Example usages:

sh scripts/finetune_with_accelerate_config.sh 1 configs/train_configs/sft/default.yaml
sh scripts/finetune_with_accelerate_config.sh 8 configs/train_configs/sft/olmo_17_sft.yaml

finetune_with_acceralate.sh: Script that the _config option above is based on. Uses options provided at CLI. Change hyperparameters by manually editing or copying the script.

Direct Preference Optimization (DPO) scripts

dpo_train_with_accelerate_config.sh: Script for running open_instruct/dpo_tune.py with configs found in configs/train_configs/dpo/. Good for reproducing results. E.g.

sh scripts/dpo_train_with_accelerate_config.sh 8 configs/train_configs/dpo/default.yaml

dpo_train_with_accelerate.sh: Script for running open_instruct/dpo_tune.py directly. Change hyperparameters by manually editing or copying the script. E.g.

sh scripts/dpo_train_with_accelerate.sh

dpo_train_with_qlora.sh: Same as (2) with QLoRA quantization.

Beaker / job submission scripts

submit_eval_jobs.py: Submit eval jobs for tasks in scripts/evals/. For example, llama 3 tulu 2 and upload to the tulu-3 eval database.

python scripts/submit_eval_jobs.py --model_name llama_31_tulu_2_8b --location 01J4MGRSS3FM1J4E6XSH3459DK --is_tuned --workspace tulu-3-results --preemptible --use_hf_tokenizer_template --beaker_image hamishivi/open-instruct-hf-upload-testing --upload_to_hf allenai/tulu-3-evals//results/testing_oi_hf

submit_finetune_jobs.py: Core script for submitting multiple and configurable instruction tuning jobs. This script works for both single- and multi-node configurations. It by default reads configs in configs/train_configs, but also can take in CLI arguments matching those in open_instruct/utils.py FlatArguments class. Example of running this is in scripts/submit_finetune_jobs.sh.

python scripts/submit_finetune_job.py --config=configs/train_configs/sft/default.yaml  --learning_rate 1e-6
python scripts/submit_finetune_job.py --config=configs/train_configs/sft/default.yaml  --learning_rate 4e-6
python scripts/submit_finetune_job.py --config=configs/train_configs/sft/default.yaml  --learning_rate 1e-5
python scripts/submit_finetune_job.py --config=configs/train_configs/sft/default.yaml  --learning_rate 4e-5

To use this for multi-node jobs, here is an example that runs IFT on 4 nodes:

python scripts/submit_finetune_job.py --default_beaker_config configs/beaker_configs/default_finetune_multinode.yaml --config configs/train_configs/sft/tulu3_8b_preview_mix_v3.1.yaml --cluster ai2/jupiter-cirrascale-2 --workspace ai2/tulu-3-dev --num_nodes 4

submit_dpo_job.py: Core script for submitting DPO tuning jobs. It should behave like the finetune script, but additionally can take in beaker datasets to mount via --datasets, e.g.:

python scripts/submit_dpo_job.py --config configs/train_configs/dpo/my_dpo_config.yaml --datasets my_beaker_id:/model --experiment_name my_experiment_name

In this case, we also ask you provide an experiment name, as we don't know the name of the model being finetuned if it is mounted to /model.

Docker-less job submssions

It is possible to re-use the existing environment you have and run things without having to build a docker container. The idea is to install python on NFS. You can refer to https://gist.github.com/vwxyzjn/58a2714cf3fbab5bf672ff750e86a537 for more detail.

Then you can submit jobs via mason.py, which we modified from https://github.com/allenai/mason. You can run the following to do a quick check

python mason.py \
    --cluster ai2/allennlp-cirrascale ai2/general-cirrascale-a5000 ai2/general-cirrascale-a5000 ai2/general-cirrascale-a100-80g-ib \
    --priority low \
    --budget ai2/allennlp \
    --gpus 1 -- which python

If you are successful in setting up python on NFS, your which python should match the which python output in the beaker job.

After setting it up successfully, say you are running sh scripts/dpo_train_with_accelerate_config.sh 8 configs/train_configs/dpo/default.yaml locally, now you can submit batch jobs via

python mason.py \
    --cluster ai2/allennlp-cirrascale ai2/general-cirrascale-a5000 ai2/general-cirrascale-a5000 ai2/general-cirrascale-a100-80g-ib \
    --priority low \
    --budget ai2/allennlp \
    --gpus 1 -- sh scripts/dpo_train_with_accelerate_config.sh 8 configs/train_configs/dpo/default.yaml

Other

collect_eval_results.py: For collating metrics from open-instruct evaluation job. E.g.

python scripts/collect_eval_results.py \
    --experiment_id 01HV0P4E3MW9211HX0JEKM0PXM \
    --job_suffix _tulu2_13b_dpo_ultrainteract_04082024 \
    --output_file metrics.json \
    --task_order gsm_cot gsm_direct toxigen alpaca_eval \
    --print_table \
    --table_file metrics.tsv

weights/weight_diff.py: For converting weight diffs (as used with LLaMA 1) to full weights for eval/use. E.g.

python scripts/weights/weight_diff.py recover --path_raw ${hf_llama_path} --path_tuned ${output_path} --path_diff ${diff_location}

weights/convert_llama_weights_to_hf.sh: Use transformers to convert weights.
data/*: scripts for inpecting statistics of and rebuilding Tulu 1/2/N datasets from scratch (where possible).

Notes on data mixing

Most of the scripts with _config take in configs that look like the following (just the data part):

dataset_mixer:
 allenai/tulu-v2-sft-mixture: 0.5
 HuggingFaceH4/no_robots: 0.8

There are many ways to configure data mixing. This is done with fractions, but also they can be done with number of samples directly. E.g.

dataset_mixer:
 allenai/tulu-v2-sft-mixture: 50000
 HuggingFaceH4/no_robots: 2500

The mixer is the advanced alternate to existing data arguments (which are still compatible, for reproducibility), such as local files:

train_file: data/processed/tulu_v2/tulu_v2_data.jsonl

or single HuggingFace datasets,

dataset_name: allenai/tulu-v2-sft-mixture

Currently the dataset mixer is only supported for SFT models, but this will be expanded. With these options, the script will fail if multiple data args are passed, in the list of dataset_mixer, train_file, or dataset_name. An internal arg, dataset_mixer_list was created to handle conversion from dict to string for Beaker jobs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scripts

scripts

README.md

Scripts Docs

Instruct training scripts

Direct Preference Optimization (DPO) scripts

Beaker / job submission scripts

Docker-less job submssions

Other

Notes on data mixing

Files

scripts

Directory actions

More options

Directory actions

More options

Latest commit

History

scripts

Folders and files

parent directory

README.md

Scripts Docs

Instruct training scripts

Direct Preference Optimization (DPO) scripts

Beaker / job submission scripts

Docker-less job submssions

Other

Notes on data mixing