
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning

This repository contains the code for our paper Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning.

**************************** Updates ****************************

  • 01/22/2025: Our paper has been accepted as a poster at ICLR 2025. See you in Singapore!

Quick Links

  • Overview
  • Main Results
  • Experiments

Experiments

In the following sections, we provide instructions for reproducing the experiments in our paper.

Prepare Conda Environment

First, set the following bash variable according to your machine, then run the commands below to point the slurm scripts at it.

PROJECT_DIR="/absolute path to the project folder/Instruct-SkillMix"
sed -i "s#export PROJECT_DIR=#export PROJECT_DIR=${PROJECT_DIR}#" $PROJECT_DIR/src/slurm/1_finetune.sh
sed -i "s#export PROJECT_DIR=#export PROJECT_DIR=${PROJECT_DIR}#" $PROJECT_DIR/src/slurm/2_validate.sh
sed -i "s#export PROJECT_DIR=#export PROJECT_DIR=${PROJECT_DIR}#" $PROJECT_DIR/src/slurm/3_validate_annotate.sh
sed -i "s#export PROJECT_DIR=#export PROJECT_DIR=${PROJECT_DIR}#" $PROJECT_DIR/src/slurm/4_evaluate_AlpacaEval.sh
sed -i "s#export PROJECT_DIR=#export PROJECT_DIR=${PROJECT_DIR}#" $PROJECT_DIR/src/slurm/4_evaluate_lm-evaluation-harness.sh
sed -i "s#export PROJECT_DIR=#export PROJECT_DIR=${PROJECT_DIR}#" $PROJECT_DIR/src/slurm/4_evaluate_MTBench.sh
sed -i "s#export PROJECT_DIR=#export PROJECT_DIR=${PROJECT_DIR}#" $PROJECT_DIR/src/slurm/4_evaluate_WildBench.sh
sed -i "s#export PROJECT_DIR=#export PROJECT_DIR=${PROJECT_DIR}#" $PROJECT_DIR/src/slurm/5_annotate.sh

Then prepare a conda environment using the following commands

cd $PROJECT_DIR
conda create -n Instruct-SkillMix python=3.11 -y
conda activate Instruct-SkillMix
pip install pip==24.3.1  # the PEP 660 editable installs below require pip >= 21.3 and < 25.0
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
pip install flashinfer==0.1.3 -i https://flashinfer.ai/whl/cu121/torch2.3/
pip install -r requirements.txt
pip install -e lm-evaluation-harness # requirement: pip < 25.0
cd FastChat
pip install -e ".[model_worker,llm_judge]" # requirement: pip < 25.0
CONDA_ENV="/absolute path to the conda environment/Instruct-SkillMix"
cp alpaca_eval/main.py ${CONDA_ENV}/lib/python3.11/site-packages/alpaca_eval/main.py  # overwrite the installed alpaca_eval entry point with this repo's patched copy
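As a quick optional check that the environment was built correctly, verify that the pinned PyTorch build can see your GPUs:

# should print 2.3.1 (or 2.3.1+cu121) and True on a CUDA machine
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"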

Prepare API Keys

Insert your own API keys into src/openai_configs.yaml (for AlpacaEval annotations) and src/api_keys.sh (for data generation and all other annotations).
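As a minimal sketch, src/api_keys.sh exports the keys as environment variables. The variable names below are illustrative assumptions, not the repo's actual contents; check the file for the names the scripts actually read.

# sketch of src/api_keys.sh -- variable names are assumptions
export OPENAI_API_KEY="sk-..."         # data generation and most annotations
export ANTHROPIC_API_KEY="sk-ant-..."  # only if Claude-based annotation is used (assumption)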

Generate Synthetic Data (TBD)

Edit the relevant bash script src/data_generation/1_generate_data.sh and run

cd $PROJECT_DIR
sbatch src/data_generation/1_generate_data.sh
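Data generation runs as a slurm batch job, so you can monitor it with standard slurm tooling:

squeue -u $USER            # check the job status
tail -f slurm-<JOBID>.out  # follow the log (the path may differ if the script sets #SBATCH --output)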

To create held-out data for validation, change the SEED and SAVE_PREFIX variables and rerun the generation with a smaller NUM_DATA, as sketched below.
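A minimal sketch of that edit, assuming the variables sit near the top of src/data_generation/1_generate_data.sh (the values below are placeholders):

SEED=43                    # any seed different from the training run
SAVE_PREFIX="ism_heldout"  # keeps the held-out data separate from the training data
NUM_DATA=100               # smaller than the training set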

To create low-quality data (brevity / junk), edit the relevant bash script src/data_generation/2_generate_junk_data.sh and run

cd $PROJECT_DIR
sbatch src/data_generation/2_generate_junk_data.sh

Train Models

First, download the relevant base models

cd $PROJECT_DIR
conda activate Instruct-SkillMix
tune download meta-llama/Llama-2-7b-hf --output-dir ./base_models/Llama-2-7b-hf
tune download meta-llama/Llama-2-13b-hf --output-dir ./base_models/Llama-2-13b-hf
tune download mistral-community/Mistral-7B-v0.2 --output-dir ./base_models/Mistral-7B-v0.2 --ignore-patterns None
tune download meta-llama/Meta-Llama-3-8B --output-dir ./base_models/Meta-Llama-3-8B --ignore-patterns "original/consolidated.00.pth" 
## if you do not have your Hugging Face token saved in your shell configuration files, also append --hf-token <HF_TOKEN> to the commands above
mv ./base_models/Meta-Llama-3-8B/original/tokenizer.model ./base_models/Meta-Llama-3-8B

tune download google/gemma-2-9b --output-dir ./base_models/gemma-2-9b  --ignore-patterns None
tune download google/gemma-2-2b --output-dir ./base_models/gemma-2-2b  --ignore-patterns None
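A quick optional check that the downloads completed and the Llama-3 tokenizer was moved into place:

ls ./base_models                                  # one folder per base model
ls ./base_models/Meta-Llama-3-8B/tokenizer.model  # should exist after the mv above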

Then, edit the relevant slurm script src/slurm/1_finetune.sh and run

cd $PROJECT_DIR
sbatch src/slurm/1_finetune.sh

Select Model Checkpoint

We select the best checkpoint based on performance on held-out data. Edit the relevant slurm script src/slurm/2_validate.sh and run

cd $PROJECT_DIR
sbatch src/slurm/2_validate.sh

Then evaluate each checkpoint by editing and running src/slurm/3_validate_annotate.sh. Select the best model checkpoint based on the held-out leaderboard, e.g., results/ism_sda_k2_heldout/leaderboards/weighted_alpaca_eval_gpt4_turbo/leaderboard.csv (a sketch for inspecting it follows below).
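A minimal sketch for inspecting that leaderboard, assuming the standard AlpacaEval CSV format (the win_rate column name is an assumption about that format):

python -c "
import pandas as pd  # pulled in as a dependency of alpaca_eval
df = pd.read_csv('results/ism_sda_k2_heldout/leaderboards/weighted_alpaca_eval_gpt4_turbo/leaderboard.csv', index_col=0)
print(df.sort_values('win_rate', ascending=False))  # higher win rate = better checkpoint
"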

Evaluate Models

Edit the relevant slurm scripts src/slurm/4_evaluate_*.sh and generate model outputs by running

cd $PROJECT_DIR
sbatch src/slurm/4_evaluate_AlpacaEval.sh
sbatch src/slurm/4_evaluate_MTBench.sh
sbatch src/slurm/4_evaluate_lm-evaluation-harness.sh
sbatch src/slurm/4_evaluate_WildBench.sh

Then annotate the model outputs by editing and running src/slurm/5_annotate.sh.

You can view the evaluation outputs in ./results/alpaca_eval_2/leaderboards/weighted_alpaca_eval_gpt4_turbo/leaderboard.csv for AlpacaEval, and in each folder under ./results/lm-evaluation-harness/model_outputs for lm-evaluation-harness. For MTBench and WildBench, run

cd $PROJECT_DIR
python FastChat/fastchat/llm_judge/show_result.py --judge-model gpt-4-0613 --input-file ./results/mt_bench/model_judgment/gpt-4-0613_single.jsonl
python WildBench/src/view_wb_eval.py ./results/wild_bench_v2 pairwise-gpt4t -1

Links

We release links to our trained models and generated datasets to make future experimentation easier.

Name | Link
Meta-Llama-3-8B trained on Instruct-SkillMix-SDA | Link
Instruct-SkillMix-SDD Dataset | Link
Instruct-SkillMix-SDA Dataset | Link

Bugs or Questions?

If you have any questions about the code or the paper, feel free to email Simran (skaur 'at' princeton 'dot' edu) and Simon (juhyunp 'at' princeton 'dot' edu). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please describe the problem in detail so we can help you more effectively!

Citation

Please cite our paper if you find it or this repo helpful:

@misc{kaur2024instructskillmixpowerfulpipelinellm,
      title={Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning}, 
      author={Simran Kaur and Simon Park and Anirudh Goyal and Sanjeev Arora},
      year={2024},
      eprint={2408.14774},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2408.14774}, 
}
