We fine-tune models using the evaluation setup described in LLM Adapters. This involves jointly fine-tuning on eight commonsense reasoning datasets with a combined training set of 170k examples. We follow the evaluation setup of the official LLM Adapters code release, except that we use log-likelihood rather than regex parsing to determine the model's output.
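Concretely, log-likelihood evaluation scores each candidate answer by the probability the model assigns to its tokens, rather than generating text and regex-parsing it. The sketch below is only illustrative: it assumes a Hugging Face causal LM, and the function name `choice_log_likelihood` is ours. The actual scoring is handled by the LM Harness helpers invoked by `corenet-eval-llmadapters`.

```python
# Minimal sketch of log-likelihood scoring for multiple-choice evaluation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def choice_log_likelihood(model, tokenizer, prompt, choice):
    """Sum the log-probabilities of the choice tokens given the prompt.

    Simplification: we align by prompt token count, ignoring edge cases
    where tokenization of prompt+choice shifts the boundary.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits  # [1, seq_len, vocab]
    # Position i predicts token i+1, so shift logits/targets by one.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    start = prompt_ids.shape[1] - 1  # first position predicting a choice token
    return log_probs[start:].gather(1, targets[start:, None]).sum().item()

# Usage: pick the answer the model finds most likely, instead of parsing
# generated text:
# best = max(choices, key=lambda c: choice_log_likelihood(model, tok, prompt, c))
```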
To ensure consistency with the LLM Adapters evaluations, we use helper functions defined in their code. To set up for evaluations, run the following commands:
```bash
# Change this to the path to CoreNet.
cd /path/to/corenet

# Install LM Harness.
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
git checkout 3196e907fa195b684470a913c7235ed7f08a4383
pip install -e .
cd ..

# Install LLM Adapters.
git clone https://github.com/AGI-Edgerunners/LLM-Adapters.git
cd LLM-Adapters
git checkout 816657208af4db747803f87ba40a4c71383fed7a
touch __init__.py
pip install -r requirements.txt -c ../internal/constraints.txt
cd ..

# Install Hugging Face libraries and their dependencies.
python3 -m pip install --upgrade transformers==4.36.2
python3 -m pip install --upgrade datasets==2.19.0
python3 -m pip install --upgrade accelerate==0.29.3
python3 -m pip install --upgrade sentencepiece==0.2.0
```
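As a quick sanity check (our suggestion, not part of the official setup), you can confirm that the pinned versions resolved correctly:

```python
# Confirm the pinned dependency versions installed above.
import accelerate
import datasets
import sentencepiece
import transformers

print(transformers.__version__)   # expected: 4.36.2
print(datasets.__version__)       # expected: 2.19.0
print(accelerate.__version__)     # expected: 0.29.3
print(sentencepiece.__version__)  # expected: 0.2.0
```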
In our experiments, we used the LLaMA v1/v2 tokenizer. Please download the tokenizer from the official repository.
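Before training, it can help to verify that the downloaded tokenizer loads. The sketch below is optional; the path is a placeholder for your tokenizer file.

```python
# Verify the SentencePiece tokenizer file loads correctly.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="<PATH_TO_TOKENIZER_FILE>")
print(sp.vocab_size())  # the LLaMA v1/v2 tokenizer has a 32000-token vocabulary
print(sp.encode("The quick brown fox", out_type=str))  # sample tokenization
```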
To fine-tune a 270M-parameter model with LoRA, use the following command:
CFG_FILE="projects/openelm/peft_configs/openelm_lora_270M.yaml"
WTS_FILE="https://docs-assets.developer.apple.com/ml-research/models/corenet/v0.1.0/openelm/pretrained/270M/checkpoint_average.pt"
TOKENIZER_FILE="<PATH_TO_TOKENIZER_FILE>"
# NOTE: The dataset can currently be obtained from https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/ft-training_set/commonsense_170k.json.
DATASET_FILE="<PATH_TO_COMMONSENSE_170K>"
corenet-train --common.config-file $CFG_FILE \
--model.language-modeling.pretrained $WTS_FILE \
--text-tokenizer.sentence-piece.model-path $TOKENIZER_FILE \
--dataset.language-modeling.commonsense-170k.path $DATASET_FILE
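Optionally, you can inspect the downloaded dataset before launching training. This snippet is our suggestion; the filename stands in for `<PATH_TO_COMMONSENSE_170K>`.

```python
# Inspect the joint commonsense fine-tuning data before training.
import json

with open("commonsense_170k.json") as f:
    data = json.load(f)

print(len(data))  # roughly 170k joint training examples
print(data[0])    # one record, to check the schema
```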
To train with DoRA instead, edit the config file to set `use_dora` to `True`.
To evaluate a pre-trained LoRA 270M model, use the following command:
CFG_FILE="projects/openelm/peft_configs/openelm_lora_270M_eval.yaml"
WTS_FILE="https://docs-assets.developer.apple.com/ml-research/models/corenet/v0.1.0/openelm/peft/openelm_lora_270M.pt"
TOKENIZER_FILE="<PATH_TO_TOKENIZER_FILE>"
corenet-eval-llmadapters --common.config-file $CFG_FILE \
--model.language-modeling.pretrained $WTS_FILE \
--text-tokenizer.sentence-piece.model-path $TOKENIZER_FILE
The expected results (accuracy, %) are:

| boolq | piqa | siqa | hellaswag | winogrande | arc-easy | arc-challenge | obqa |
|---|---|---|---|---|---|---|---|
| 62.14 | 50.05 | 42.02 | 24.84 | 49.88 | 26.60 | 24.57 | 28.00 |
To evaluate other pretrained models, edit the config file to use a different backbone. To evaluate DoRA models, edit the config file to set `use_dora` to `True`. The fine-tuned adapter weights are available below:
| Model | LoRA/DoRA | Weights |
|---|---|---|
| OpenELM-270M | LoRA | Link |
| OpenELM-450M | LoRA | Link |
| OpenELM-1.1B | LoRA | Link |
| OpenELM-3B | LoRA | Link |
| OpenELM-270M | DoRA | Link |
| OpenELM-450M | DoRA | Link |
| OpenELM-1.1B | DoRA | Link |
| OpenELM-3B | DoRA | Link |