We build the SmolLM2 Instruct family by fine-tuning the base 1.7B model on SmolTalk and the base 360M and 135M models on Smol-smoltalk using TRL
and the alignment handbook, and then running DPO on UltraFeedback. You can find the scripts and instructions for doing so here: https://github.com/huggingface/alignment-handbook/tree/main/recipes/smollm2#instructions-to-train-smollm2-17b-instruct
Here, we provide a simple script for fine-tuning SmolLM2. In this case, we fine-tune the base 1.7B model on Python data.
Install PyTorch (see the PyTorch documentation), and then install the requirements:
pip install -r requirements.txt
Before you run any of the scripts, make sure you are logged in to wandb
and to the Hugging Face Hub (needed to push the checkpoints), and that you have accelerate
configured:
wandb login
huggingface-cli login
accelerate config
Now that everything is set up, clone the repository and move into the corresponding directory:
git clone https://github.com/huggingface/smollm
cd smollm/finetune
To fine-tune efficiently and at low cost, we use the PEFT library for Low-Rank Adaptation (LoRA) training. We also use the SFTTrainer
from TRL.
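To give a rough sense of why LoRA is cheap, the sketch below counts the parameters LoRA adds to a single linear layer and compares them with the full weight matrix. The layer size and rank are illustrative assumptions, not values read from train.py or from the SmolLM2 config.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one (d_out x d_in) linear layer.

    LoRA learns two small matrices, A (rank x d_in) and B (d_out x rank),
    and keeps the original weight frozen.
    """
    return rank * (d_in + d_out)


# Illustrative values only (assumptions, not taken from train.py).
d_in = d_out = 2048
rank = 16

full_params = d_in * d_out
lora_params = lora_trainable_params(d_in, d_out, rank)
ratio = lora_params / full_params

print(f"full layer: {full_params} params")
print(f"LoRA adapter: {lora_params} params ({ratio:.4f} of the full layer)")
```

Under these assumptions, the adapter trains under 2% of the parameters of the layer it wraps, which is where the memory and compute savings come from.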
For this example, we will fine-tune SmolLM2-1.7B on the Python
subset of the-stack-smol. This is just for illustration purposes.
To launch the training:
accelerate launch train.py \
--model_id "HuggingFaceTB/SmolLM2-1.7B" \
--dataset_name "bigcode/the-stack-smol" \
--subset "data/python" \
--dataset_text_field "content" \
--split "train" \
--max_seq_length 2048 \
--max_steps 5000 \
--micro_batch_size 1 \
--gradient_accumulation_steps 8 \
--learning_rate 3e-4 \
--warmup_steps 100 \
--num_proc "$(nproc)"
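As a sanity check on these settings, the sketch below works out the effective batch size and an upper bound on the tokens seen during training. It assumes that micro_batch_size is the per-device batch size and that training runs in a single process; num_processes is a hypothetical variable introduced here, so set it to match your accelerate configuration.

```python
# Values copied from the launch command above.
micro_batch_size = 1
gradient_accumulation_steps = 8
max_seq_length = 2048
max_steps = 5000
warmup_steps = 100

# Assumption: number of processes started by `accelerate launch` (1 = single GPU).
num_processes = 1

# Sequences contributing to each optimizer step.
sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_processes

# Upper bound: every sequence is assumed to fill max_seq_length tokens.
max_tokens_per_step = sequences_per_step * max_seq_length
max_tokens_total = max_tokens_per_step * max_steps

# Share of the run spent in learning-rate warmup.
warmup_fraction = warmup_steps / max_steps

print("sequences per optimizer step:", sequences_per_step)
print("max tokens per optimizer step:", max_tokens_per_step)
print("max tokens over the run:", max_tokens_total)
print("warmup fraction:", warmup_fraction)
```

With more processes, the effective batch size scales linearly, so you may want to lower gradient_accumulation_steps to keep it constant.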
If you want to fine-tune on other text datasets, change the dataset_text_field
argument to the name of the column containing the code/text you want to train on.