🖋 Authors: Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He
Large Language Models (LLMs) inherently encode extensive knowledge within their parameters. Previous studies have demonstrated that this parametric knowledge can be detected (e.g., via cloze tests) or modified (e.g., through knowledge editing).
Taking this further, can task-specific parametric knowledge be transferred across LLMs of different scales?
Absolutely! Our paper provides empirical evidence supporting the transferability of parametric knowledge.
To begin, set up your environment with the necessary packages:
conda create --name paratransfer python=3.10
conda activate paratransfer
pip install -r requirements.txt
We first extract task-specific parametric knowledge from the larger teacher model into a LoRA module sized for the smaller student model. For example, with Llama-2 13B as the teacher and Llama-2 7B as the student on the GSM task:
python extract_lora_with_sensitivity.py \
--model_size 13b \
--lora_size 7b \
--task gsm
python get_delta.py \
--path extracted_lora/13b-to-7b-gsm
Modify the settings in extracted_lora.sh as needed.
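Conceptually, this step scores teacher parameters by their sensitivity to the target task and compresses the selected weights into low-rank factors that fit the student's LoRA module. The sketch below is a minimal illustration of that idea, not the repository's actual code: the function names, the first-order sensitivity proxy, the shape alignment, and the SVD-based factorization are all assumptions.

import torch

# Hypothetical sketch: rank teacher neurons by task sensitivity, then
# factorize the selected weight slice into LoRA-style A/B matrices.

def sensitivity_scores(weight: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    # First-order proxy |w * dL/dw|, summed over the input dimension,
    # gives one score per output neuron of a linear layer.
    return (weight * grad).abs().sum(dim=1)

def extract_lora_init(teacher_w, task_grad, student_out, student_in, rank=16):
    # Keep the teacher rows (output neurons) most sensitive to the task.
    top_rows = sensitivity_scores(teacher_w, task_grad).topk(student_out).indices
    selected = teacher_w[top_rows][:, :student_in]  # crude shape alignment

    # Compress the selected slice so that B @ A approximates it.
    U, S, Vh = torch.linalg.svd(selected, full_matrices=False)
    B = U[:, :rank] * S[:rank]   # (student_out, rank)
    A = Vh[:rank]                # (rank, student_in)
    return A, B

# Toy usage with random tensors standing in for a 13B projection layer
# and a gradient computed on a few GSM examples.
w = torch.randn(5120, 5120)
g = torch.randn_like(w)
A, B = extract_lora_init(w, g, student_out=4096, student_in=4096)
print(A.shape, B.shape)  # torch.Size([16, 4096]) torch.Size([4096, 16])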
Next, we use the extracted parameters to initialize the LoRA module in the student model and fine-tune it:
./train.sh
The models will be saved in the trained_lora folder.
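For intuition, the following sketch shows one way to load the extracted tensors into a PEFT LoRA adapter on the 7B student before fine-tuning. It is an illustration under stated assumptions (the file name init.pt, the tensor keys, and the LoRA hyperparameters are placeholders); train.sh runs the actual pipeline.

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap the student with a LoRA adapter (hyperparameters are placeholders).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
                                             torch_dtype=torch.bfloat16)
config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, config)

# Hypothetical: copy the extracted tensors into the adapter's A/B matrices,
# assuming the saved dict uses the same parameter names as the PEFT model.
extracted = torch.load("extracted_lora/13b-to-7b-gsm/init.pt")
with torch.no_grad():
    for name, param in model.named_parameters():
        if ("lora_A" in name or "lora_B" in name) and name in extracted:
            param.copy_(extracted[name].to(param.dtype))

# Only the LoRA parameters are trainable; fine-tune them with a standard
# causal-LM training loop or the Hugging Face Trainer.
model.print_trainable_parameters()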
Merge the LoRA module with the base model for evaluation:
./merge.sh
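Under the hood, merging folds the low-rank update back into the base weights so the result can be evaluated as an ordinary checkpoint. A minimal sketch with PEFT, assuming illustrative paths:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base student and attach the trained adapter (paths are placeholders).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "trained_lora/13b-to-7b-gsm")

# Fold W <- W + (alpha / r) * B @ A into the base weights and save a plain model.
merged = model.merge_and_unload()
merged.save_pretrained("merged_models/13b-to-7b-gsm")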
Subsequently, employ Open-Instruct to evaluate the model across various benchmarks.
If you find this work useful, please consider citing our paper:
@article{zhong2023seeking,
  title={Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective},
  author={Zhong, Ming and An, Chenxin and Chen, Weizhu and Han, Jiawei and He, Pengcheng},
  journal={arXiv preprint arXiv:2310.11451},
  year={2023}
}