🖋 Authors: Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He
Large Language Models (LLMs) inherently encode extensive knowledge within their parameters. Previous studies have demonstrated that this parametric knowledge can be detected (e.g., via cloze tests) or modified (e.g., through knowledge editing).
Taking this further, can task-specific parametric knowledge be transferred across LLMs of different scales?
Absolutely! Our paper provides empirical evidence supporting the transferability of parametric knowledge.
To begin, set up your environment with the necessary packages:
conda create --name paratransfer python=3.10
conda activate paratransfer
pip install -r requirements.txt
We first extract task-specific parametric knowledge from the larger teacher model into a LoRA module sized for the smaller student model. For example, with Llama-2 13B as the teacher and Llama-2 7B as the student on the GSM task:
python extract_lora_with_sensitivity.py \
--model_size 13b \
--lora_size 7b \
--task gsm
python get_delta.py \
--path extracted_lora/13b-to-7b-gsm
Modify the settings in extracted_lora.sh as needed.
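Conceptually, this step scores teacher parameters by their sensitivity to the target task and compresses the selected weights into low-rank factors that fit the student's LoRA module. The sketch below is a minimal illustration of that idea, not the repository's actual code: the function names, the first-order sensitivity proxy, the shape alignment, and the SVD-based factorization are all assumptions.

import torch

# Hypothetical sketch: rank teacher neurons by task sensitivity, then
# factorize the selected weight slice into LoRA-style A/B matrices.

def sensitivity_scores(weight: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    # First-order proxy |w * dL/dw|, summed over the input dimension,
    # gives one score per output neuron of a linear layer.
    return (weight * grad).abs().sum(dim=1)

def extract_lora_init(teacher_w, task_grad, student_out, student_in, rank=16):
    # Keep the teacher rows (output neurons) most sensitive to the task.
    top_rows = sensitivity_scores(teacher_w, task_grad).topk(student_out).indices
    selected = teacher_w[top_rows][:, :student_in]  # crude shape alignment

    # Compress the selected slice so that B @ A approximates it.
    U, S, Vh = torch.linalg.svd(selected, full_matrices=False)
    B = U[:, :rank] * S[:rank]   # (student_out, rank)
    A = Vh[:rank]                # (rank, student_in)
    return A, B

# Toy usage with random tensors standing in for a 13B projection layer
# and a gradient computed on a few GSM examples.
w = torch.randn(5120, 5120)
g = torch.randn_like(w)
A, B = extract_lora_init(w, g, student_out=4096, student_in=4096)
print(A.shape, B.shape)  # torch.Size([16, 4096]) torch.Size([4096, 16])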
Next, we use the extracted parameters to initialize the LoRA module in the student model and fine-tune it:
./train.sh
The models will be saved in the trained_lora folder.
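For intuition, the following sketch shows one way to load the extracted tensors into a PEFT LoRA adapter on the 7B student before fine-tuning. It is an illustration under stated assumptions (the file name init.pt, the tensor keys, and the LoRA hyperparameters are placeholders); train.sh runs the actual pipeline.

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap the student with a LoRA adapter (hyperparameters are placeholders).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
                                             torch_dtype=torch.bfloat16)
config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, config)

# Hypothetical: copy the extracted tensors into the adapter's A/B matrices,
# assuming the saved dict uses the same parameter names as the PEFT model.
extracted = torch.load("extracted_lora/13b-to-7b-gsm/init.pt")
with torch.no_grad():
    for name, param in model.named_parameters():
        if ("lora_A" in name or "lora_B" in name) and name in extracted:
            param.copy_(extracted[name].to(param.dtype))

# Only the LoRA parameters are trainable; fine-tune them with a standard
# causal-LM training loop or the Hugging Face Trainer.
model.print_trainable_parameters()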
Merge the LoRA module with the base model for evaluation:
./merge.sh
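Under the hood, merging folds the low-rank update back into the base weights so the result can be evaluated as an ordinary checkpoint. A minimal sketch with PEFT, assuming illustrative paths:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base student and attach the trained adapter (paths are placeholders).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "trained_lora/13b-to-7b-gsm")

# Fold W <- W + (alpha / r) * B @ A into the base weights and save a plain model.
merged = model.merge_and_unload()
merged.save_pretrained("merged_models/13b-to-7b-gsm")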
Subsequently, employ Open-Instruct to evaluate the model across various benchmarks.
If you find this work useful, please consider citing our paper:
@article{zhong2023seeking,
  title={Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective},
  author={Zhong, Ming and An, Chenxin and Chen, Weizhu and Han, Jiawei and He, Pengcheng},
  journal={arXiv preprint arXiv:2310.11451},
  year={2023}
}