Expert-Specialized Fine-Tuning

Official repository for the paper "Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models" by Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li, and Y. Wu.

ESFT aims to efficiently customize Large Language Models (LLMs) with Mixture-of-Experts (MoE) architecture by adjusting only task-relevant parts, improving efficiency and performance while using fewer resources and storage.

📰 News

📅 2024.9.20: Glad to announce that ESFT has been accepted to the EMNLP 2024 Main Conference!

📅 2024.8.11: We have released the ESFT training code! ✨ You can now try it with your own models and datasets!

🚀 Quick Start

Installation and Setup

git clone https://github.com/deepseek-ai/ESFT.git
cd ESFT

Install required dependencies

pip install transformers torch safetensors accelerate

Download necessary adapters

bash scripts/download_adapters.sh

🔧Key Scripts

eval_multigpu.py This script evaluates the performance of the model on various datasets. See scripts/eval.sh for detailed configs and explanations.

Usage:

python eval_multigpu.py \
    --eval_dataset=translation \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --adapter_dir=all_models/adapters/token/translation \
    --output_path=results/completions/token/translation.jsonl \
    --openai_api_key=YOUR_OPENAI_API_KEY

get_expert_scores.py This script calculates the scores for each expert based on the evaluation datasets. Usage:

python scripts/expert/get_expert_scores.py \
    --eval_dataset=translation \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --output_dir=results/expert_scores/translation \
    --n_sample_tokens=131072 \
    --world_size=4 \
    --gpus_per_rank=2
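
To make the idea of an expert score concrete, here is a minimal sketch of two relevance measures as we read them from the paper: the token selection ratio (the share of routing slots an expert receives) and the average gate score. The function name `expert_scores` and the `routing` input layout are hypothetical, chosen for illustration; the actual script may compute and store its scores differently.

```python
def expert_scores(routing, n_experts, top_k):
    """Compute per-expert relevance scores for one MoE layer.

    routing: one entry per token, each a list of (expert_id, gate_weight)
             pairs for the top_k experts that token was routed to.
             (Hypothetical input layout, assumed for this sketch.)
    Returns (token_selection_ratio, average_gate_score), both lists of
    length n_experts.
    """
    n_tokens = len(routing)
    token_ratio = [0.0] * n_experts
    gate_score = [0.0] * n_experts
    for token_routes in routing:
        for expert_id, gate in token_routes:
            # Each token distributes one unit of "selection" over its top_k experts.
            token_ratio[expert_id] += 1.0 / top_k
            # Accumulate the gate weight the router assigned to this expert.
            gate_score[expert_id] += gate
    return (
        [r / n_tokens for r in token_ratio],
        [g / n_tokens for g in gate_score],
    )


# Two tokens, four experts, top-2 routing.
routing = [
    [(0, 0.75), (2, 0.25)],
    [(0, 0.5), (1, 0.5)],
]
ratio, gate = expert_scores(routing, n_experts=4, top_k=2)
print(ratio)  # [0.5, 0.25, 0.25, 0.0]
print(gate)   # [0.625, 0.25, 0.125, 0.0]
```

In this toy example expert 0 handles half of all routing slots, so it would rank as the most task-relevant expert under either measure.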

generate_expert_config.py This script generates the configuration that converts an MoE model so that only the task-relevant experts are trained, selected based on the expert scores. Usage:

python scripts/expert/generate_expert_config.py \
    --eval_datasets=intent,summary,law,translation \
    --expert_scores_dir=results/expert_scores \
    --output_dir=results/expert_configs \
    --score_function=token \
    --top_p=0.2 # the scoring function and top_p are hyperparameters
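
The role of `top_p` can be illustrated with a small sketch. It assumes, based on our reading of the paper, that experts in a layer are ranked by score and added until their cumulative score reaches `top_p`. The function `select_experts` is hypothetical and is not the script's actual code; the scores are assumed to be normalized so that they sum to 1 within a layer.

```python
def select_experts(scores, top_p):
    """Pick the highest-scoring experts whose cumulative score reaches top_p.

    scores: per-expert relevance scores for one layer, assumed to sum to 1.
    Returns the selected expert ids in ascending order.
    """
    # Rank expert ids from highest to lowest score (ties keep index order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected, cumulative = [], 0.0
    for i in order:
        if cumulative >= top_p:
            break  # threshold already reached, stop adding experts
        selected.append(i)
        cumulative += scores[i]
    return sorted(selected)


scores = [0.5, 0.25, 0.25, 0.0]
print(select_experts(scores, top_p=0.2))  # [0]
print(select_experts(scores, top_p=0.6))  # [0, 1]
```

A smaller `top_p` trains fewer experts per layer, which lowers training cost and storage at the risk of leaving out experts that matter for the task.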

train.py and train_ep.py These scripts train the model with the expert configuration generated by the previous script. train_ep.py uses expert parallelism and is optimized for multi-GPU training. Usage:

python train.py \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --expert_config=results/expert_configs/intent.json \
    --train_dataset=intent \
    --train_config=configs/base.yaml \
    --output_dir=results/checkpoints/intent
    
torchrun --nproc-per-node=8 train_ep.py \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --expert_config=results/expert_configs/translation.json \
    --train_dataset=translation \
    --train_config=configs/base.yaml \
    --output_dir=results/checkpoints/translation
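
As a rough illustration of what training "with the expert configuration" means, the sketch below decides which parameters would be updated. Both the config layout (a mapping from layer index to a list of expert ids under an `experts` key) and the parameter naming pattern are assumptions made for this example, and `is_trainable` is a hypothetical helper; check the generated JSON files and the model code for the real format.

```python
import re

# Assumed naming scheme for routed-expert weights, e.g.
# "model.layers.1.mlp.experts.3.gate_proj.weight".
EXPERT_PARAM = re.compile(r"layers\.(\d+)\.mlp\.experts\.(\d+)\.")


def is_trainable(param_name, expert_config):
    """Return True only for parameters of experts listed in the config."""
    match = EXPERT_PARAM.search(param_name)
    if match is None:
        # Attention, shared experts, embeddings, etc. stay frozen in this sketch.
        return False
    layer, expert = match.group(1), int(match.group(2))
    return expert in expert_config["experts"].get(layer, [])


# Hypothetical config: train experts 3 and 7 in layer 1, expert 0 in layer 2.
config = {"experts": {"1": [3, 7], "2": [0]}}

names = [
    "model.layers.1.mlp.experts.3.gate_proj.weight",
    "model.layers.1.mlp.experts.4.up_proj.weight",
    "model.layers.1.mlp.shared_experts.gate_proj.weight",
    "model.layers.1.self_attn.q_proj.weight",
    "model.layers.2.mlp.experts.0.down_proj.weight",
]
for name in names:
    print(name, is_trainable(name, config))
```

With a PyTorch model, the same predicate could be applied by setting `param.requires_grad = is_trainable(name, config)` for each pair returned by `model.named_parameters()`, so that the optimizer only updates the selected experts.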

Contact and Support

For bug reports, feature requests, and general inquiries, please open an issue on our GitHub Issues page. Make sure to include as much detail as possible to help us address your issue quickly.

🌟Todo list

☑️ 📝 Update models, evaluation scripts, and expert selection scripts
☑️ 🔧 Update training scripts
🔲 🚀 More...

📚Citation

If you find our code or paper useful, please cite:

@article{wang2024letexpertsticklast,
      title={Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models}, 
      author={Zihan Wang and Deli Chen and Damai Dai and Runxin Xu and Zhuoshu Li and Y. Wu},
      year={2024},
      eprint={2407.01906},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.01906}, 
}

