ModelScope Community Website
中文 | English
SWIFT supports training, inference, evaluation and deployment of nearly 200 LLMs and MLLMs (multimodal large models). Developers can directly apply our framework to their own research and production environments to realize the complete workflow from model training and evaluation to application. In addition to supporting the lightweight training solutions provided by PEFT, we also provide a complete Adapters library to support the latest training techniques such as NEFTune, LoRA+, LLaMA-PRO, etc. This adapter library can be used directly in your own custom workflow without our training scripts.
To facilitate use by users unfamiliar with deep learning, we provide a Gradio web-ui for controlling training and inference, as well as accompanying deep learning courses and best practices for beginners.
Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.
- 🔥2024.03.20: Supports inference and fine-tuning for the llava series. For best practice, you can refer to here.
- 🔥2024.03.12: Support inference and fine-tuning for deepseek-vl series. Best practices can be found here.
- 🔥2024.03.11: Support GaLore for effectively reducing memory usage to 1/2 of the original in full-parameter training.
- 🔥2024.03.10: End-to-end best practices from fine-tuning to deployment for Qwen1.5-7B-Chat and Qwen1.5-72B-Chat.
- 🔥2024.03.09: Support training and inference of MAMBA model, use this script to start training!
- 2024.03.09: Support training and inference of AQLM quantized model, use this script to start training!
- 2024.03.06: Support training and inference of AWQ quantized model, use this Qwen1.5-AWQ model script to start training, and support training and inference of yi-9b.
- 🔥2024.02.29: Support LLaMA PRO, simply use this script to start training.
- 🔥2024.02.29: Support LoRA+, simply use this script to start training.
- 2024.02.25: Support
swift export
to quantize models using AWQ/GPTQ and push to ModelScope Hub. See documentation: LLM Quantization.
More
- 2024.02.22: Support gemma series: gemma-2b, gemma-2b-instruct, gemma-7b, gemma-7b-instruct.
- 2024.02.16: Support deepseek-math series: deepseek-math-7b, deepseek-math-7b-instruct, deepseek-math-7b-chat.
- 🔥2024.02.05: Support Qwen1.5 series models, see model list for all supported Qwen1.5 models. Provide fine-tuning scripts for qwen1half-7b-chat, qwen1half-7b-chat-int8.
- 2024.02.05: Support training of diffusion models such as SDXL, SD, ControlNet, as well as DreamBooth training. See corresponding training scripts for details.
- 2024.02.01: Support minicpm series: minicpm-2b-sft-chat, minicpm-2b-chat.
- 🔥2024.02.01: Support dataset mixing to reduce catastrophic forgetting. Use
--train_dataset_mix_ratio 2.0
to enable training! We also open sourced the general knowledge dataset ms-bench. - 🔥2024.02.01: Support Agent training! Agent training algorithm is derived from this paper. We also added ms-agent, a high-quality agent dataset. Use this script to start Agent training!
- 🔥2024.02.01: Support adding SFT loss in DPO training to reduce repetitive generation caused by KL divergence loss.
- 2024.02.01: Support using AdaLoRA and IA3 adapters in training.
- 2024.02.01: Support
--merge_lora
parameter in AnimateDiff training. - 2024.01.30: Support internlm-xcomposer2-7b-chat.
- 🔥2024.01.30: Support ZeRO-3, simply specify
--deepspeed default-zero3
. - 2024.01.29: Support internlm2-math series: internlm2-math-7b, internlm2-math-7b-chat, internlm2-math-20b, internlm2-math-20b-chat.
- 🔥2024.01.26: Support yi-vl-6b-chat, yi-vl-34b-chat.
- 2024.01.24: Support codefuse-codegeex2-6b-chat, codefuse-qwen-14b-chat.
- 2024.01.23: Support orion series: orion-14b, orion-14b-chat.
- 2024.01.20: Support xverse-13b-256k, xverse-65b-v2, xverse-65b-chat.
- 🔥2024.01.17: Support internlm2 series: internlm2-7b-base, internlm2-7b, internlm2-7b-sft-chat, internlm2-7b-chat, internlm2-20b-base, internlm2-20b, internlm2-20b-sft-chat, internlm2-20b-chat.
- 2024.01.15: Support yuan series: yuan2-2b-instruct, yuan2-2b-janus-instruct, yuan2-51b-instruct, yuan2-102b-instruct.
- 🔥2024.01.12: Support deepseek-moe series: deepseek-moe-16b, deepseek-moe-16b-chat.
- 🔥2024.01.04: Support VLLM deployment, compatible with OpenAI API style, see VLLM Inference Acceleration and Deployment for details.
- 2024.01.04: Update Benchmark for convenient viewing of training speed and memory usage of different models.
- 🔥2023.12.29: Support web-ui for sft training and inference, use
swift web-ui
after installing ms-swift to start. - 🔥2023.12.29: Support DPO RLHF (Reinforcement Learning from Human Feedback) and three datasets for this task: AI-ModelScope/stack-exchange-paired, AI-ModelScope/hh-rlhf and AI-ModelScope/hh_rlhf_cn. See documentation to start training!
- 🔥2023.12.28: Support SCEdit! This tuner can significantly reduce memory usage in U-Net and support low-memory controllable image generation (replacing ControlNet), read the section below to learn more.
- 2023.12.23: Support codegeex2-6b.
- 2023.12.19: Support phi2-3b.
- 2023.12.18: Support VLLM for inference acceleration.
- 2023.12.15: Support deepseek, deepseek-coder series: deepseek-7b, deepseek-7b-chat, deepseek-67b, deepseek-67b-chat, openbuddy-deepseek-67b-chat, deepseek-coder-1_3b, deepseek-coder-1_3b-instruct, deepseek-coder-6_7b, deepseek-coder-6_7b-instruct, deepseek-coder-33b, deepseek-coder-33b-instruct.
- 2023.12.13: Support mistral-7b-instruct-v2, mixtral-moe-7b, mixtral-moe-7b-instruct.
- 2023.12.09: Support
freeze_parameters
parameter as a compromise between lora and full-parameter training. Corresponding sh can be found in full_freeze_ddp. Supportdisable_tqdm
,lazy_tokenize
,preprocess_num_proc
parameters, see command line arguments for details. - 2023.12.08: Support sus-34b-chat, support yi-6b-200k, yi-34b-200k.
- 2023.12.07: Support Multi-Node DDP training.
- 2023.12.05: Support models: zephyr-7b-beta-chat, openbuddy-zephyr-7b-chat. Support datasets: hc3-zh, hc3-en.
- 🔥2023.12.02: Self-cognition fine-tuning best practices, 10 minutes to fine-tune a large model for self-cognition, create your own unique large model.
- 🔥2023.11.30: Support training and inference of qwen-1_8b, qwen-72b, qwen-audio series models. Corresponding sh scripts can be found in qwen_1_8b_chat, qwen_72b_chat, qwen_audio_chat
- 🔥2023.11.29: Support training and inference of AnimateDiff
- 🔥2023.11.24: Support yi-34b-chat, codefuse-codellama-34b-chat models. Corresponding sh scripts can be found in yi_34b_chat, codefuse_codellama_34b_chat.
- 🔥2023.11.18: Support tongyi-finance-14b series models: tongyi-finance-14b, tongyi-finance-14b-chat, tongyi-finance-14b-chat-int4. Corresponding sh scripts can be found in tongyi_finance_14b_chat_int4.
- 2023.11.16: Support flash attn for more models: qwen series, qwen-vl series, llama series, openbuddy series, mistral series, yi series, ziya series. Please use
use_flash_attn
parameter. - 🔥2023.11.11: Support NEFTune, simply use
Swift.prepare_model(model, NEFTuneConfig())
to enable. - 🔥2023.11.11: Support training and inference by command line and inference by Web-UI, see
Usage with Swift CLI
section below for details. - 🔥2023.11.10: Support bluelm series models: bluelm-7b, bluelm-7b-chat, bluelm-7b-32k, bluelm-7b-chat-32k. Corresponding sh scripts can be found in bluelm_7b_chat.
- 🔥2023.11.08: Support training and inference of xverse-65b model, script at xverse_65b.
- 🔥2023.11.07: Support training and inference of yi-6b, yi-34b models, scripts at yi_6b, yi_34b.
- 🔥2023.10.30: Support two new tuners: QA-LoRA and LongLoRA.
- 🔥2023.10.30: Support editing models using ROME (Rank One Model Editing) to infuse new knowledge into models without training!
- 2023.10.30: Support skywork-13b series models: skywork-13b, skywork-13b-chat. Corresponding sh scripts can be found in skywork_13b.
- 🔥2023.10.27: Support chatglm3 series models: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. Corresponding sh scripts can be found in chatglm3_6b.
- 🔥2023.10.17: Support SFT of int4, int8 models: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-vl-chat-int4, baichuan2-7b-chat-int4, baichuan2-13b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8.
- 2023.10.15: Support ziya2-13b series models: ziya2-13b, ziya2-13b-chat.
- 2023.10.12: Support mistral-7b series models: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-instruct.
- 🔥2023.10.07: Support DeepSpeed ZeRO-2, enabling lora (not just qlora) to run DDP on dual A10 cards.
- 2023.10.04: Support more math, law, SQL, code domain datasets: blossom-math-zh, school-math-zh, text2sql-en, sql-create-context-en, lawyer-llama-zh, tigerbot-law-zh, leetcode-python-en.
- 🔥2023.09.25: Support qwen-14b series: qwen-14b, qwen-14b-chat.
- 2023.09.18: Support internlm-20b series: internlm-20b, internlm-20b-chat.
- 2023.09.12: Support MP+DDP to accelerate full-parameter training.
- 2023.09.05: Support openbuddy-llama2-70b-chat.
- 2023.09.03: Support baichuan2 series: baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat.
SWIFT runs in the Python environment. Please ensure your Python version is higher than 3.8.
- Method 1: Install SWIFT using pip command:
# Full capabilities
pip install ms-swift[all] -U
# LLM only
pip install ms-swift[llm] -U
# AIGC only
pip install ms-swift[aigc] -U
# Adapters only
pip install ms-swift -U
- Method 2: Install SWIFT through source code (convenient for running training and inference scripts), please run the following commands:
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]
SWIFT depends on torch>=1.13, recommend torch>=2.0.0.
- Method 3: Use SWIFT in our Docker image
# China-Hangzhou image
docker pull registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.1.2-tf2.14.0-1.13.1
# US-west image
docker pull registry.us-west-1.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.1.2-tf2.14.0-1.13.1
This section introduces basic usage, see the Documentation section for more ways to use.
swift web-ui
You can refer to the following scripts to customize your own training script.
- full: qwen1half-7b-chat (A100), qwen-7b-chat (2*A100)
- full+ddp+zero2: qwen-7b-chat (4*A100)
- full+ddp+zero3: qwen-14b-chat (4*A100)
- lora: chatglm3-6b (3090), baichuan2-13b-chat (2*3090), yi-34b-chat (A100), qwen-72b-chat (2*A100)
- lora+ddp: chatglm3-6b (2*3090)
- lora+ddp+zero3: qwen-14b-chat (4*3090), qwen-72b-chat (4*A100)
- qlora(gptq-int4): qwen-7b-chat-int4 (3090)
- qlora(gptq-int8): qwen1half-7b-chat-int8 (3090)
- qlora(bnb-int4): qwen-7b-chat (3090)
Training Process | Training Method |
---|---|
Pretraining | Text Generation |
Fine-tuning | Single-turn/Multi-turn/Agent Training/Self-cognition/Multi-modal QA/Speech QA |
Human Alignment | DPO |
Text-to-Image | DreamBooth, etc. |
Text-to-Video | - |
Start single GPU fine-tuning with the following command:
CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset ms-bench-mini \
--train_dataset_sample 1000 \
--logging_steps 5 \
--max_length 2048 \
--learning_rate 5e-5 \
--warmup_ratio 0.4 \
--output_dir output \
--lora_target_modules ALL \
--self_cognition_sample 500 \
--model_name 小黄 'Xiao Huang' \
--model_author 魔搭 ModelScope
Model parallel training modifies the CUDA_VISIBLE_DEVICES
environment variable based on the above command:
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset ms-bench-mini \
--train_dataset_sample 1000 \
--logging_steps 5 \
--max_length 2048 \
--learning_rate 5e-5 \
--warmup_ratio 0.4 \
--output_dir output \
--lora_target_modules ALL \
--self_cognition_sample 500 \
--model_name 小黄 'Xiao Huang' \
--model_author 魔搭 ModelScope
Data parallel training modifies the NPROC_PER_NODE
environment variable based on the above command:
# If the number of CUDA_VISIBLE_DEVICES is an integer multiple of NPROC_PER_NODE (greater than 1), data parallel is launched according to NPROC_PER_NODE, and model parallel is launched according to CUDA_VISIBLE_DEVICES number/NPROC_PER_NODE
CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=2 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset ms-bench-mini \
--train_dataset_sample 1000 \
--logging_steps 5 \
--max_length 2048 \
--learning_rate 5e-5 \
--warmup_ratio 0.4 \
--output_dir output \
--lora_target_modules ALL \
--self_cognition_sample 500 \
--model_name 小黄 'Xiao Huang' \
--model_author 魔搭 ModelScope
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset ms-bench-mini \
--train_dataset_sample 1000 \
--logging_steps 5 \
--max_length 2048 \
--learning_rate 5e-5 \
--warmup_ratio 0.4 \
--output_dir output \
--lora_target_modules ALL \
--self_cognition_sample 500 \
--model_name 小黄 'Xiao Huang' \
--model_author 魔搭 ModelScope \
--deepspeed default-zero3
swift infer --model_type qwen1half-7b-chat --stream true
swift infer --model_type qwen1half-7b-chat --infer_backend vllm --stream true
# Debugging, on line soon:>
swift eval --model_type qwen1half-7b-chat --eval_dataset mmlu ceval
swift export --model_type qwen1half-7b-chat --quant_bits 4 --quant_method awq
swift deploy --model_type qwen1half-7b-chat --infer_backend vllm --max_model_len 8192
Model Type | Model Introduction | Language | Model Size | Model Type |
---|---|---|---|---|
Qwen/Qwen1.5 | Tongyi Qwen 1.0 and 1.5 series models | Chinese/English | 1.8B-72B, including quantized versions | base model/chat model |
ChatGLM2/ChatGLM3/Codegeex2 | Zhipu ChatGLM series models | Chinese/English | 6B | base model/chat model |
Baichuan/Baichuan2 | Baichuan 1 and Baichuan 2 | Chinese/English | 7B-13B | base model/chat model |
Yuan2 | Langchao Yuan series models | Chinese/English | 2B-102B | chat model |
XVerse | XVerse series models | Chinese/English | 7B-65B | base model/chat model |
LLaMA2 | LLaMA2 series models | English | 7B-70B, including quantized versions | base model/chat model |
Mistral/Mistral-MoE | Mistral series models | English | 7B, including quantized and MoE versions | base model/chat model |
YI | 01AI's YI series models | Chinese/English | 6B-34B | base model/chat model |
InternLM/InternLM2/InternLM2-Math | Pujiang AI Lab InternLM series models | Chinese/English | 1.8B-20B | base model/chat model/math model |
DeepSeek/DeepSeek-Coder/DeepSeek-Math | DeepSeek series models | Chinese/English | 1.3B-67B | base model/chat model/code generation model/math model |
MAMBA | MAMBA temporal convolution model | English | 130M-2.8B | base model |
Gemma | Google Gemma series models | English | 2B-7B | base model/chat model |
MiniCPM | OpenBmB MiniCPM series models | Chinese/English | 2B-3B | chat model |
OpenBuddy | OpenBuddy series models | Chinese/English | 7B-67B | base model/chat model |
Orion | OrionStar AI series models | Chinese/English | 14B | base model/chat model |
BlueLM | VIVO BlueLM large model | Chinese/English | 7B | base model/chat model |
Ziya2 | Fengshenbang series models | Chinese/English | 13B | base model/chat model |
Skywork | Skywork series models | Chinese/English | 13B | base model/chat model |
Zephyr | Zephyr series models based on Mistral | English | 7B | chat model |
PolyLM | Tongyi Lab self-developed PolyLM series models | Multilingual | 13B | base model |
SeqGPT | Tongyi Lab self-developed text understanding model for information extraction and text classification | Chinese | 560M | semantic understanding model |
SUS | Southern University of Science and Technology model fine-tuned on YI | Chinese/English | 34B | chat model |
Tongyi-Finance | Tongyi finance series models | Chinese/English | 13B | finance domain base model/chat model |
CodeFuse-CodeLLaMA/CodeFuse-Codegeex2/CodeFuse-Qwen | Ant CodeFuse series models | Chinese/English | 6B-34B | code generation model |
phi2 | Microsoft's PHI2 model | English | 3B | generation model |
Model Type | Model Introduction | Language | Model Size | Model Type |
---|---|---|---|---|
Qwen-VL | Tongyi Qwen vision model | Chinese/English | 7B, including quantized versions | base model/chat model |
Qwen-Audio | Tongyi Qwen speech model | Chinese/English | 7B | base model/chat model |
YI-VL | 01AI's YI series vision models | Chinese/English | 6B-34B | chat model |
xcomposer2 | Pujiang AI Lab InternLM vision model | Chinese/English | 7B | chat model |
DeepSeek-VL | DeepSeek series vision models | Chinese/English | 1.3B-7B | chat model |
MiniCPM-VL | OpenBmB MiniCPM vision model | Chinese/English | 3B | chat model |
CogAgent/CogVLM | Zhipu ChatGLM visual QA and Agent model | Chinese/English | 17B-18B | chat model |
Model Type | Model Introduction | Language | Model Type |
---|---|---|---|
AnimateDiff | AnimateDiff animation model | English | text-to-video |
SD1.5/SD2.0/SDXL | StabilityAI series diffusion models | English | text-to-image |
Dataset Type | Training Task | Documentation |
---|---|---|
General | Fine-tuning | 🔥ms-bench, 🔥ms-bench-mini, 🔥alpaca-en(gpt4), 🔥alpaca-zh(gpt4), multi-alpaca-all, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, instruct-en, gpt4all-en, sharegpt-en, sharegpt-zh, tulu-v2-sft-mixture, wikipedia-zh, open-orca, open-orca-gpt4, sharegpt-gpt4, 🔥sharegpt-gpt4-mini. |
Agent | Fine-tuning | 🔥ms-agent, damo-mini-agent-zh, damo-agent-zh, agent-instruct-all-en. |
General | Human Alignment | 🔥hh-rlhf-cn, stack-exchange-paired, hh-rlhf-harmless-base, hh-rlhf-helpful-base, hh-rlhf-helpful-online, hh-rlhf-helpful-rejection-sampled, hh-rlhf-red-team-attempts, hh-rlhf-cn-harmless-base-cn, hh-rlhf-cn-helpful-base-cn, hh-rlhf-cn-harmless-base-en, hh-rlhf-cn-helpful-base-en. |
Code | Fine-tuning | code-alpaca-en, 🔥leetcode-python-en, 🔥codefuse-python-en, 🔥codefuse-evol-instruction-zh. |
Medical | Fine-tuning | medical-en, medical-zh, medical-mini-zh, 🔥disc-med-sft-zh. |
Legal | Fine-tuning | lawyer-llama-zh, tigerbot-law-zh, 🔥disc-law-sft-zh. |
Math | Fine-tuning | 🔥blossom-math-zh, school-math-zh, open-platypus-en. |
SQL | Fine-tuning | text2sql-en, 🔥sql-create-context-en. |
Text Generation | Fine-tuning | 🔥advertise-gen-zh, 🔥dureader-robust-zh. |
Classification | Fine-tuning | cmnli-zh, 🔥cmnli-mini-zh, 🔥jd-sentiment-zh, 🔥hc3-zh, 🔥hc3-en. |
Quantization Assist | Quantization | pileval. |
Other | Fine-tuning | finance-en, poetry-zh, webnovel-zh, generated-chat-zh, cls-fudan-news-zh, ner-jave-zh. |
Vision | Fine-tuning | coco-en, 🔥coco-mini-en, coco-mini-en-2, capcha-images. |
Audio | Fine-tuning | aishell1-zh, 🔥aishell1-mini-zh. |
Technology Name |
---|
🔥LoRA: LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS |
🔥LoRA+: LoRA+: Efficient Low Rank Adaptation of Large Models |
🔥LLaMA PRO: LLAMA PRO: Progressive LLaMA with Block Expansion |
🔥SCEdit: SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing < arXiv | Project Page > |
🔥NEFTune: Noisy Embeddings Improve Instruction Finetuning |
QA-LoRA:Quantization-Aware Low-Rank Adaptation of Large Language Models |
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models |
ROME: Rank-One Editing of Encoder-Decoder Models |
Adapter: Parameter-Efficient Transfer Learning for NLP |
Prompt Tuning: Visual Prompt Tuning |
Side: Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks |
Res-Tuning: Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone < arXiv | Project Page | Usage > |
Tuners provided by PEFT, such as IA3, AdaLoRA, etc. |
Hardware Environment | Notes |
---|---|
CPU | |
RTX 20/30/40 series, etc. | After 30 series, BF16 and FlashAttn can be used |
Computing cards T4/V100, etc. | BF16 and FlashAttn not supported |
Computing cards A10/A100, etc. | Support BF16 and FlashAttn |
Huawei Ascend NPU |
make docs
# Check docs/build/html/index.html in web-browser
-
ModelScope Library The ModelScope library is the model library of the ModelScope project, containing popular deep learning models for various modalities.
This framework is licensed under the Apache License (Version 2.0). For models and datasets, please refer to the original resource page and follow the corresponding License.
@Misc{swift,
title = {SWIFT:Scalable lightWeight Infrastructure for Fine-Tuning},
author = {The ModelScope Team},
howpublished = {\url{https://github.com/modelscope/swift}},
year = {2024}
}
You can contact us and communicate with us by adding our WeChat group: