This folder contains the InternLM and InternLM2 models in transformers format, along with some conversion scripts.
```
├── convert2hf_internlm2.py
├── convert2hf_internlm.py
├── internlm2_model
│   ├── configuration_internlm.py
│   ├── __init__.py
│   ├── modeling_internlm2.py
│   └── tokenization_internlm.py
├── internlm_model
│   ├── configuration_internlm.py
│   ├── __init__.py
│   ├── modeling_internlm.py
│   └── tokenization_internlm.py
├── README.md
├── README-zh-Hans.md
├── revert_internlm2.py
└── revert_internlm.py
```
The `InternLM` and `InternLM2` HuggingFace models can be adapted to different scenarios or deployments by specifying different parameters. Here are some commonly used ones:

- `trust_remote_code=True`: This parameter must be specified so that HuggingFace can load the model or tokenizer files located in the model path.
- `torch_dtype` (Optional): Specifies the data type of the loaded parameters:
  - `None`: When this parameter is not specified or set to `None`, the loaded model will be of type `float32`.
  - `"auto"`: The data type will be determined by the `torch_dtype` field in the `config.json` file located in the model path.
  - Specific types, such as `torch.float16`, `torch.bfloat16`, etc.: Load the model with the specified data type.
- `attn_implementation` (Optional): Specifies whether the model uses Flash Attention:
  - `"eager"`: If this parameter is not specified or set to `"eager"`, the basic attention implementation will be used.
  - `"flash_attention_2"`: Use Flash Attention 2 to compute attention. Make sure the `flash_attn` library is installed in your environment and that `torch_dtype` is set to `torch.float16` or `torch.bfloat16`; otherwise, an error will occur.
- `device_map` (Optional): Specifying this parameter runs the HuggingFace model on multiple GPUs. Generally, it can be set to `"auto"`. Make sure the `accelerate` library is installed in your environment. For more detailed settings, refer to the HuggingFace documentation.
Here are some examples that you can refer to:
```python
>>> from transformers import AutoTokenizer, AutoModel
>>> import torch

# Single GPU, load with float32
>>> model = AutoModel.from_pretrained("hf_ckpt/", trust_remote_code=True).cuda()

# Single GPU, load with the data type determined by config.json
>>> model = AutoModel.from_pretrained("hf_ckpt/", trust_remote_code=True, torch_dtype="auto").cuda()

# Single GPU, load with data type torch.float16
>>> model = AutoModel.from_pretrained("hf_ckpt/", trust_remote_code=True, torch_dtype=torch.float16).cuda()

# Single GPU, load with data type torch.float16 and use Flash Attention
# (the flash_attn library must be installed; Flash Attention only supports float16 and bfloat16)
>>> model = AutoModel.from_pretrained("hf_ckpt/", trust_remote_code=True, torch_dtype=torch.float16, attn_implementation="flash_attention_2").cuda()

# Multi-GPU, load with the specified dtype (the accelerate library must be installed: pip install accelerate)
>>> model = AutoModel.from_pretrained("hf_ckpt/", trust_remote_code=True, torch_dtype=torch.float16, device_map="auto")

# Multi-GPU, load and use Flash Attention
>>> model = AutoModel.from_pretrained("hf_ckpt/", trust_remote_code=True, torch_dtype=torch.float16, device_map="auto", attn_implementation="flash_attention_2")
```
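Once loaded, the model follows the standard transformers generation API. The snippet below is a minimal usage sketch: it assumes the same `hf_ckpt/` checkpoint as above and that the converted config maps `AutoModel` to the causal-LM class (as the InternLM checkpoints do); the prompt is illustrative.

```python
>>> tokenizer = AutoTokenizer.from_pretrained("hf_ckpt/", trust_remote_code=True)
>>> inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
>>> output = model.generate(**inputs, max_new_tokens=32)
>>> print(tokenizer.decode(output[0], skip_special_tokens=True))
```

When loading with `device_map="auto"`, the layer-to-GPU placement chosen by `accelerate` can be inspected via `model.hf_device_map`.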
`convert2hf_internlm.py` converts saved InternLM training weights into the transformers format with a single command. Below are the parameters it takes:

- `--src`: Path to the weights to be converted.
- `--tgt`: Path to save the converted HuggingFace weights.
- `--tokenizer`: Path to the tokenizer.
- `--dtype` (Optional): The dtype in which to save the converted weights. Defaults to `bfloat16`.
- `--max_shard` (Optional): The maximum size of the sharded weights, equivalent to the `max_shard_size` parameter of the `save_pretrained` function. Defaults to `10GB`.
- `--max_pos` (Optional): The maximum context size of the model, generally equal to the maximum sequence length during training. Defaults to `4096`.
- `--rotary_type` (Optional): The type of positional encoding. Supports two options: `origin` for rotary positional encoding and `dynamic` for dynamic NTK rotary encoding. Defaults to `origin`.
- `--scaling_factor` (Optional): The scaling factor for dynamic NTK rotary encoding; this parameter is only relevant when `--rotary_type=dynamic`. Defaults to `1.0`.
Execute the command in the root directory of the repository:

```bash
python transformers/convert2hf_internlm.py --src origin_ckpt/ --tgt hf_ckpt/ --tokenizer ./tools/tokenizer_internlm.model --max_pos 4096 --rotary_type origin

# dynamic NTK
python transformers/convert2hf_internlm.py --src origin_ckpt/ --tgt hf_ckpt/ --tokenizer ./tools/tokenizer_internlm.model --max_pos 4096 --rotary_type dynamic --scaling_factor 2.0
```
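After conversion, a quick look at the output directory can confirm that the flags took effect. The following is a minimal sanity-check sketch, not part of the script; it only assumes the standard HuggingFace checkpoint layout (a `config.json` plus sharded weight files):

```python
import json
import os

ckpt_dir = "hf_ckpt/"

# --max_pos should generally be reflected in the saved config
with open(os.path.join(ckpt_dir, "config.json")) as f:
    config = json.load(f)
print(config.get("max_position_embeddings"))

# each weight shard should stay below the --max_shard limit
for name in sorted(os.listdir(ckpt_dir)):
    if name.endswith((".bin", ".safetensors")):
        size_gb = os.path.getsize(os.path.join(ckpt_dir, name)) / 1024**3
        print(f"{name}: {size_gb:.2f} GB")
```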
Then, you can load it using the `from_pretrained` interface:

```python
>>> from transformers import AutoTokenizer, AutoModel
>>> model = AutoModel.from_pretrained("hf_ckpt/", trust_remote_code=True).cuda()
```
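The tokenizer exported alongside the weights can be loaded from the same path:

```python
>>> tokenizer = AutoTokenizer.from_pretrained("hf_ckpt/", trust_remote_code=True)
```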
`revert_internlm.py` converts a HuggingFace-format checkpoint back into InternLM training weights. Below are the parameters it takes:

- `--src`: Path to the HuggingFace weights to be converted.
- `--tgt`: Path to save the converted weights.
- `--tp_size`: Tensor parallel size of the converted weights.
- `--version`: The correspondence between `down_proj`/`up_proj` in the MLP layers and `w2`/`w3` in InternLM. Set to `1` if HuggingFace's `down_proj` corresponds to InternLM's `w3` and `up_proj` corresponds to InternLM's `w2`; set to `2` for the opposite (summarized in the sketch after this list).
- `--embed_split`: The `embed_split_hidden` parameter of the InternEvo framework. If specified, the embedding layer will be split along the hidden-states dimension; otherwise, it will be split along the other dimension.
- `--use_flash`: The `use_flash_attn` parameter of InternEvo. If specified, Flash Attention will be used after loading.
- `--safetensors`: Indicates that the HuggingFace model is saved with `safetensors`. Specify this flag if the model was saved in the `safetensors` format.
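For reference, the `--version` correspondence can be summarized as follows. This is an illustrative sketch only; the dictionary does not appear in the script:

```python
# Mapping from HuggingFace MLP projection names to InternLM weight names,
# keyed by the value passed to --version.
VERSION_MAP = {
    1: {"down_proj": "w3", "up_proj": "w2"},
    2: {"down_proj": "w2", "up_proj": "w3"},
}
```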
Execute the command below:

```bash
python transformers/revert_internlm.py --src /path/to/src --tgt /path/to/tgt --tp_size 2 --embed_split --use_flash --version 1
```
If the model is saved with `safetensors`, please add `--safetensors` to the command:

```bash
python transformers/revert_internlm.py --src /path/to/src --tgt /path/to/tgt --tp_size 2 --embed_split --use_flash --version 1 --safetensors
```
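If you are unsure whether a given checkpoint needs the `--safetensors` flag, checking the file extensions in the checkpoint directory is enough. A minimal sketch (the path is illustrative):

```python
import os

src = "/path/to/src"  # HuggingFace checkpoint directory
if any(name.endswith(".safetensors") for name in os.listdir(src)):
    print("checkpoint is saved with safetensors: add --safetensors")
else:
    print("checkpoint uses .bin weights: omit --safetensors")
```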
`convert2hf_internlm2.py` converts saved InternLM2 training weights into the transformers format with a single command. Below are the parameters it takes:

- `--src`: Path to the weights to be converted.
- `--tgt`: Path to save the converted HuggingFace weights.
- `--tokenizer`: Path to the tokenizer.
- `--dtype` (Optional): The dtype in which to save the converted weights. Defaults to `bfloat16`.
- `--max_shard` (Optional): The maximum size of the sharded weights, equivalent to the `max_shard_size` parameter of the `save_pretrained` function. Defaults to `10GB`.
- `--max_pos` (Optional): The maximum context size of the model, generally equal to the maximum sequence length during training. Defaults to `4096`.
- `--rotary_type` (Optional): The type of positional encoding. Supports two options: `origin` for rotary positional encoding and `dynamic` for dynamic NTK rotary encoding. Defaults to `origin`.
- `--scaling_factor` (Optional): The scaling factor for dynamic NTK rotary encoding; this parameter is only relevant when `--rotary_type=dynamic`. Defaults to `2.0`.
Execute the command in the root directory of the repository:

```bash
python transformers/convert2hf_internlm2.py --src origin_ckpt/ --tgt hf_ckpt/ --tokenizer ./tools/tokenizer_internlm2.model --max_pos 32768 --rotary_type origin

# dynamic NTK
python transformers/convert2hf_internlm2.py --src origin_ckpt/ --tgt hf_ckpt/ --tokenizer ./tools/tokenizer_internlm2.model --max_pos 32768 --rotary_type dynamic --scaling_factor 2.0
```
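For the dynamic NTK variant, the chosen rotary settings should end up in the saved config. Below is a minimal sketch for inspecting them; it assumes the converted checkpoint stores them under the conventional `rope_scaling` field, and the exact keys may differ depending on the modeling code:

```python
import json

with open("hf_ckpt/config.json") as f:
    config = json.load(f)

# Expected to reflect --rotary_type and --scaling_factor,
# e.g. something like {"type": "dynamic", "factor": 2.0}
print(config.get("rope_scaling"))
print(config.get("max_position_embeddings"))
```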
Then, you can load it using the `from_pretrained` interface:

```python
>>> from transformers import AutoTokenizer, AutoModel
>>> model = AutoModel.from_pretrained("hf_ckpt/", trust_remote_code=True).cuda()
```
`revert_internlm2.py` converts a HuggingFace-format checkpoint back into InternLM2 training weights. Below are the parameters it takes:

- `--src`: Path to the HuggingFace weights to be converted.
- `--tgt`: Path to save the converted weights.
- `--tp_size`: Tensor parallel size of the converted weights.
- `--embed_split`: The `embed_split_hidden` parameter of the InternEvo framework. If specified, the embedding layer will be split along the hidden-states dimension; otherwise, it will be split along the other dimension.
- `--use_flash`: The `use_flash_attn` parameter of InternEvo. If specified, Flash Attention will be used after loading.
- `--safetensors`: Indicates that the HuggingFace model is saved with `safetensors`. Specify this flag if the model was saved in the `safetensors` format.
Execute the command below:

```bash
python transformers/revert_internlm2.py --src /path/to/src --tgt /path/to/tgt --tp_size 2 --embed_split --use_flash
```
If the model is saved with `safetensors`, please add `--safetensors` to the command:

```bash
python transformers/revert_internlm2.py --src /path/to/src --tgt /path/to/tgt --tp_size 2 --embed_split --use_flash --safetensors
```