Commit

use model.generation_config (modelscope#1850)
Jintao-Huang authored Aug 31, 2024
1 parent 38b114a commit 018bd8d
Showing 12 changed files with 86 additions and 80 deletions.
20 changes: 10 additions & 10 deletions docs/source/LLM/命令行参数.md
@@ -125,11 +125,11 @@
- `--save_safetensors`: Default is `True`.
- `--include_num_input_tokens_seen`: Default is `False`. Tracks the number of input tokens seen throughout training.
- `--max_new_tokens`: Default is `2048`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--do_sample`: Default is `True`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--temperature`: Default is `0.3`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--top_k`: Default is `20`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--top_p`: Default is `0.7`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--repetition_penalty`: Default is `1.`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--do_sample`: Reference document: [https://huggingface.co/docs/transformers/main_classes/text_generation](https://huggingface.co/docs/transformers/main_classes/text_generation). Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `predict_with_generate` is set to True.
- `--temperature`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `predict_with_generate` is set to True.
- `--top_k`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `predict_with_generate` is set to True.
- `--top_p`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `predict_with_generate` is set to True.
- `--repetition_penalty`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `predict_with_generate` is set to True.
- `--num_beams`: Default is `1`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--gpu_memory_fraction`: Default is `None`. This parameter runs training under a specified maximum fraction of available GPU memory and is used for stress testing.
- `--train_dataset_mix_ratio`: Default is `0.`. This parameter defines how datasets are mixed for training. When specified, the training set is mixed with `train_dataset_mix_ratio` times the general-knowledge dataset specified by `train_dataset_mix_ds`. This parameter has been deprecated; please use `--dataset` to mix datasets.
@@ -327,11 +327,11 @@ RLHF parameters inherit the sft parameters; in addition, the following parameters are added:
- `--bnb_4bit_use_double_quant`: Default is `True`. See the `sft command line arguments` for parameter details. If `quantization_bit` is set to 0, this parameter has no effect.
- `--bnb_4bit_quant_storage`: Default is `True`. See the `sft command line arguments` for parameter details. If `quantization_bit` is set to 0, this parameter has no effect.
- `--max_new_tokens`: Maximum number of new tokens to generate; default is `2048`.
- `--do_sample`: Whether to use greedy or sampling-based generation; default is `True`.
- `--temperature`: Default is `0.3`. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--top_k`: Default is `20`. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--top_p`: Default is `0.7`. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--repetition_penalty`: Default is `1.`. This parameter will be used as the default value in deployment parameters.
- `--do_sample`: Reference document: [https://huggingface.co/docs/transformers/main_classes/text_generation](https://huggingface.co/docs/transformers/main_classes/text_generation). Default is `None`, inheriting the model's generation_config.
- `--temperature`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--top_k`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--top_p`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--repetition_penalty`: Default is `None`, inheriting the model's generation_config. This parameter will be used as the default value in deployment parameters.
- `--num_beams`: Default is `1`.
- `--use_flash_attn`: Default is `None`, i.e. 'auto'. See the `sft command line arguments` for parameter details.
- `--ignore_args_error`: Default is `False`. See the `sft command line arguments` for parameter details.
20 changes: 10 additions & 10 deletions docs/source_en/LLM/Command-line-parameters.md
@@ -126,11 +126,11 @@
- `--save_safetensors`: Default is `True`.
- `--include_num_input_tokens_seen`: Default is `False`. Tracks the number of input tokens seen throughout training.
- `--max_new_tokens`: Default is `2048`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--do_sample`: Default is `True`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--temperature`: Default is `0.3`. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--top_k`: Default is `20`. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--top_p`: Default is `0.7`. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--repetition_penalty`: Default is `1.`. This parameter will be used as the default value in deployment parameters.
- `--do_sample`: Reference document: [https://huggingface.co/docs/transformers/main_classes/text_generation](https://huggingface.co/docs/transformers/main_classes/text_generation). Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `predict_with_generate` is set to True.
- `--temperature`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--top_k`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--top_p`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--repetition_penalty`: Default is `None`, inheriting the model's generation_config. This parameter will be used as the default value in deployment parameters.
- `--num_beams`: Default is `1`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--gpu_memory_fraction`: Default is `None`. This parameter aims to run training under a specified maximum available GPU memory percentage, used for extreme testing.
- `--train_dataset_mix_ratio`: Default is `0.`. This parameter defines how to mix datasets for training. When this parameter is specified, it will mix the training dataset with a multiple of `train_dataset_mix_ratio` of the general knowledge dataset specified by `train_dataset_mix_ds`. This parameter has been deprecated, please use `--dataset {dataset_name}#{dataset_sample}` to mix datasets.
@@ -329,11 +329,11 @@ RLHF parameters are an extension of the sft parameters, with the addition of the
- `--bnb_4bit_use_double_quant`: Default is `True`. See `sft command line arguments` for parameter details. If `quantization_bit` is set to 0, this parameter has no effect.
- `--bnb_4bit_quant_storage`: Default is `None`. See `sft command line arguments` for parameter details. If `quantization_bit` is set to 0, this parameter has no effect.
- `--max_new_tokens`: Maximum number of new tokens to generate, default is `2048`.
- `--do_sample`: Whether to use greedy or sampling-based generation; default is `True`.
- `--temperature`: Default is `0.3`. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--top_k`: Default is `20`. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--top_p`: Default is `0.7`. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--repetition_penalty`: Default is `1.`. This parameter will be used as the default value in deployment parameters.
- `--do_sample`: Reference document: [https://huggingface.co/docs/transformers/main_classes/text_generation](https://huggingface.co/docs/transformers/main_classes/text_generation). Default is `None`, inheriting the model's generation_config.
- `--temperature`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--top_k`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--top_p`: Default is `None`, inheriting the model's generation_config. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
- `--repetition_penalty`: Default is `None`, inheriting the model's generation_config. This parameter will be used as the default value in deployment parameters.
- `--num_beams`: Default is `1`.
- `--use_flash_attn`: Default is `None`, i.e. 'auto'. See `sft command line arguments` for parameter details.
- `--ignore_args_error`: Default is `False`, see `sft command line arguments` for parameter details.
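The documentation changes above make the sampling parameters default to `None`, so unset values are inherited from the model's `generation_config`. Below is a minimal sketch of that fallback, assuming a Hugging Face `GenerationConfig`; `GenArgs` and `build_generation_config` are illustrative names, not part of swift.

```python
from dataclasses import dataclass
from typing import Optional

from transformers import GenerationConfig


@dataclass
class GenArgs:  # hypothetical stand-in for the relevant argument fields
    max_new_tokens: int = 2048
    do_sample: Optional[bool] = None
    temperature: Optional[float] = None
    top_k: Optional[int] = None
    top_p: Optional[float] = None
    repetition_penalty: Optional[float] = None


def build_generation_config(args: GenArgs, model_config: GenerationConfig) -> GenerationConfig:
    """Start from the model's generation_config and only override explicitly set fields."""
    config = GenerationConfig(**model_config.to_dict())
    for name in ('do_sample', 'temperature', 'top_k', 'top_p', 'repetition_penalty'):
        value = getattr(args, name)
        if value is not None:  # None means: keep the model's own value
            setattr(config, name, value)
    config.max_new_tokens = args.max_new_tokens
    return config
```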
2 changes: 1 addition & 1 deletion swift/llm/infer.py
@@ -203,8 +203,8 @@ def prepare_model_template(args: InferArguments,
num_beams=args.num_beams,
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id)
logger.info(f'generation_config: {generation_config}')
set_generation_config(model, generation_config)
logger.info(f'model.generation_config: {model.generation_config}')

if model.max_model_len is None:
model.max_model_len = args.max_model_len
2 changes: 1 addition & 1 deletion swift/llm/rlhf.py
@@ -126,8 +126,8 @@ def llm_rlhf(args: RLHFArguments) -> Dict[str, Any]:
num_beams=args.num_beams,
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id)
logger.info(f'generation_config: {generation_config}')
set_generation_config(model, generation_config)
logger.info(f'model.generation_config: {model.generation_config}')

# Preparing LoRA
model, _ = prepare_model(model, args)
2 changes: 1 addition & 1 deletion swift/llm/rome.py
@@ -35,8 +35,8 @@ def rome_infer(args: RomeArguments) -> None:
num_beams=args.num_beams,
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id)
logger.info(f'generation_config: {generation_config}')
set_generation_config(model, generation_config)
logger.info(f'model.generation_config: {model.generation_config}')
if args.overwrite_generation_config:
generation_config.save_pretrained(args.ckpt_dir)

4 changes: 2 additions & 2 deletions swift/llm/sft.py
@@ -235,9 +235,9 @@ def llm_sft(args: SftArguments) -> Dict[str, Any]:
num_beams=args.num_beams,
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id)
logger.info(f'generation_config: {generation_config}')
set_generation_config(model, generation_config)
training_args.generation_config = generation_config
logger.info(f'model.generation_config: {model.generation_config}')
training_args.generation_config = model.generation_config

if use_torchacc():
import torchacc as ta
32 changes: 15 additions & 17 deletions swift/llm/utils/argument.py
@@ -64,6 +64,11 @@ def __post_init__(self) -> None:
self.device_map_config = json.load(f)
else: # json str
self.device_map_config = json.loads(self.device_map_config)
_, local_rank, _, local_world_size = get_dist_setting()
if local_world_size > 1 and isinstance(self.device_map_config, dict) and local_rank > 0:
for k, v in self.device_map_config.items():
if isinstance(v, int):
self.device_map_config[k] += local_rank

@classmethod
def _check_path(cls,
@@ -130,13 +135,6 @@ def check_flash_attn(self: Union['SftArguments', 'InferArguments']) -> None:
def handle_generation_config(self: Union['SftArguments', 'InferArguments']) -> None:
if self.temperature == 0:
self.do_sample = False
if self.do_sample is False:
# fix warning
self.temperature = 1.
self.top_p = 1.
self.top_k = 50
logger.info('Due to do_sample=False, the following settings are applied: args.temperature: '
f'{self.temperature}, args.top_p: {self.top_p}, args.top_k: {self.top_k}.')

def select_dtype(self: Union['SftArguments', 'InferArguments']) -> Tuple[Optional[Dtype], bool, bool]:
if not is_torch_cuda_available() and not is_torch_npu_available():
@@ -825,11 +823,11 @@ class SftArguments(ArgumentsBase):

# generation config
max_new_tokens: int = 2048
do_sample: bool = True
temperature: float = 0.3
top_k: int = 20
top_p: float = 0.7
repetition_penalty: float = 1.
do_sample: Optional[bool] = None
temperature: Optional[float] = None
top_k: Optional[int] = None
top_p: Optional[float] = None
repetition_penalty: Optional[float] = None
num_beams: int = 1

# fsdp option
@@ -1336,11 +1334,11 @@ class InferArguments(ArgumentsBase):
bnb_4bit_quant_storage: Optional[str] = None

max_new_tokens: int = 2048
do_sample: bool = True
temperature: float = 0.3
top_k: int = 20
top_p: float = 0.7
repetition_penalty: float = 1.
do_sample: Optional[bool] = None
temperature: Optional[float] = None
top_k: Optional[int] = None
top_p: Optional[float] = None
repetition_penalty: Optional[float] = None
num_beams: int = 1
stop_words: List[str] = field(default_factory=list)

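The `__post_init__` addition above shifts integer device indices in a user-supplied `device_map_config` by the process's `local_rank`, so each local process places layers on its own GPUs. A self-contained restatement with a worked example follows (the helper name is illustrative, not swift API).

```python
def shift_device_map(device_map: dict, local_rank: int, local_world_size: int) -> dict:
    """Offset integer device ids by local_rank, mirroring the __post_init__ logic above."""
    if local_world_size > 1 and isinstance(device_map, dict) and local_rank > 0:
        for k, v in device_map.items():
            if isinstance(v, int):
                device_map[k] = v + local_rank
    return device_map


# With 2 processes per node, the process with local_rank=1 moves every layer
# one GPU further along, using GPUs 1 and 2 instead of 0 and 1:
device_map = {'model.embed_tokens': 0, 'model.layers.0': 0, 'lm_head': 1}
print(shift_device_map(device_map, local_rank=1, local_world_size=2))
# -> {'model.embed_tokens': 1, 'model.layers.0': 1, 'lm_head': 2}
```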
7 changes: 5 additions & 2 deletions swift/llm/utils/client_utils.py
@@ -19,9 +19,12 @@


def _get_request_kwargs(api_key: Optional[str] = None) -> Dict[str, Any]:
timeout = float(os.getenv('TIMEOUT', '60'))
request_kwargs = {'timeout': timeout}
if api_key is None:
return {}
return {'headers': {'Authorization': f'Bearer {api_key}'}}
return request_kwargs
request_kwargs['headers'] = {'Authorization': f'Bearer {api_key}'}
return request_kwargs


def get_model_list_client(host: str = '127.0.0.1', port: str = '8000', api_key: str = 'EMPTY', **kwargs) -> ModelList:
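With the change above, `_get_request_kwargs` always attaches a request timeout (read from the `TIMEOUT` environment variable, default 60 seconds) and adds the `Authorization` header only when an API key is given. A usage sketch of the same pattern, assuming a locally deployed server on the default host and port:

```python
import os
import requests

os.environ['TIMEOUT'] = '120'  # optional: raise the default 60-second timeout

# Equivalent to what the client utilities now build internally:
kwargs = {'timeout': float(os.getenv('TIMEOUT', '60'))}
api_key = 'EMPTY'
if api_key is not None:
    kwargs['headers'] = {'Authorization': f'Bearer {api_key}'}

resp = requests.get('http://127.0.0.1:8000/v1/models', **kwargs)
print(resp.status_code)
```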
31 changes: 19 additions & 12 deletions swift/llm/utils/lmdeploy_utils.py
@@ -137,6 +137,7 @@ def __init__(
stop_words = []
if max_new_tokens is None:
max_new_tokens = 64
self._temperature = temperature
super().__init__(
max_new_tokens=max_new_tokens,
temperature=temperature,
@@ -149,6 +150,17 @@
skip_special_tokens=skip_special_tokens,
**kwargs)

def __setattr__(self, key: str, value: str) -> None:
if key == 'do_sample':
assert value in {True, False}
super().__setattr__('temperature', self._temperature if value else 0)
elif key == 'max_length':
raise ValueError('`max_length` is not supported, please use `max_new_tokens` for setting.')
else:
if key == 'temperature':
self._temperature = value
super().__setattr__(key, value)


def _add_stop_word(stop_words: List[int], token: Union[List[int], int, str, None], tokenizer=None) -> None:
if token is None:
@@ -443,21 +455,16 @@ def prepare_lmdeploy_engine_template(args: InferArguments) -> Tuple[Union[AsyncE
model_id_or_path=model_id_or_path)
tokenizer = lmdeploy_engine.hf_tokenizer

if not args.do_sample:
args.temperature = 0

stop_words = []
for stop_word in args.stop_words:
_add_stop_word(stop_words, stop_word, tokenizer=tokenizer)
generation_config = LmdeployGenerationConfig(
max_new_tokens=args.max_new_tokens,
temperature=args.temperature,
top_k=args.top_k,
top_p=args.top_p,
stop_words=stop_words,
repetition_penalty=args.repetition_penalty)
logger.info(f'generation_config: {generation_config}')
lmdeploy_engine.generation_config = generation_config
setattr(lmdeploy_engine.generation_config, 'max_new_tokens', args.max_new_tokens)
for k in ['temperature', 'do_sample', 'top_k', 'top_p', 'repetition_penalty']:
val = getattr(args, k, None)
if val is not None:
setattr(lmdeploy_engine.generation_config, k, val)
logger.info(f'lmdeploy_engine.generation_config: {lmdeploy_engine.generation_config}')

template: Template = get_template(
args.template_type,
tokenizer,
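The new `__setattr__` hook lets `LmdeployGenerationConfig` accept the transformers-style `do_sample` flag even though lmdeploy only understands `temperature`: setting `do_sample=False` forces `temperature=0` (greedy decoding), setting it back to True restores the cached sampling temperature, and `max_length` is rejected in favour of `max_new_tokens`. A standalone sketch of the same idea (illustrative class, not the real `LmdeployGenerationConfig`):

```python
class _SketchGenerationConfig:
    def __init__(self, temperature: float = 0.3):
        self._temperature = temperature
        self.temperature = temperature

    def __setattr__(self, key, value):
        if key == 'do_sample':
            # lmdeploy has no do_sample flag; temperature == 0 means greedy decoding.
            assert value in {True, False}
            super().__setattr__('temperature', self._temperature if value else 0)
        elif key == 'max_length':
            raise ValueError('`max_length` is not supported, please use `max_new_tokens`.')
        else:
            if key == 'temperature':
                super().__setattr__('_temperature', value)  # cache the sampling temperature
            super().__setattr__(key, value)


cfg = _SketchGenerationConfig(temperature=0.3)
cfg.do_sample = False
print(cfg.temperature)  # 0   -> greedy decoding
cfg.do_sample = True
print(cfg.temperature)  # 0.3 -> cached sampling temperature restored
```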
16 changes: 10 additions & 6 deletions swift/llm/utils/utils.py
@@ -599,7 +599,10 @@ def _prepare_inputs(model: PreTrainedModel,
if 'token_type_ids' in inputs:
inputs['token_type_ids'] = torch.tensor(inputs['token_type_ids'])[None]
model.eval()

if not generation_config.do_sample:
generation_config.temperature = 1.
generation_config.top_p = 1.
generation_config.top_k = 50
if tokenizer.eos_token_id is not None:
generation_config.eos_token_id = tokenizer.eos_token_id
if tokenizer.pad_token_id is not None:
@@ -918,11 +921,12 @@ def set_generation_config(model: Module, generation_config: GenerationConfig) ->
old_generation_config = getattr(model, 'generation_config', None)
old_generation_priority_config = ['no_repeat_ngram_size']
if old_generation_config is not None:
for k, v in old_generation_config.__dict__.items():
if k in old_generation_priority_config:
setattr(generation_config, k, v)
if k not in generation_config.__dict__:
setattr(generation_config, k, v)
for k, old_v in old_generation_config.__dict__.items():
if k.startswith('_'):
continue
v = getattr(generation_config, k, None)
if k in old_generation_priority_config or old_v is not None and v is None:
setattr(generation_config, k, old_v)
model.generation_config = generation_config


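`set_generation_config` now merges field by field: any field left as `None` in the freshly built config is filled from the model's existing `generation_config`, and keys in `old_generation_priority_config` (currently `no_repeat_ngram_size`) always keep the old value. A short behavioural example of that merge rule, written against the Hugging Face `GenerationConfig` (the values are made up for illustration):

```python
from transformers import GenerationConfig

old = GenerationConfig(do_sample=True, temperature=0.8, top_p=0.9, no_repeat_ngram_size=4)
new = GenerationConfig(do_sample=True, max_new_tokens=2048, temperature=None,
                       top_p=0.7, no_repeat_ngram_size=2)

priority_keys = ['no_repeat_ngram_size']
for k, old_v in old.__dict__.items():
    if k.startswith('_'):
        continue
    v = getattr(new, k, None)
    if k in priority_keys or (old_v is not None and v is None):
        setattr(new, k, old_v)

print(new.temperature)           # 0.8 (None in the new config -> inherited from the old one)
print(new.top_p)                 # 0.7 (explicitly set -> kept)
print(new.no_repeat_ngram_size)  # 4   (priority key -> old value always wins)
```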
3 changes: 2 additions & 1 deletion swift/llm/utils/vision_utils.py
@@ -100,7 +100,8 @@ def load_file(path: Union[str, _T]) -> Union[BytesIO, _T]:
if isinstance(path, str):
path = path.strip()
if path.startswith('http'):
content = requests.get(path).content
timeout = float(os.getenv('TIMEOUT', '60'))
content = requests.get(path, timeout=timeout).content
res = BytesIO(content)
elif os.path.exists(path):
with open(path, 'rb') as f:
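`load_file` now passes an explicit timeout (taken from the `TIMEOUT` environment variable, default 60 seconds) when downloading files over HTTP. A small illustration of the same pattern; the URL is a placeholder:

```python
import os
from io import BytesIO

import requests

os.environ['TIMEOUT'] = '120'  # allow slower connections before loading remote images

# Mirrors what load_file now does for http(s) paths:
timeout = float(os.getenv('TIMEOUT', '60'))
content = requests.get('https://example.com/cat.png', timeout=timeout).content  # placeholder URL
image_bytes = BytesIO(content)
```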