Inference with a model LoRA-fine-tuned from qwen25vl_7b_instruct fails with KeyError: 0 #6960

Closed
1 task done
RuoxuanYu opened this issue Feb 17, 2025 · 9 comments · Fixed by #6972
Labels
solved This problem has been already solved

Comments

RuoxuanYu commented Feb 17, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

llamafactory 0.9.2.dev0
datasets 3.2.0
transformers 4.49.0.dev0

Reproduction

Run via the following shell script:
#!/bin/bash

# Set environment variables
export DISABLE_VERSION_CHECK=1
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen25vl_lora_infer.yaml

The YAML file being run is as follows:
### model
model_name_or_path: /model/Qwen25VL_7B_Instruct
adapter_name_or_path: /saves/qwen25_vl_7b_Instruct/lora/sft

### method
stage: sft
do_predict: true
finetuning_type: lora

### dataset
eval_dataset: cpv_mllm_dev
template: qwen2_vl
cutoff_len: 1024
max_samples: 10000000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/newqwen25vl_cpvres/lora/predict_cpv
overwrite_output_dir: true

### eval
per_device_eval_batch_size: 1000
predict_with_generate: true
ddp_timeout: 180000000

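The eval_dataset uses the Alpaca format, the same format as the training data:
[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "images": [
      "image path (required)"
    ]
  }
]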
The entry was also added to dataset_info in the required format:
"dataset name": {
  "cpv_mllm_dev": "data.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "images": "images"
  }
}
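
To make the role of the "columns" block concrete, here is a minimal sketch of how such a mapping can rename the fields of one Alpaca-style record to their logical roles. It is an illustration only, not LLaMA-Factory's implementation: `COLUMNS`, `apply_columns`, and the example record are hypothetical names and data introduced here.

```python
from typing import Any

# Hypothetical mapping that mirrors the "columns" block above:
# logical role -> field name in the data file.
COLUMNS = {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "images": "images",
}


def apply_columns(record: dict[str, Any], columns: dict[str, str]) -> dict[str, Any]:
    """Rename source fields to their logical roles; missing fields become None."""
    return {role: record.get(field) for role, field in columns.items()}


# Made-up record in the Alpaca layout shown earlier.
example = {
    "instruction": "Describe the image.",
    "input": "",
    "output": "A cat on a sofa.",
    "images": ["images/cat.jpg"],
}

mapped = apply_columns(example, COLUMNS)
print(mapped["prompt"], mapped["response"], mapped["images"])
```

A role that maps to a field absent from the record comes back as `None`, which makes a misspelled column name easy to spot before running prediction.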

Error:
[rank7]: Traceback (most recent call last):
[rank7]: File "/home/llm/0214modalllamafac/updateLLaMA-Factory-main/src/llamafactory/launcher.py", line 23, in <module>
[rank7]: launch()
[rank7]: File "/home//llm/0214modalllamafac/updateLLaMA-Factory-main/src/llamafactory/launcher.py", line 19, in launch
[rank7]: run_exp()
[rank7]: File "/home/llm/0214modalllamafac/updateLLaMA-Factory-main/src/llamafactory/train/tuner.py", line 93, in run_exp
[rank7]: _training_function(config={"args": args, "callbacks": callbacks})
[rank7]: File "/home/llm/0214modalllamafac/updateLLaMA-Factory-main/src/llamafactory/train/tuner.py", line 67, in _training_function
[rank7]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank7]: File "/home/llm/0214modalllamafac/updateLLaMA-Factory-main/src/llamafactory/train/sft/workflow.py", line 127, in run_sft
[rank7]: predict_results = trainer.predict(dataset_module["eval_dataset"], metric_key_prefix="predict", **gen_kwargs)
[rank7]: File "/home/.conda/envs/vlenv/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 261, in predict
[rank7]: return super().predict(test_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
[rank7]: File "/home/.conda/envs/vlenv/lib/python3.10/site-packages/transformers/trainer.py", line 4183, in predict
[rank7]: output = eval_loop(
[rank7]: File "/home/.conda/envs/vlenv/lib/python3.10/site-packages/transformers/trainer.py", line 4289, in evaluation_loop
[rank7]: for step, inputs in enumerate(dataloader):
[rank7]: File "/home/.conda/envs/vlenv/lib/python3.10/site-packages/accelerate/data_loader.py", line 552, in __iter__
[rank7]: current_batch = next(dataloader_iter)
[rank7]: File "/home/.conda/envs/vlenv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 708, in __next__
[rank7]: data = self._next_data()
[rank7]: File "/home/.conda/envs/vlenv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 764, in _next_data
[rank7]: data = self._dataset_fetcher.fetch(index) # may raise StopIteration
[rank7]: File "/home/.conda/envs/vlenv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
[rank7]: data = [self.dataset[idx] for idx in possibly_batched_index]
[rank7]: File "/home/.conda/envs/vlenv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
[rank7]: data = [self.dataset[idx] for idx in possibly_batched_index]
[rank7]: KeyError: 0
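
The last frame shows the fetcher evaluating `self.dataset[idx]` for each position in the batch. The thread does not state the root cause, so the following is only a generic sketch of one way that expression can raise `KeyError: 0`: the object being indexed behaves like a mapping keyed by name instead of a sequence indexed by position. The `fetch` helper and the sample data are hypothetical.

```python
def fetch(dataset, batch_indices):
    """Mimic the fetch step in the traceback: index the dataset once per position."""
    return [dataset[idx] for idx in batch_indices]


rows = [{"text": "a"}, {"text": "b"}]

# A list (like any map-style dataset) supports integer positions.
print(fetch(rows, [0, 1]))

# A dict keyed by name does not: position 0 is looked up as a missing key.
named = {"cpv_mllm_dev": rows}
try:
    fetch(named, [0])
except KeyError as err:
    print(f"KeyError: {err}")
```

Checking the type and length of the object handed to the data loader is therefore a cheap first diagnostic when this error appears.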

Others

No response

RuoxuanYu added the bug and pending labels Feb 17, 2025

leon-cas commented Feb 17, 2025

The llamafactory 0.9.2.dev0 code checks that the transformers version is at most 4.48.3; see the code: check_version("transformers>=4.41.2,<=4.48.3,!=4.46.0,!=4.46.1,!=4.46.2,!=4.46.3,!=4.47.0,!=4.47.1,!=4.48.0"). However, I see you are using transformers 4.49.0.dev0. Did you manually modify this code? @RuoxuanYu

Cassieyy commented

> The llamafactory 0.9.2.dev0 code checks that the transformers version is at most 4.48.3; see the code: check_version("transformers>=4.41.2,<=4.48.3,!=4.46.0,!=4.46.1,!=4.46.2,!=4.46.3,!=4.47.0,!=4.47.1,!=4.48.0"). However, I see you are using transformers 4.49.0.dev0. Did you manually modify this code? @RuoxuanYu

export DISABLE_VERSION_CHECK=1

Set this environment variable.
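
To see why 4.49.0.dev0 falls outside the quoted requirement, and what skipping the check amounts to, here is a standard-library sketch. It is not LLaMA-Factory's `check_version`: the helper names are hypothetical, the bounds and exclusions are copied from the string quoted above, and treating only the value `1` as "disabled" is an assumption based on the export line shown here.

```python
import os
import re

# Bounds and exclusions copied from the check_version string quoted above.
MIN_VERSION = (4, 41, 2)
MAX_VERSION = (4, 48, 3)
EXCLUDED = {
    (4, 46, 0), (4, 46, 1), (4, 46, 2), (4, 46, 3),
    (4, 47, 0), (4, 47, 1), (4, 48, 0),
}


def release_tuple(version: str) -> tuple[int, ...]:
    """Extract the numeric release part, e.g. '4.49.0.dev0' -> (4, 49, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_supported(version: str) -> bool:
    """Compare only the release part against the bounds and exclusions."""
    release = release_tuple(version)
    return MIN_VERSION <= release <= MAX_VERSION and release not in EXCLUDED


def version_check_enabled(env=None) -> bool:
    """Assumption: the check is skipped only when the variable is set to '1'."""
    env = os.environ if env is None else env
    return env.get("DISABLE_VERSION_CHECK", "0") != "1"


print(is_supported("4.49.0.dev0"))  # (4, 49, 0) is above the upper bound
print(is_supported("4.48.3"))
print(version_check_enabled({"DISABLE_VERSION_CHECK": "1"}))
```

The comparison ignores pre-release suffixes, so a `.dev0` build is treated like its final release; a full implementation would follow the complete version-ordering rules.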

leon-cas commented Feb 17, 2025

> > The llamafactory 0.9.2.dev0 code checks that the transformers version is at most 4.48.3; see the code: check_version("transformers>=4.41.2,<=4.48.3,!=4.46.0,!=4.46.1,!=4.46.2,!=4.46.3,!=4.47.0,!=4.47.1,!=4.48.0"). However, I see you are using transformers 4.49.0.dev0. Did you manually modify this code? @RuoxuanYu
>
> export DISABLE_VERSION_CHECK=1
>
> Set this environment variable.

@Cassieyy thanks!

hiyouga (Owner) commented Feb 17, 2025

The dataset format is probably wrong, which caused samples to be dropped during preprocessing.

hiyouga closed this as completed Feb 17, 2025
hiyouga added the solved label and removed the bug and pending labels Feb 17, 2025

RuoxuanYu (Author) commented Feb 17, 2025

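I followed the Alpaca format. Could you tell me which part is wrong? @hiyouga

> The dataset format is probably wrong, which caused samples to be dropped during preprocessing.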

hiyouga (Owner) commented Feb 17, 2025

The output field must not be empty; you can fill in any placeholder text.
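
A quick way to rule this out is to scan the data file for records whose "output" is missing or blank. The sketch below works on an inline sample; `find_empty_outputs` and the sample records are hypothetical, and in practice the list would come from loading the actual data file with `json.load`.

```python
import json


def find_empty_outputs(records):
    """Return the positions of records whose "output" is missing or blank."""
    empty = []
    for position, record in enumerate(records):
        output = record.get("output")
        if not isinstance(output, str) or not output.strip():
            empty.append(position)
    return empty


# Made-up records in the Alpaca layout; the second and third lack a usable output.
sample = json.loads("""
[
  {"instruction": "Describe the image.", "input": "", "output": "A cat.", "images": ["a.jpg"]},
  {"instruction": "Describe the image.", "input": "", "output": "", "images": ["b.jpg"]},
  {"instruction": "Describe the image.", "input": "", "images": ["c.jpg"]}
]
""")

print(find_empty_outputs(sample))
```

An empty result means every record carries some output text, so this particular explanation can be set aside.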

hiyouga reopened this Feb 17, 2025
hiyouga added the bug and pending labels and removed the solved label Feb 17, 2025

hiyouga (Owner) commented Feb 17, 2025

It looks like a different problem. Let me take a look.

RuoxuanYu (Author) commented

> It looks like a different problem. Let me take a look.

OK, sounds good.

hiyouga added the solved label and removed the bug and pending labels Feb 17, 2025

hiyouga (Owner) commented Feb 17, 2025

fixed
