With the GRPO training parameter `beta` set to 0.0, the loss should theoretically be 0, but in practice it is not.

```shell
MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=2 \
swift rlhf \
    --rlhf_type grpo \
    --model /data/szy/huggingface_cache/models/Qwen2.5-VL-7b \
    --external_plugins /data/szy/llib/ms-swift-main/examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_math_format external_qa_reward \
    --use_vllm true \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset /data/szy/llib/benchmark/ssg-qa-medium/train_data_anatomy_sgg_qa_medium_2.json \
    --max_completion_length 512 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 8 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-6 \
    --eval_steps 1000 \
    --save_steps 10 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --output_dir output \
    --warmup_ratio 0 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 8 \
    --temperature 1.0 \
    --top_p 1.0 \
    --top_k 50 \
    --async_generate true \
    --dynamic_sample true \
    --epsilon_high 0.28 \
    --system 'examples/train/grpo/prompt.txt' \
    --deepspeed zero3 \
    --log_completions true \
    --num_iterations 1 \
    --beta 0 \
    --num_infer_workers 2 \
    --report_to wandb
```
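For reference, a minimal sketch of why I would expect the loss to be near zero but not exactly zero (this assumes a TRL-style GRPO objective; the variable names are illustrative, not ms-swift's actual code). With `--num_iterations 1` the importance ratio `exp(logp - logp.detach())` is exactly 1, so with `beta=0` the per-token loss reduces to the negative advantage. Group-normalized advantages have zero mean per group, but the batch loss averages over tokens, so completions of different lengths are weighted differently and the result is generally non-zero; bfloat16 rounding adds further noise:

```python
import torch

# Sketch under the assumptions above: on-policy GRPO step, beta=0, ratio == 1.
rewards = torch.tensor([1.0, 0.0, 0.5, 0.5])               # one group of 4 completions
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-4)  # group-normalized advantages
lengths = torch.tensor([10, 50, 30, 20])                   # completion lengths in tokens

ratio = torch.ones(4)                                # exp(logp - logp.detach()) == 1
per_completion = -(ratio * adv) * lengths            # token-summed loss per completion
loss = per_completion.sum() / lengths.sum()          # token-level mean over the batch

print(adv.sum())  # ~0: advantages cancel within the group
print(loss)       # non-zero: length-weighted mean of advantages
```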
Which version of ms-swift are you using?