Pulse · modelscope/ms-swift · GitHub

May 1, 2025 – May 8, 2025

Overview

27 Active pull requests

57 Active issues

20 Pull requests merged by 4 people

[grpo] fix labels pop and peftmodel parameter check
#4136 merged May 8, 2025
update qwen3 more models
#4123 merged May 8, 2025
fix sequence_parallel
#4122 merged May 7, 2025
fix omni aligner
#4117 merged May 7, 2025
Fix ulysses eval
#4114 merged May 7, 2025
fix packing
#4113 merged May 7, 2025
fix enable_cache
#4109 merged May 7, 2025
fix requirements
#4108 merged May 7, 2025
[megatron] Update long text shell
#4106 merged May 7, 2025
support max_epochs
#4102 merged May 7, 2025
Update liger code
#4095 merged May 6, 2025
fix enable_cache
#4091 merged May 6, 2025
Support ulysses for llm/mllm,dpo/sft
#4085 merged May 5, 2025
update docs
#4078 merged May 4, 2025
feat: support megatron wandb
#4074 merged May 4, 2025
feat: add run name support
#4072 merged May 3, 2025
fix padding_side left
#4069 merged May 3, 2025
support MiMo-7B
#4067 merged May 2, 2025
fix packing eval streaming
#4066 merged May 2, 2025
Support empty think loss scale
#4065 merged May 2, 2025

7 Pull requests opened by 3 people

fix enable_cache
#4075 opened May 4, 2025
refactor grpo internal mode
#4097 opened May 6, 2025
Refactor SP
#4121 opened May 7, 2025
Fix grpo multi modal doc
#4124 opened May 8, 2025
[megatron] support max_epochs
#4125 opened May 8, 2025
fix model_type mismatch
#4127 opened May 8, 2025
support more vision dataset
#4132 opened May 8, 2025

17 Issues closed by 10 people

Support for Qwen2-Audio and Qwen2.5-Omni
#4088 closed May 8, 2025
qwen2.5-omni-7b merge-lora results differ
#3756 closed May 8, 2025
raise IndexError(f"Index {index} out of range for dataset of size {size}.")
#4120 closed May 8, 2025
Qwen2.5-7B-Base: error after some steps when training on very long text
#4105 closed May 7, 2025
[megatron] ERROR:megatron.core.dist_checkpointing.strategies.filesystem_async:Local process 0 encountered an error: _write_item() missing 1 required positional argument: 'storage_key'
#4111 closed May 7, 2025
Multi-GPU DeepSpeed training creates data copies in .cache proportional to the number of GPUs, causing excessive storage usage
#3965 closed May 6, 2025
Qwen3-8B-Base SFT full-parameter fine-tuning hangs after saving the first model
#4053 closed May 6, 2025
Qwen3 dataset configuration is inelegant
#4087 closed May 6, 2025
Too many dataloader workers
#4061 closed May 6, 2025
qwen3 seq_cls
#4073 closed May 6, 2025
Package versions in requirements are problematic
#4080 closed May 5, 2025
Support wandb logging in Swift Megatron SFT
#4071 closed May 4, 2025
Add run_name argument support for wandb integration
#4046 closed May 4, 2025
qwen2.5-vl hangs during inference
#3799 closed May 3, 2025
Are there plans to support fine-tuning the XiaomiMiMo/MiMo-7B model?
#4064 closed May 2, 2025
Does packing conflict with the lazy_encode parameter?
#4054 closed May 2, 2025
KTO raises an error with a custom dataset
#4062 closed May 2, 2025

40 Issues opened by 34 people

Custom registered model hangs during dataset map (version 3.3.1)
#4138 opened May 8, 2025
pip install 'ms-swift[all]' -U downloads many versions
#4137 opened May 8, 2025
Request Failed with 422 Error: Input Should Be a Valid String for Image Paths
#4135 opened May 8, 2025
Some problems about loading Janus-Pro - traceback : Signal 11 (SIGSEGV) received by PID xxx
#4134 opened May 8, 2025
swift megatron sys._base_executable problem
#4133 opened May 8, 2025
Continue training an already-trained LoRA on other data
#4131 opened May 8, 2025
swift infer with tp=2 does not support batch inference for the deepseek-r1-distill-qwen series and qwq32B models
#4130 opened May 8, 2025
swift infer batch processing works very well, but could it write to result_path in near real time instead of only at the end?
#4129 opened May 8, 2025
Qwen2-audio-instruct: tensor dimension mismatch during inference after LoRA fine-tuning
#4128 opened May 8, 2025
Support Megatron LoRA training for Qwen3 MoE
#4126 opened May 8, 2025
raise IndexError(f"Index {index} out of range for dataset of size {size}.")
#4119 opened May 7, 2025
Building multi-turn multimodal dialogue datasets for GRPO
#4118 opened May 7, 2025
A never-before-seen bug during inference
#4116 opened May 7, 2025
Can anyone knowledgeable explain how to do AWQ quantization after fine-tuning internvl3_8B?
#4115 opened May 7, 2025
The beta parameter has no effect in GRPO
#4112 opened May 7, 2025
Problem registering qwen omni
#4110 opened May 7, 2025
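For a task that has already been through SFT, if I want to add new knowledge without losing performance, should I choose the reinforcement fine-tuning implemented in ms-swift or GRPO?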
#4107 opened May 7, 2025
DPO model: RuntimeError: CUDA driver error: invalid argument
#4104 opened May 7, 2025
Training keeps reporting: RuntimeError: CUDA driver error: invalid argument
#4103 opened May 7, 2025
LLama-omni: index error during audio fine-tuning
#4101 opened May 7, 2025
ulysses raise NotImplementedError
#4100 opened May 7, 2025
Does the framework support passing a rope theta parameter?
#4099 opened May 6, 2025
Sequence classification models shuffle the dataset during inference
#4098 opened May 6, 2025
How do I set freezing and LoRA rank for different modules when fine-tuning the internvl3_8B multimodal model?
#4096 opened May 6, 2025
Which parameter adjusts the dataset sampling ratio?
#4094 opened May 6, 2025
sequence classification inference
#4093 opened May 6, 2025
ModuleNotFoundError: No module named 'torch.distributed.device_mesh'
#4092 opened May 6, 2025
Can results be saved during eval?
#4090 opened May 6, 2025
Why doesn't RLHF currently support sequence_parallel?
#4089 opened May 6, 2025
SFT of Qwen3 on NPU fails with: Default process group has not been initialized, please make sure to call init_process_group.
#4086 opened May 5, 2025
LoRA fine-tuning of gte embedding: inference results after merge differ greatly from the fine-tuned results
#4084 opened May 5, 2025
Error with Streaming + Packing + resume_from_checkpoint
#4083 opened May 5, 2025
Function call fine-tuning error: TypeError: string indices must be integers, not 'str'
#4082 opened May 5, 2025
Training works, but eval raises an assert error
#4081 opened May 5, 2025
Pre-offline tokenize for ultra large multimodal datasets
#4079 opened May 4, 2025
making llm_max_batch_size and mllm_max_batch_size configurable
#4077 opened May 4, 2025
InternVL3-9B LoRA fine-tuning: slow dataset preprocessing (about 7 h)
#4076 opened May 4, 2025
Fine-tuning Qwen2.5-Omni-7B with additional new layers on the audio tower
#4070 opened May 3, 2025
how to run using vllm
#4068 opened May 3, 2025
inference error with vllm 0.8.5
#4063 opened May 1, 2025

29 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Error training qwen2.5-vl on NPU
#3408 commented on May 2, 2025 • 0 new comments
GRPO Training Speed Testing
#3302 commented on May 2, 2025 • 0 new comments
In the new version (3.4), it fails to run when nproc_per_node is less than the number of CUDA_VISIBLE_DEVICES; the old version (3.2) works
#4019 commented on May 2, 2025 • 0 new comments
raise KeyError(f"Column {key} not in the dataset. Current columns in the dataset: {columns}") [rank1]: KeyError: 'Column length not in the dataset. Current columns in the dataset: []'
#4058 commented on May 3, 2025 • 0 new comments
transformer_engine installation fails
#4051 commented on May 3, 2025 • 0 new comments
OOM when training a 32b model with grpo
#3871 commented on May 3, 2025 • 0 new comments
Error: bf16 not supported
#4036 commented on May 3, 2025 • 0 new comments
Fine-tuned qwen2-audio-7b-instruct
#2637 commented on May 5, 2025 • 0 new comments
QwenVL2 72B: sequence parallel raises a dimension mismatch error
#2972 commented on May 5, 2025 • 0 new comments
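[HELP] Error running inference with a reward model; thanks, everyone. How do I run vLLM inference on a model after RM training on a qwen base?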
#4045 commented on May 6, 2025 • 0 new comments
Specifying --max_length 4096 during inference seems to have no effect
#3967 commented on May 6, 2025 • 0 new comments
While training GRPO, I noticed that my model crashes. Its loss is 0, its grad_norm and kl are both Nan, and it completes as “!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!””
#3930 commented on May 6, 2025 • 0 new comments
Training qwen2.5-7b-instruct with grpo produces !!!!
#4060 commented on May 6, 2025 • 0 new comments
Sudden error mid-training: NCCL watchdog thread terminated with exception
#1817 commented on May 6, 2025 • 0 new comments
Weight_decay seems to have no effect in GRPO training?
#3931 commented on May 6, 2025 • 0 new comments
Customized Image Data Augmentation
#2345 commented on May 7, 2025 • 0 new comments
cannot import name 'LoRA' from 'swift'
#3665 commented on May 7, 2025 • 0 new comments
AWQ quantization after LoRA fine-tuning raises an error; details below:
#2318 commented on May 7, 2025 • 0 new comments
About qLoRA training
#4007 commented on May 7, 2025 • 0 new comments
Qwen2.5-vl fine-tuning for grounding tasks: how to train with my own local dataset
#3204 commented on May 8, 2025 • 0 new comments
Request: support health checks
#3474 commented on May 8, 2025 • 0 new comments
After fine-tuning DS_32B and running merge_lora, the fine-tuning has no effect when inferring with the merged model
#3974 commented on May 8, 2025 • 0 new comments
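The original gte 7B model is about 29G; using the GitHub code and the training script with the corresponding parameters from the example, switched to full-parameter training, the parameters become 14G. The GTE model fails to load after full-parameter training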
#4005 commented on May 8, 2025 • 0 new comments
GRPO training error: Fatal Python error: none_dealloc: deallocating None: bug likely caused by a refcount error in a C extension
#3864 commented on May 8, 2025 • 0 new comments
Is GME fine-tuning supported?
#3019 commented on May 8, 2025 • 0 new comments
qwen2.5-vl-72b, running via vllm_server_host: CUDA out of memory
#4023 commented on May 8, 2025 • 0 new comments
SimPO and ORPO support for VLM (Qwen2.5VL)
#3718 commented on May 8, 2025 • 0 new comments
Multi-GPU multi-process ORPO hangs, triggering watchdog caught collective operation timeout.
#3564 commented on May 8, 2025 • 0 new comments
🚀 Best Practices for Training Qwen3/Qwen3-MoE
#4030 commented on May 8, 2025 • 0 new comments