Insights: modelscope/ms-swift
Overview
20 Pull requests merged by 4 people
-
[grpo] fix labels pop and peftmodel parameter check
#4136 merged
May 8, 2025 -
update qwen3 more models
#4123 merged
May 8, 2025 -
fix sequence_parallel
#4122 merged
May 7, 2025 -
fix omni aligner
#4117 merged
May 7, 2025 -
Fix ulysses eval
#4114 merged
May 7, 2025 -
fix packing
#4113 merged
May 7, 2025 -
fix enable_cache
#4109 merged
May 7, 2025 -
fix requirements
#4108 merged
May 7, 2025 -
[megatron] Update long text shell
#4106 merged
May 7, 2025 -
support max_epochs
#4102 merged
May 7, 2025 -
Update liger code
#4095 merged
May 6, 2025 -
fix enable_cache
#4091 merged
May 6, 2025 -
Support ulysses for llm/mllm,dpo/sft
#4085 merged
May 5, 2025 -
update docs
#4078 merged
May 4, 2025 -
feat: support megatron wandb
#4074 merged
May 4, 2025 -
feat: add run name support
#4072 merged
May 3, 2025 -
fix padding_side left
#4069 merged
May 3, 2025 -
support MiMo-7B
#4067 merged
May 2, 2025 -
fix packing eval streaming
#4066 merged
May 2, 2025 -
Support empty think loss scale
#4065 merged
May 2, 2025
7 Pull requests opened by 3 people
-
fix enable_cache
#4075 opened
May 4, 2025 -
refactor grpo internal mode
#4097 opened
May 6, 2025 -
Refactor SP
#4121 opened
May 7, 2025 -
Fix grpo multi modal doc
#4124 opened
May 8, 2025 -
[megatron] support max_epochs
#4125 opened
May 8, 2025 -
fix model_type mismatch
#4127 opened
May 8, 2025 -
support more vision dataset
#4132 opened
May 8, 2025
17 Issues closed by 10 people
-
Support for Qwen2-Audio and Qwen2.5-Omni
#4088 closed
May 8, 2025 -
qwen2.5-omni-7b merge-lora results differ
#3756 closed
May 8, 2025 -
raise IndexError(f"Index {index} out of range for dataset of size {size}.")
#4120 closed
May 8, 2025 -
Qwen2.5-7B-Base: error after some steps when training on ultra-long text
#4105 closed
May 7, 2025 -
Multi-GPU DeepSpeed training creates data copies in .cache proportional to the number of GPUs, causing excessive storage usage
#3965 closed
May 6, 2025 -
Qwen3-8B-Base SFT full-parameter fine-tuning hangs after saving the first model
#4053 closed
May 6, 2025 -
Qwen3 dataset configuration is inelegant
#4087 closed
May 6, 2025 -
Too many dataloader workers
#4061 closed
May 6, 2025 -
qwen3 seq_cls
#4073 closed
May 6, 2025 -
Package versions in requirements are problematic
#4080 closed
May 5, 2025 -
Support wandb logging in Swift Megatron SFT
#4071 closed
May 4, 2025 -
Add run_name argument support for wandb integration
#4046 closed
May 4, 2025 -
qwen2.5-vl hangs during inference
#3799 closed
May 3, 2025 -
Are there plans to support fine-tuning of the XiaomiMiMo/MiMo-7B model?
#4064 closed
May 2, 2025 -
packing seems to conflict with the lazy_encode parameter?
#4054 closed
May 2, 2025 -
KTO raises an error with a custom dataset
#4062 closed
May 2, 2025
40 Issues opened by 34 people
-
Custom registered model hangs during dataset map (version 3.3.1)
#4138 opened
May 8, 2025 -
pip install 'ms-swift[all]' -U downloads many versions
#4137 opened
May 8, 2025 -
Request Failed with 422 Error: Input Should Be a Valid String for Image Paths
#4135 opened
May 8, 2025 -
Some problems about loading Janus-Pro - traceback : Signal 11 (SIGSEGV) received by PID xxx
#4134 opened
May 8, 2025 -
swift megatron sys._base_executable problem
#4133 opened
May 8, 2025 -
Continuing training on other data from an already-trained LoRA
#4131 opened
May 8, 2025 -
swift infer with tp=2 does not support batch inference for the deepseek-r1-distill-qwen series and the qwq32B model
#4130 opened
May 8, 2025 -
swift infer batch processing works very well, but could it write to result_path in near real time instead of only at the end?
#4129 opened
May 8, 2025 -
Qwen2-audio-instruct: tensor dimension mismatch during inference after LoRA fine-tuning
#4128 opened
May 8, 2025 -
Support Megatron LoRA training for Qwen3 MoE
#4126 opened
May 8, 2025 -
raise IndexError(f"Index {index} out of range for dataset of size {size}.")
#4119 opened
May 7, 2025 -
Building multi-turn multimodal dialogue datasets for GRPO
#4118 opened
May 7, 2025 -
Never-before-seen bug during inference
#4116 opened
May 7, 2025 -
Can anyone knowledgeable explain how to do AWQ quantization after fine-tuning internvl3_8B?
#4115 opened
May 7, 2025 -
beta parameter has no effect in GRPO
#4112 opened
May 7, 2025 -
qwen omni registration issue
#4110 opened
May 7, 2025 -
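For a task that has already completed SFT, if I want to add new knowledge without losing accuracy, should I choose reinforcement fine-tuning (as implemented in ms-swift) or GRPO?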
#4107 opened
May 7, 2025 -
DPO model: RuntimeError: CUDA driver error: invalid argument
#4104 opened
May 7, 2025 -
Training keeps reporting: RuntimeError: CUDA driver error: invalid argument
#4103 opened
May 7, 2025 -
LLama-omni: index error during audio fine-tuning
#4101 opened
May 7, 2025 -
ulysses raise NotImplementedError
#4100 opened
May 7, 2025 -
Does the framework support passing a rope theta parameter?
#4099 opened
May 6, 2025 -
Sequence classification model shuffles the dataset during inference
#4098 opened
May 6, 2025 -
How to set freezing and LoRA rank for different modules when fine-tuning the internvl3_8B multimodal model?
#4096 opened
May 6, 2025 -
Which parameter adjusts the dataset sampling ratio?
#4094 opened
May 6, 2025 -
sequence classification inference
#4093 opened
May 6, 2025 -
ModuleNotFoundError: No module named 'torch.distributed.device_mesh'
#4092 opened
May 6, 2025 -
Can results be saved during eval?
#4090 opened
May 6, 2025 -
Why doesn't RLHF currently support sequence_parallel?
#4089 opened
May 6, 2025 -
LoRA fine-tuning of gte embedding: inference results after merge differ greatly from the fine-tuned results
#4084 opened
May 5, 2025 -
Error with Streaming + Packing + resume_from_checkpoint
#4083 opened
May 5, 2025 -
function call fine-tuning error: TypeError: string indices must be integers, not 'str'
#4082 opened
May 5, 2025 -
Training works, but eval raises an assert error
#4081 opened
May 5, 2025 -
Pre-offline tokenize for ultra large multimodal datasets
#4079 opened
May 4, 2025 -
making llm_max_batch_size and mllm_max_batch_size configurable
#4077 opened
May 4, 2025 -
InternVL3-9B LoRA fine-tuning: slow dataset preprocessing (about 7h)
#4076 opened
May 4, 2025 -
Fine-tuning Qwen2.5-Omni-7B with additional new layers on the audio tower
#4070 opened
May 3, 2025 -
how to run using vllm
#4068 opened
May 3, 2025 -
inference error with vllm 0.8.5
#4063 opened
May 1, 2025
29 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Error training qwen2.5-vl on NPU
#3408 commented on
May 2, 2025 • 0 new comments -
GRPO Training Speed Testing
#3302 commented on
May 2, 2025 • 0 new comments -
In the new version (3.4), it fails to run when nproc_per_node is smaller than the number of CUDA_VISIBLE_DEVICES; the old version (3.2) works
#4019 commented on
May 2, 2025 • 0 new comments -
raise KeyError(f"Column {key} not in the dataset. Current columns in the dataset: {columns}") [rank1]: KeyError: 'Column length not in the dataset. Current columns in the dataset: []'
#4058 commented on
May 3, 2025 • 0 new comments -
transformer_engine installation fails
#4051 commented on
May 3, 2025 • 0 new comments -
OOM when training a 32b model with grpo
#3871 commented on
May 3, 2025 • 0 new comments -
Error: bf16 not supported
#4036 commented on
May 3, 2025 • 0 new comments -
Fine-tuned qwen2-audio-7b-instruct
#2637 commented on
May 5, 2025 • 0 new comments -
QwenVL2 72B: dimension mismatch error with sequence parallelism
#2972 commented on
May 5, 2025 • 0 new comments -
[HELP] Error running inference on a reward model; thanks everyone. How do I run vLLM inference on an RM trained from a qwen base model?
#4045 commented on
May 6, 2025 • 0 new comments -
Specifying --max_length 4096 during inference seems to have no effect
#3967 commented on
May 6, 2025 • 0 new comments -
While training GRPO, I noticed that my model crashes. Its loss is 0, its grad_norm and kl are both Nan, and it completes as “!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!””
#3930 commented on
May 6, 2025 • 0 new comments -
Training qwen2.5-7b-instruct with grpo produces !!!!
#4060 commented on
May 6, 2025 • 0 new comments -
Sudden error mid-training: NCCL watchdog thread terminated with exception
#1817 commented on
May 6, 2025 • 0 new comments -
Weight_decay seems to have no effect in GRPO training?
#3931 commented on
May 6, 2025 • 0 new comments -
Customized Image Data Augmentation
#2345 commented on
May 7, 2025 • 0 new comments -
cannot import name 'LoRA' from 'swift'
#3665 commented on
May 7, 2025 • 0 new comments -
Error when applying AWQ quantization after LoRA fine-tuning; details below:
#2318 commented on
May 7, 2025 • 0 new comments -
About qLoRA training
#4007 commented on
May 7, 2025 • 0 new comments -
Qwen2.5-vl fine-tuning for grounding tasks: how to train with my own local dataset
#3204 commented on
May 8, 2025 • 0 new comments -
Request: support health checks
#3474 commented on
May 8, 2025 • 0 new comments -
After fine-tuning DS_32B and running merge_lora, the fine-tuning has no effect in inference with the merged model
#3974 commented on
May 8, 2025 • 0 new comments -
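The original gte 7B model is about 29G; using the GitHub version with the example training script's parameters, switched to full-parameter training, the saved weights become 14G. Loading the GTE model fails after full-parameter training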
#4005 commented on
May 8, 2025 • 0 new comments -
GRPO训练报错:Fatal Python error: none_dealloc: deallocating None: bug likely caused by a refcount error in a C extension
#3864 commented on
May 8, 2025 • 0 new comments -
Is GME fine-tuning supported?
#3019 commented on
May 8, 2025 • 0 new comments -
qwen2.5-vl-72b: CUDA out of memory when running with vllm_server_host
#4023 commented on
May 8, 2025 • 0 new comments -
SimPO and ORPO support for VLM (Qwen2.5VL)
#3718 commented on
May 8, 2025 • 0 new comments -
Multi-GPU multi-process ORPO hangs, triggering watchdog caught collective operation timeout.
#3564 commented on
May 8, 2025 • 0 new comments -
🚀 Best Practices for Training Qwen3/Qwen3-MoE
#4030 commented on
May 8, 2025 • 0 new comments