Training becomes very slow partway through: how can this be fixed? Is it a data problem? #152
Comments
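Does this problem also occur when you use only our code, without adding any extra structures?
I added a small amount, essentially a linear layer. Training was normal at the start, and the slowdown only appeared partway through.
Try setting group_by_modality_length to false. If it still happens, you may need a GPU with more memory.
Does the argument "--dataloader_num_workers", "8" affect training speed? If I lower it, would training stop slowing down?
You can give it a try.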
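Before changing the number of workers, it may help to check whether the time is actually going into waiting for data or into the training step itself. The following is only a rough sketch and not part of this repository: `time_data_vs_compute`, `batches`, and `step_fn` are hypothetical names, and `step_fn` stands in for whatever runs one forward/backward pass in your setup.

```python
import time
from typing import Callable, Iterable, Tuple


def time_data_vs_compute(
    batches: Iterable,
    step_fn: Callable[[object], None],
    max_steps: int = 50,
) -> Tuple[float, float]:
    """Return (seconds spent waiting for batches, seconds spent in step_fn).

    `batches` is any iterable of batches (for example a DataLoader).
    `step_fn` is a hypothetical callable that runs one training step.
    On a GPU, step_fn should call torch.cuda.synchronize() before it
    returns; otherwise asynchronous kernels make the step look faster
    than it is and the cost shows up in the wrong bucket.
    """
    wait_s = 0.0
    step_s = 0.0
    it = iter(batches)
    for _ in range(max_steps):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # blocks while the loader prepares a batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # the model's work for this batch
        t2 = time.perf_counter()
        wait_s += t1 - t0
        step_s += t2 - t1
    return wait_s, step_s


if __name__ == "__main__":
    # Toy stand-in for a slow data pipeline: each batch takes 20 ms to produce.
    def slow_batches(n: int, delay: float):
        for i in range(n):
            time.sleep(delay)
            yield i

    wait_s, step_s = time_data_vs_compute(slow_batches(5, 0.02), lambda b: None)
    print(f"waiting for data: {wait_s:.3f}s, compute: {step_s:.3f}s")
```

If most of the time is spent waiting for data, the loader settings (such as `--dataloader_num_workers`) or the storage are the more likely place to look. If the step itself is slow, the model or GPU memory side is the more likely cause.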
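Training used to take about 15 seconds per item, then it suddenly became very slow and very unstable, and I don't know why. The GPU temperature is not especially high, but utilization is very low. It also doesn't look like data is being swapped frequently: training was reasonably fast at the start, and I would expect frequent swapping with host memory to make it slow the whole time. Previously I trained on the COCO portion of llava_dataset_665k, which is about half of llava_dataset_665k, and did not run into this problem, but now that I am training on the full llava_dataset_665k I do.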
![question2](https://private-user-images.githubusercontent.com/164643383/393821330-0aa11844-bd4c-4508-bad5-b9e6aa20251f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxODIwNzcsIm5iZiI6MTczOTE4MTc3NywicGF0aCI6Ii8xNjQ2NDMzODMvMzkzODIxMzMwLTBhYTExODQ0LWJkNGMtNDUwOC1iYWQ1LWI5ZTZhYTIwMjUxZi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjEwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxMFQxMDAyNTdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1hNGM0ZWI2ZTgyYTRkOGYwNjAxOGU0MGE3NTE0NjhkY2U4NTJlMjVlOGZkNzYxNTA3ZWZiZTk4Y2Q2ZjU0ZTA1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.WfFA-p-xXZsaP5rzXEauvvVdw3LutR4e_Yy2aH_cacA)