Issues: microsoft/DeepSpeed
Issues list
[REQUEST] Extend offload_states to support models with cpu-based optimizer
enhancement
New feature or request
#6596
opened Oct 1, 2024 by
kfertakis
[REQUEST] Dynamic model offload support ZeRO-3 inference models
enhancement
New feature or request
#6595
opened Oct 1, 2024 by
kfertakis
[BUG] Deepspeed installation issue
bug
Something isn't working
training
#6593
opened Oct 1, 2024 by
usarth
[BUG] MOE: Loading experts parameters error when using expert parallel.
bug
Something isn't working
training
#6589
opened Sep 29, 2024 by
kakaxi-liu
[BUG] DeepSpeed Ulysses zero3 compatibility
bug
Something isn't working
training
#6582
opened Sep 27, 2024 by
Xirid
[BUG] AttributeError: 'NoneType' object has no attribute 'set_moe'
bug
Something isn't working
inference
#6572
opened Sep 25, 2024 by
zhanwenchen
[BUG] ValueError: Tensors must be contiguous when using deepspeed.initialize
bug
Something isn't working
training
#6571
opened Sep 25, 2024 by
shadow150519
[BUG] The learning rate scheduler is being ignored in the first optimization step.
bug
Something isn't working
training
#6569
opened Sep 25, 2024 by
eaplatanios
[BUG] AssertionError: Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops.
bug
Something isn't working
training
#6568
opened Sep 24, 2024 by
umarbutler
Something get wrong when run “aio_” and "gds_" file(DeepNVMe)
bug
Something isn't working
training
#6567
opened Sep 24, 2024 by
niebowen666
[TRACKER] Customer support related PR tracker for Intel devices
enhancement
New feature or request
#6556
opened Sep 20, 2024 by
delock
20 of 23 tasks
[BUG] ACL stream synchronize failed, error code 507015
bug
Something isn't working
compression
#6555
opened Sep 19, 2024 by
janelu9
[BUG] CUDA error: no kernel image is available for execution on the device
bug
Something isn't working
training
#6549
opened Sep 17, 2024 by
getao
[REQUEST] A minimal example to load universal checkpoint
enhancement
New feature or request
#6548
opened Sep 17, 2024 by
hongshanli23
[BUG] Expert gradient scaling problem with ZeRO optimizer
bug
Something isn't working
training
#6545
opened Sep 17, 2024 by
wyooyw
RecursionError: maximum recursion depth exceeded while calling a Python object
bug
Something isn't working
training
#6534
opened Sep 13, 2024 by
Swordfish1990
[REQUEST] dynamic batch size with gradient accumulate
enhancement
New feature or request
#6533
opened Sep 13, 2024 by
Xiang-cd
[REQUEST] parallelize zero_to_fp32.py to use multiple cpu-cores and threads
enhancement
New feature or request
#6526
opened Sep 11, 2024 by
stas00
[BUG] Distributed Training randomly stuck in trainings loop
bug
Something isn't working
training
#6524
opened Sep 11, 2024 by
raeudigerRaeffi
[BUG] error :past_key, past_value = layer_past,how to solve this ?
bug
Something isn't working
deepspeed-chat
Related to DeepSpeed-Chat
#6522
opened Sep 11, 2024 by
lovychen
[BUG] RuntimeError: Error building extension 'inference_core_ops'
bug
Something isn't working
inference
#6519
opened Sep 10, 2024 by
Chetan3200
[REQUEST] ZeRO3 doc - support for wrapping model sub-components seperately for training
enhancement
New feature or request
#6505
opened Sep 8, 2024 by
orrzohar
[BUG] Universal checkpointing doesn't work when changing model parallel size (pp and dp change are ok)
bug
Something isn't working
compression
#6503
opened Sep 8, 2024 by
exnx