Zhihu blog post: https://zhuanlan.zhihu.com/p/673359684
Repo: https://github.com/microsoft/DeepSpeedExamples
Setting (bf16, gradient checkpointing disabled) | max_allocated_memory (GB; bs=32 unless noted) | Time per epoch (s; bs=32, 8 GPUs, 400 samples total) |
---|---|---|
ZERO-0 (DDP) | OOM at bs=32; 18.36 at bs=16 | 11.57 |
ZERO-1 | 21.46 | 9.68 |
ZERO-1 (offload optimizer) | 20.40 | 13.45 |
ZERO-2 | 22.26 | 8.51 |
ZERO-2 (offload optimizer) | 20.80 | 13.34 |
ZERO-3 | 22.37 | 7.94 |
ZERO-3 (offload optimizer) | 20.39 | 12.67 |
ZERO-3 (offload optimizer + params) | 20.39 | 12.08 |
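
For readers who want to reproduce a row of the table, the following is a minimal sketch of how the corresponding DeepSpeed config could be assembled. The helper `make_ds_config` is hypothetical and is not taken from the repo. The keys `zero_optimization`, `stage`, `offload_optimizer`, `offload_param`, `bf16`, and `train_micro_batch_size_per_gpu` are standard DeepSpeed config fields. Offloading to `"cpu"` and treating bs=32 as the per-GPU micro-batch size are assumptions, since the table does not state either.

```python
def make_ds_config(stage, offload_optimizer=False, offload_param=False,
                   micro_batch_size=32):
    """Build a DeepSpeed-style config dict for one table row (illustrative helper)."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    if offload_optimizer and stage == 0:
        raise ValueError("optimizer offload requires ZeRO stage 1 or higher")
    if offload_param and stage != 3:
        raise ValueError("parameter offload is only available with ZeRO stage 3")

    zero = {"stage": stage}
    if offload_optimizer:
        # Assumption: offload target is CPU memory (the table does not say).
        zero["offload_optimizer"] = {"device": "cpu"}
    if offload_param:
        zero["offload_param"] = {"device": "cpu"}

    return {
        # Assumption: bs=32 in the table is read as the per-GPU micro-batch size.
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "bf16": {"enabled": True},  # matches the bf16 setting in the table header
        "zero_optimization": zero,
    }
```

For example, `make_ds_config(3, offload_optimizer=True, offload_param=True)` corresponds to the last row, and `make_ds_config(0, micro_batch_size=16)` to the DDP baseline at bs=16. The resulting dict can be dumped to JSON and passed to a DeepSpeed training script as its config file.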