Release SuperBench v0.5.0

abuccts released this 29 Apr 02:56

7f607e4

SuperBench 0.5.0 Release Notes

Micro-benchmark Improvements

Support NIC-only NCCL bandwidth benchmark on a single node in NCCL/RCCL bandwidth test.
Support bi-directional bandwidth benchmark in GPU copy bandwidth test.
Support data checking in GPU copy bandwidth test.
Update rccl-tests submodule to fix a divide-by-zero error.
Add GPU-Burn micro-benchmark.

Model-benchmark Improvements

Sync results on root rank for e2e model benchmarks in distributed mode.
Support customized env in local and torch.distributed modes.
Add support for pytorch>=1.9.0.
Keep BatchNorm in fp32 for PyTorch CNN models cast to fp16.
Remove the type conversion time for FP16 samples.
Support FAMBench.

Inference Benchmark Improvements

Revise the default settings for inference benchmarks.
Add percentile metrics for inference benchmarks.
Support T4 and A10 in GEMM benchmark.
Add a configuration with inference benchmarks.
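
As a rough illustration of what percentile metrics for an inference benchmark can look like, the sketch below computes latency percentiles with the nearest-rank method. The helper name `latency_percentiles`, the chosen percentiles, and the sample values are assumptions made for this example; they are not taken from SuperBench, which may compute and name its metrics differently.

```python
import math


def latency_percentiles(latencies_ms, percentiles=(50, 90, 95, 99)):
    """Return {percentile: latency} using the nearest-rank method.

    Hypothetical helper for illustration only; not SuperBench's API.
    """
    if not latencies_ms:
        raise ValueError("latencies_ms must not be empty")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    result = {}
    for p in percentiles:
        # Nearest rank: the smallest sample with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * n / 100))
        result[p] = ordered[rank - 1]
    return result


# Made-up latencies of 1.0 .. 100.0 ms, one per inference step.
samples = [float(i) for i in range(1, 101)]
print(latency_percentiles(samples))
```

Tail percentiles such as the 99th are often reported next to the mean because a few slow steps can go unnoticed in an average.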

Other Improvements

Add command to support listing all optional parameters for benchmarks.
Unify benchmark naming convention and support multiple tests with the same benchmark and different parameters/options in one configuration file.
Support timeout to detect benchmark failure and stop the process automatically.
Add rocm5.0 dockerfile.
Improve output interface.
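
The timeout item above can be pictured with a minimal sketch: run a command under a time limit and treat an overrun as a failure. This uses only Python's standard `subprocess` module. The helper name `run_with_timeout` and the stand-in command are assumptions for illustration and do not show how SuperBench implements its timeout.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, returncode).

    Hypothetical helper for illustration only; not SuperBench's API.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return False, None
    return True, completed.returncode


# A stand-in "benchmark" that sleeps far longer than the allowed time.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hung, timeout_s=1))
```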

Data Diagnosis and Analysis

Support multi-benchmark check.
Support result summary in Markdown, HTML, and Excel formats.
Support data diagnosis in Markdown and HTML formats.
Support result output for all nodes in data diagnosis.
