A tool for ONNX models:
- Rapid shape inference.
- Model profiling.
- Constant folding.
- Compute Graph and Shape Engine.
- OP fusion.
- Activation memory compression.
- Quantized and sparse models are supported.
Supported Models:
- NLP: BERT, T5, GPT, LLaMa, MPT (TransformerModel)
- Diffusion: Stable Diffusion (TextEncoder, VAE, UNET)
- CV: BEVFormer, MobileNet, YOLO, ...
- Audio: sovits, LPCNet
how to use: data/Profile.md.
pytorch usage: data/PytorchUsage.md.
tensorflow usage: data/TensorflowUsage.md.
samples: benchmark/samples.py.
Profiling reports the Float Multiply-Add Count (1 MAC = 2 FLOPs), Memory Usage (in bytes), and Parameters (number of elements).
For sparse models, it also reports the Sparse Pattern, Sparse Block Ratio, and Sparse Element Ratio.
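A minimal profiling sketch, assuming the model_profile entry point described in data/Profile.md (the exact signature may differ between versions; benchmark/samples.py has the maintained examples, and the model file and input name below are placeholders):

```python
import numpy
import onnx_tool

modelpath = 'resnet50-v1-12.onnx'  # placeholder model file

# Prints a per-node table with MACs, memory usage and parameter counts.
onnx_tool.model_profile(modelpath)

# For models with dynamic input shapes, example input tensors can be supplied
# so the shapes are fixed before counting MACs (assumed interface, see data/Profile.md).
dummy_inputs = {'data': numpy.zeros((1, 3, 224, 224), dtype=numpy.float32)}
onnx_tool.model_profile(modelpath, dummy_inputs)
```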
Remove shape-calculation layers (created by ONNX export) to get a Compute Graph. Use the Shape Engine to update tensor shapes at runtime.
Samples: benchmark/shape_regress.py, benchmark/samples.py.
Integrate the Compute Graph and Shape Engine into a C++ inference engine: data/inference_engine.md.
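A conceptual sketch of what a shape engine does at runtime: tensor shapes are stored once as symbolic expressions, and only the variable axes are updated per request. The class and names below are illustrative only, not this library's API; see benchmark/shape_regress.py and data/inference_engine.md for the real integration.

```python
from typing import Dict, List, Union

Dim = Union[int, str]  # a dimension is either fixed or a named variable like 'batch' or 'seq'

class ShapeEngineSketch:
    """Illustrative stand-in for a shape engine (not the library's implementation)."""

    def __init__(self, tensor_shapes: Dict[str, List[Dim]]):
        self.tensor_shapes = tensor_shapes   # symbolic shapes from offline shape inference
        self.variables: Dict[str, int] = {}  # current values of the dynamic axes

    def update_variable(self, name: str, value: int) -> None:
        self.variables[name] = value

    def get_shape(self, tensor: str) -> List[int]:
        # Resolve each symbolic dimension with the current variable values.
        return [d if isinstance(d, int) else self.variables[d]
                for d in self.tensor_shapes[tensor]]

# Example: resolve shapes for a new (batch, seq) without rerunning shape inference.
engine = ShapeEngineSketch({'input_ids': ['batch', 'seq'],
                            'hidden_states': ['batch', 'seq', 768]})
engine.update_variable('batch', 4)
engine.update_variable('seq', 128)
print(engine.get_shape('hidden_states'))  # [4, 128, 768]
```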
MHA and LayerNorm fusion for Transformers.
ResNet18 fusion.
how to use: data/Subgraph.md.
BERT samples: benchmark/samples.py.
Pattern fusion: benchmark/do_fusion.py.
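A conceptual sketch of pattern fusion on a plain onnx graph, fusing MatMul followed by a bias Add into a single Gemm. This only illustrates the idea; the tool's own fusion patterns (such as MHA and LayerNorm) are driven by data/Subgraph.md and benchmark/do_fusion.py.

```python
import onnx
from onnx import helper

def fuse_matmul_add(model: onnx.ModelProto) -> onnx.ModelProto:
    """Fuse MatMul + Add(bias) into Gemm.
    Sketch only: assumes 2D MatMul inputs, a bias stored as an initializer,
    and a single consumer of the MatMul output."""
    graph = model.graph
    initializer_names = {init.name for init in graph.initializer}
    new_nodes, pending_matmuls = [], {}

    for node in graph.node:
        if (node.op_type == 'Add'
                and node.input[0] in pending_matmuls
                and node.input[1] in initializer_names):
            matmul = pending_matmuls.pop(node.input[0])
            new_nodes.remove(matmul)
            new_nodes.append(helper.make_node(
                'Gemm',
                inputs=[matmul.input[0], matmul.input[1], node.input[1]],
                outputs=list(node.output),
                name=matmul.name + '_gemm'))
            continue  # the Add node is absorbed into the Gemm
        if node.op_type == 'MatMul':
            pending_matmuls[node.output[0]] = node
        new_nodes.append(node)

    del graph.node[:]
    graph.node.extend(new_nodes)
    return model
```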
Extract subgraphs from an ONNX model to help implement model parallelism.
how to use: data/Subgraph.md.
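For context, the standard onnx package can also extract a subgraph between named tensors; data/Subgraph.md documents this tool's own approach. The model and tensor names below are placeholders.

```python
import onnx.utils

# Split a model at an intermediate tensor, e.g. for pipeline-parallel execution.
# 'hidden_states_12' is a placeholder; use real tensor names from your model.
onnx.utils.extract_model(
    'bert.onnx',                        # source model
    'bert_first_half.onnx',             # extracted subgraph
    input_names=['input_ids'],          # boundary inputs of the subgraph
    output_names=['hidden_states_12'],  # boundary outputs of the subgraph
)
```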
For large language models and high-resolution CV models, activation memory compression is key to saving memory.
The compression method shrinks activation memory to roughly 5% of its native size on most models.
For example:
Model | Native Memory Size (MB) | Compressed Memory Size (MB) | Compression Ratio (%)
---|---|---|---
StableDiffusion(VAE_encoder) | 14,245 | 540 | 3.7
StableDiffusion(VAE_decoder) | 25,417 | 1,140 | 4.48
StableDiffusion(Text_encoder) | 215 | 5 | 2.5
StableDiffusion(UNet) | 36,135 | 2,232 | 6.2
GPT2 | 40 | 2 | 6.9
BERT | 2,170 | 27 | 1.25
code sample: benchmark/compression.py
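A conceptual sketch of how activation memory can be compressed through lifetime-based buffer reuse: tensors that are no longer read return their buffers to a pool for later nodes. This is illustrative only; benchmark/compression.py is the real sample, and the helper below is hypothetical.

```python
from typing import Dict, List

def plan_activation_memory(order: List[str],
                           reads: Dict[str, List[str]],
                           sizes: Dict[str, int]) -> int:
    """Greedy buffer reuse based on tensor lifetimes (conceptual sketch).
    order: topological node order; each node is assumed to produce one tensor named after it.
    reads: node -> input tensor names. sizes: tensor -> byte size of its output."""
    # 1) Last step at which each tensor is still read.
    last_use = {t: step for step, node in enumerate(order) for t in reads[node]}

    buffers: Dict[int, int] = {}  # buffer id -> byte size
    free_ids: List[int] = []      # buffers currently reusable
    owner: Dict[str, int] = {}    # live tensor -> buffer id

    for step, node in enumerate(order):
        # 2) Reuse the smallest free buffer that fits, otherwise allocate a new one.
        fits = [i for i in free_ids if buffers[i] >= sizes[node]]
        if fits:
            bid = min(fits, key=lambda i: buffers[i])
            free_ids.remove(bid)
        else:
            bid = len(buffers)
            buffers[bid] = sizes[node]
        owner[node] = bid
        # 3) Release buffers of tensors that are dead after this step.
        for t in reads[node]:
            if last_use.get(t) == step and t in owner:
                free_ids.append(owner.pop(t))

    return sum(buffers.values())  # peak activation memory with reuse (compare to sum(sizes.values()))
```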
- Export weight tensors to files.
- Simplify tensor and node names, converting long strings to short ones.
- Remove unused tensors; models like vgg19-7.onnx expose their static weight tensors as graph inputs.
- Set custom names and dimensions for input and output tensors, e.g. to change a model from fixed to dynamic input shapes.
how to use: data/Tensors.md.
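A minimal sketch of two of these edits with the plain onnx package: dropping weight tensors from the input list and making the batch axis dynamic. data/Tensors.md describes this tool's own helpers; the output file name and the symbolic dimension 'N' are arbitrary choices for the sketch.

```python
import onnx

model = onnx.load('vgg19-7.onnx')  # example model from the list above
graph = model.graph

# Drop graph inputs that are really static weights (they also appear as initializers).
init_names = {init.name for init in graph.initializer}
for inp in list(graph.input):
    if inp.name in init_names:
        graph.input.remove(inp)

# Rename the batch axis of the remaining data input so the model accepts dynamic batches.
graph.input[0].type.tensor_type.shape.dim[0].dim_param = 'N'

onnx.save(model, 'vgg19-7_dynamic.onnx')
```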
pip install onnx-tool
OR
pip install --upgrade git+https://github.com/ThanatosShinji/onnx-tool.git
Requires python>=3.6.
If pip install onnx-tool fails because of onnx's installation, try installing a lower onnx version first (e.g. pip install onnx==1.8.1), then run pip install onnx-tool again.
- The Loop op is not supported.
- Activation compression is not yet optimal.
Results for ONNX Model Zoo and SOTA models
Some models have dynamic input shapes, so the MAC count varies with the input shape. The input shapes used for these results are written in data/public/config.py. The ONNX models with all tensor shapes can be downloaded from Baidu Drive (code: p91k) or Google Drive.