Auto-deploy is a lightweight NN auto-deployment tool. It performs automatic op fusion and automatic op code generation, producing neural network deployment code for specific targets.
Auto-deploy involves the following process:
- read an ONNX model and parse it into graph IR
- run graph optimization passes
- dump the graph to deployment code
- compile the generated code into an executable file
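As a rough illustration of the first step, the sketch below parses an ONNX model into a flat node list using the onnx Python package (from the env requirements below). The `Node` class and the model path are hypothetical illustrations, not auto-deploy's actual IR:

```python
# A minimal sketch of "read an ONNX model and parse it into graph IR".
# The Node class and the model path are assumptions, not auto-deploy's real IR.
import onnx

class Node:
    def __init__(self, op_type, inputs, outputs):
        self.op_type = op_type        # e.g. "Conv", "Relu", "MaxPool"
        self.inputs = list(inputs)    # names of tensors this node consumes
        self.outputs = list(outputs)  # names of tensors this node produces

def parse_onnx(path):
    model = onnx.load(path)
    # Edges are implicit: nodes sharing a tensor name are connected.
    return [Node(n.op_type, n.input, n.output) for n in model.graph.node]

graph = parse_onnx("models/mnist/model.onnx")  # file name is an assumption
for node in graph:
    print(node.op_type, node.inputs, "->", node.outputs)
```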
env requirements:
- python3 with onnx installed
- the AutoKernel docker image with Halide installed
Quick start:
```bash
python3 mnist.py
python3 op_generator.py
cd c_source
g++ *.cpp -o mnist
./mnist ../data/mnist.weights ../data/input_6.bin
```
- run `python3 mnist.py`: generates the `main.cpp` file in the `c_source` directory
- run `python3 op_generator.py`: compiles `op_gen.cpp` and generates the following code:
```
|-- generated.h
|-- halide_conv.cpp
|-- halide_conv.h
|-- halide_matmul.cpp
|-- halide_matmul.h
|-- halide_maxpool.cpp
|-- halide_maxpool.h
|-- halide_relu.cpp
|-- halide_relu.h
```
- finally, compile the source code and run it. It prints the output data, which is consistent with the result obtained in `graph_tutorial.ipynb`:
```
2.797004 -12.441699 0.206829 -3.550967 0.014401 5.138205 17.518187 -16.953455 2.517180 -5.376605
```
After postprocessing this output, it prints `predicted number is 6`.
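The postprocessing here is presumably just an argmax over the ten class scores; a minimal check in Python (the score values are copied from the output above):

```python
# The ten scores printed above, one per digit class 0-9.
scores = [2.797004, -12.441699, 0.206829, -3.550967, 0.014401,
          5.138205, 17.518187, -16.953455, 2.517180, -5.376605]
# Postprocessing: pick the index of the highest score.
predicted = max(range(len(scores)), key=lambda i: scores[i])
print(f"predicted number is {predicted}")  # -> predicted number is 6
```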
For more details, see `graph_tutorial.ipynb`.
Passes perform transformations and optimizations on the graph. A graph may go through more than one pass. Intuitively, passes can be called as:
```python
graph = pass1(graph)
graph = pass2(graph)
graph = pass3(graph)
```
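Each pass is just a function from graph to graph. As an illustration, here is a hypothetical remove-reshape pass (a pass of this name appears in the feature list at the end of this document; this sketch is not its actual implementation), operating on the `Node` list from the parsing sketch above:

```python
# Hypothetical remove-reshape pass: drop Reshape nodes and rewire their
# consumers to read the original tensor directly.
def remove_reshape(graph):
    # Map each Reshape output name back to the Reshape data input name.
    alias = {n.outputs[0]: n.inputs[0] for n in graph if n.op_type == "Reshape"}
    kept = [n for n in graph if n.op_type != "Reshape"]
    for n in kept:
        n.inputs = [alias.get(name, name) for name in n.inputs]
    return kept

graph = remove_reshape(graph)  # same calling convention as pass1/pass2/pass3
```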
The pass manager exists for better management of passes. It can reuse the same pattern to generate different passes. The pass manager:
- registers fusion patterns
- adds pass functions by reusing patterns
- analyzes pass dependencies automatically and generates a seq_pass_list
- runs all passes automatically, following the generated seq_pass_list (sketched below)
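A minimal, hypothetical sketch of that flow, again using the `Node` class from the parsing sketch; the class and method names here are assumptions, and the pattern matcher is deliberately naive (it only fuses adjacent nodes in the list):

```python
# Hypothetical PassManager sketch: register passes with dependencies, reuse
# one pattern-matching routine to build fusion passes, then run in order.
class PassManager:
    def __init__(self):
        self.passes = []  # (name, func, deps)

    def add_pass(self, name, func, deps=()):
        self.passes.append((name, func, deps))

    def make_fusion_pass(self, pattern, fused_op):
        # One generic routine yields different passes per pattern,
        # e.g. ("Conv", "Add") -> "Conv_Add_fused".
        def fuse(graph):
            out, i = [], 0
            while i < len(graph):
                n = graph[i]
                nxt = graph[i + 1] if i + 1 < len(graph) else None
                if (nxt and (n.op_type, nxt.op_type) == pattern
                        and n.outputs[0] in nxt.inputs):
                    # Merge the matched pair into a single fused node.
                    ins = n.inputs + [t for t in nxt.inputs if t != n.outputs[0]]
                    out.append(Node(fused_op, ins, nxt.outputs))
                    i += 2
                else:
                    out.append(n)
                    i += 1
            return out
        return fuse

    def run(self, graph):
        # Naive dependency analysis (assumes no cycles): a pass is scheduled
        # once everything it depends on is already in seq_pass_list.
        done, seq_pass_list = set(), []
        while len(done) < len(self.passes):
            for name, func, deps in self.passes:
                if name not in done and all(d in done for d in deps):
                    seq_pass_list.append(func)
                    done.add(name)
        for p in seq_pass_list:
            graph = p(graph)
        return graph

pm = PassManager()
pm.add_pass("fuse_conv_add",
            pm.make_fusion_pass(("Conv", "Add"), "Conv_Add_fused"))
pm.add_pass("fuse_matmul_add",
            pm.make_fusion_pass(("MatMul", "Add"), "MatMul_Add_fused"))
graph = pm.run(graph)
```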
The generated `main.cpp` consists of the following parts:
- malloc all used tensors
```cpp
//data
float* _0  = (float*)malloc(sizeof(float) * 784);  // Input3
float* _1  = (float*)malloc(sizeof(float) * 200);  // Parameter5
float* _2  = (float*)malloc(sizeof(float) * 8);    // Parameter6
float* _3  = (float*)malloc(sizeof(float) * 6272); // Plus30_Output_0
float* _4  = (float*)malloc(sizeof(float) * 6272); // ReLU32_Output_0
float* _5  = (float*)malloc(sizeof(float) * 1568); // Pooling66_Output_0
float* _6  = (float*)malloc(sizeof(float) * 3200); // Parameter87
float* _7  = (float*)malloc(sizeof(float) * 16);   // Parameter88
float* _8  = (float*)malloc(sizeof(float) * 3136); // Plus112_Output_0
float* _9  = (float*)malloc(sizeof(float) * 3136); // ReLU114_Output_0
float* _10 = (float*)malloc(sizeof(float) * 256);  // Pooling160_Output_0
float* _11 = (float*)malloc(sizeof(float) * 2560); // Parameter193_reshape1
float* _12 = (float*)malloc(sizeof(float) * 10);   // Parameter194
float* _13 = (float*)malloc(sizeof(float) * 10);   // Plus214_Output_0
```
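The element counts follow from the tensor shapes of the standard MNIST ONNX model. The shapes below are assumptions (the model file itself is authoritative), but they reproduce the counts in the generated code above:

```python
# Sanity-checking the buffer sizes against assumed MNIST tensor shapes.
from math import prod

shapes = {
    "Input3":                (1, 1, 28, 28),    # 784, input image
    "Parameter5":            (8, 1, 5, 5),      # 200, conv1 weights
    "Plus30_Output_0":       (1, 8, 28, 28),    # 6272
    "Pooling66_Output_0":    (1, 8, 14, 14),    # 1568
    "Parameter87":           (16, 8, 5, 5),     # 3200, conv2 weights
    "Plus112_Output_0":      (1, 16, 14, 14),   # 3136
    "Pooling160_Output_0":   (1, 16, 4, 4),     # 256
    "Parameter193_reshape1": (256, 10),         # 2560, fc weights
}
for name, shape in shapes.items():
    print(name, prod(shape))
```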
- load weights
```cpp
//load_weight
FILE* fp = fopen(weight_name, "rb");
if (!fp)
{
    printf("data can not be opened\n");
    return -1;
}
fread(_1, sizeof(float), 200, fp);
fread(_2, sizeof(float), 8, fp);
fread(_6, sizeof(float), 3200, fp);
fread(_7, sizeof(float), 16, fp);
fread(_11, sizeof(float), 2560, fp);
fread(_12, sizeof(float), 10, fp);
fclose(fp);
```
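Judging from these `fread` calls, the weights file is a flat dump of float32 arrays concatenated in a fixed order. A small sketch that reads it the same way from Python (counts and order are taken from the generated code above; the role labels per array are assumptions based on the parameter names):

```python
# Read mnist.weights as the generated code does: raw float32 arrays,
# concatenated in a fixed order.
from array import array

counts = [200, 8, 3200, 16, 2560, 10]  # conv1 W/b, conv2 W/b, fc W/b (assumed roles)
weights = []
with open("data/mnist.weights", "rb") as f:
    for count in counts:
        buf = array("f")       # 'f' = 4-byte float, matching sizeof(float)
        buf.fromfile(f, count)
        weights.append(buf)
print([len(w) for w in weights])  # [200, 8, 3200, 16, 2560, 10]
```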
- inference code
```cpp
//code_inference
Conv_Add_fused(_3, _0, _1, _2, &param_0);
Relu(_4, _3, &param_1);
MaxPool(_5, _4, &param_2);
Conv_Add_fused(_8, _5, _6, _7, &param_3);
Relu(_9, _8, &param_4);
MaxPool(_10, _9, &param_5);
MatMul_Add_fused(_13, _10, _11, _12, &param_6);
```
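Note the fixed calling convention in the generated calls: the output buffer comes first, then inputs and weights, then a pointer to a per-node parameter struct. A hypothetical sketch of a codegen routine that could emit such a line (not auto-deploy's actual emitter), using the `Node` class from earlier:

```python
# Hypothetical emitter: one C call per fused node, output buffer first,
# then inputs/weights, then a pointer to that node's parameter struct.
def emit_call(node, tensor_ids, param_index):
    args = ", ".join(tensor_ids[t] for t in node.outputs + node.inputs)
    return f"{node.op_type}({args}, &param_{param_index});"

# emit_call(Node("Relu", ["Plus30_Output_0"], ["ReLU32_Output_0"]),
#           {"Plus30_Output_0": "_3", "ReLU32_Output_0": "_4"}, 1)
# -> "Relu(_4, _3, &param_1);"
```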
Compared with a traditional inference framework:

| | inference framework | auto-deploy |
| --- | --- | --- |
| op fusion implementation | manual implementation of op fusion; hard to reuse fusion patterns | easy to reuse op fusion patterns |
| op fusion space | limited number of op fusions | can extend to automatic op fusion with a model, giving a bigger search space |
| op implementations | manual fused-op implementations for multiple backends | auto codegen with AutoKernel for multiple backends |
| deployment code | provides an op library with all common op implementations | lightweight: only generates the ops needed by the assigned neural network |
currently supported:
- graph core code: tensor, node, graph IR
- pass manager: op fusion, remove reshape
- NN demo: MNIST ONNX model (models/mnist)
- ops: conv, add, relu, matmul, reshape
- deployment main.cpp codegen
TODO:
- automatic tensor memory scheduling
- tmfile support
- more NN demos
- ...