Auto-deploy is a lightweight NN auto-deployment tool. It performs automatic op fusion and automatic op code generation, producing neural network deployment code for specific targets.
Auto-deploy involves the following process:
- read an ONNX model and parse it into graph IR
- run graph optimization passes
- dump the graph to deployment code
- compile the generated code into an executable file
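As a rough illustration of the first step, the sketch below parses an ONNX model into a flat node list using the onnx Python package (from the env requirements below). The `Node` class and the model path are hypothetical illustrations, not auto-deploy's actual IR:

```python
# A minimal sketch of "read an ONNX model and parse it into graph IR".
# The Node class and the model path are assumptions, not auto-deploy's real IR.
import onnx

class Node:
    def __init__(self, op_type, inputs, outputs):
        self.op_type = op_type        # e.g. "Conv", "Relu", "MaxPool"
        self.inputs = list(inputs)    # names of tensors this node consumes
        self.outputs = list(outputs)  # names of tensors this node produces

def parse_onnx(path):
    model = onnx.load(path)
    # Edges are implicit: nodes sharing a tensor name are connected.
    return [Node(n.op_type, n.input, n.output) for n in model.graph.node]

graph = parse_onnx("models/mnist/model.onnx")  # file name is an assumption
for node in graph:
    print(node.op_type, node.inputs, "->", node.outputs)
```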
env requirements:
- python3 with onnx installed
- the AutoKernel docker image with Halide installed
Quick start:
```bash
python3 mnist.py
python3 op_generator.py
cd c_source
g++ *.cpp -o mnist
./mnist ../data/mnist.weights ../data/input_6.bin
```
- run `python3 mnist.py`: generates the `main.cpp` file in the `c_source` directory
- run `python3 op_generator.py`: compiles `op_gen.cpp` and generates the following code:
```
|-- generated.h
|-- halide_conv.cpp
|-- halide_conv.h
|-- halide_matmul.cpp
|-- halide_matmul.h
|-- halide_maxpool.cpp
|-- halide_maxpool.h
|-- halide_relu.cpp
|-- halide_relu.h
```
- finally, compile the source code and run it. It prints the output data, which is consistent with the result obtained in `graph_tutorial.ipynb`:
```
2.797004 -12.441699 0.206829 -3.550967 0.014401 5.138205 17.518187 -16.953455 2.517180 -5.376605
```
After postprocessing this output, it prints `predicted number is 6`.
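The postprocessing here is presumably just an argmax over the ten class scores; a minimal check in Python (the score values are copied from the output above):

```python
# The ten scores printed above, one per digit class 0-9.
scores = [2.797004, -12.441699, 0.206829, -3.550967, 0.014401,
          5.138205, 17.518187, -16.953455, 2.517180, -5.376605]
# Postprocessing: pick the index of the highest score.
predicted = max(range(len(scores)), key=lambda i: scores[i])
print(f"predicted number is {predicted}")  # -> predicted number is 6
```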
For more details, see `graph_tutorial.ipynb`.
Passes perform transformations and optimizations on the graph. A graph may go through more than one pass. Intuitively, passes can be called as:
```python
graph = pass1(graph)
graph = pass2(graph)
graph = pass3(graph)
```
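Each pass is just a function from graph to graph. As an illustration, here is a hypothetical remove-reshape pass (a pass of this name appears in the feature list at the end of this document; this sketch is not its actual implementation), operating on the `Node` list from the parsing sketch above:

```python
# Hypothetical remove-reshape pass: drop Reshape nodes and rewire their
# consumers to read the original tensor directly.
def remove_reshape(graph):
    # Map each Reshape output name back to the Reshape data input name.
    alias = {n.outputs[0]: n.inputs[0] for n in graph if n.op_type == "Reshape"}
    kept = [n for n in graph if n.op_type != "Reshape"]
    for n in kept:
        n.inputs = [alias.get(name, name) for name in n.inputs]
    return kept

graph = remove_reshape(graph)  # same calling convention as pass1/pass2/pass3
```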
The pass manager exists for better management of passes. It can reuse the same pattern to generate different passes. The pass manager:
- registers fusion patterns
- adds pass functions by reusing patterns
- analyzes pass dependencies automatically and generates a seq_pass_list
- runs all passes automatically, following the generated seq_pass_list (sketched below)
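A minimal, hypothetical sketch of that flow, again using the `Node` class from the parsing sketch; the class and method names here are assumptions, and the pattern matcher is deliberately naive (it only fuses adjacent nodes in the list):

```python
# Hypothetical PassManager sketch: register passes with dependencies, reuse
# one pattern-matching routine to build fusion passes, then run in order.
class PassManager:
    def __init__(self):
        self.passes = []  # (name, func, deps)

    def add_pass(self, name, func, deps=()):
        self.passes.append((name, func, deps))

    def make_fusion_pass(self, pattern, fused_op):
        # One generic routine yields different passes per pattern,
        # e.g. ("Conv", "Add") -> "Conv_Add_fused".
        def fuse(graph):
            out, i = [], 0
            while i < len(graph):
                n = graph[i]
                nxt = graph[i + 1] if i + 1 < len(graph) else None
                if (nxt and (n.op_type, nxt.op_type) == pattern
                        and n.outputs[0] in nxt.inputs):
                    # Merge the matched pair into a single fused node.
                    ins = n.inputs + [t for t in nxt.inputs if t != n.outputs[0]]
                    out.append(Node(fused_op, ins, nxt.outputs))
                    i += 2
                else:
                    out.append(n)
                    i += 1
            return out
        return fuse

    def run(self, graph):
        # Naive dependency analysis (assumes no cycles): a pass is scheduled
        # once everything it depends on is already in seq_pass_list.
        done, seq_pass_list = set(), []
        while len(done) < len(self.passes):
            for name, func, deps in self.passes:
                if name not in done and all(d in done for d in deps):
                    seq_pass_list.append(func)
                    done.add(name)
        for p in seq_pass_list:
            graph = p(graph)
        return graph

pm = PassManager()
pm.add_pass("fuse_conv_add",
            pm.make_fusion_pass(("Conv", "Add"), "Conv_Add_fused"))
pm.add_pass("fuse_matmul_add",
            pm.make_fusion_pass(("MatMul", "Add"), "MatMul_Add_fused"))
graph = pm.run(graph)
```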
The generated `main.cpp` consists of the following parts:
- malloc all used tensors
```cpp
//data
float* _0  = (float*)malloc(sizeof(float) * 784);  // Input3
float* _1  = (float*)malloc(sizeof(float) * 200);  // Parameter5
float* _2  = (float*)malloc(sizeof(float) * 8);    // Parameter6
float* _3  = (float*)malloc(sizeof(float) * 6272); // Plus30_Output_0
float* _4  = (float*)malloc(sizeof(float) * 6272); // ReLU32_Output_0
float* _5  = (float*)malloc(sizeof(float) * 1568); // Pooling66_Output_0
float* _6  = (float*)malloc(sizeof(float) * 3200); // Parameter87
float* _7  = (float*)malloc(sizeof(float) * 16);   // Parameter88
float* _8  = (float*)malloc(sizeof(float) * 3136); // Plus112_Output_0
float* _9  = (float*)malloc(sizeof(float) * 3136); // ReLU114_Output_0
float* _10 = (float*)malloc(sizeof(float) * 256);  // Pooling160_Output_0
float* _11 = (float*)malloc(sizeof(float) * 2560); // Parameter193_reshape1
float* _12 = (float*)malloc(sizeof(float) * 10);   // Parameter194
float* _13 = (float*)malloc(sizeof(float) * 10);   // Plus214_Output_0
```
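The element counts follow from the tensor shapes of the standard MNIST ONNX model. The shapes below are assumptions (the model file itself is authoritative), but they reproduce the counts in the generated code above:

```python
# Sanity-checking the buffer sizes against assumed MNIST tensor shapes.
from math import prod

shapes = {
    "Input3":                (1, 1, 28, 28),    # 784, input image
    "Parameter5":            (8, 1, 5, 5),      # 200, conv1 weights
    "Plus30_Output_0":       (1, 8, 28, 28),    # 6272
    "Pooling66_Output_0":    (1, 8, 14, 14),    # 1568
    "Parameter87":           (16, 8, 5, 5),     # 3200, conv2 weights
    "Plus112_Output_0":      (1, 16, 14, 14),   # 3136
    "Pooling160_Output_0":   (1, 16, 4, 4),     # 256
    "Parameter193_reshape1": (256, 10),         # 2560, fc weights
}
for name, shape in shapes.items():
    print(name, prod(shape))
```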
- load weights
```cpp
//load_weight
FILE* fp = fopen(weight_name, "rb");
if (!fp)
{
    printf("data can not be opened\n");
    return -1;
}
fread(_1, sizeof(float), 200, fp);
fread(_2, sizeof(float), 8, fp);
fread(_6, sizeof(float), 3200, fp);
fread(_7, sizeof(float), 16, fp);
fread(_11, sizeof(float), 2560, fp);
fread(_12, sizeof(float), 10, fp);
fclose(fp);
```
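Judging from these `fread` calls, the weights file is a flat dump of float32 arrays concatenated in a fixed order. A small sketch that reads it the same way from Python (counts and order are taken from the generated code above; the role labels per array are assumptions based on the parameter names):

```python
# Read mnist.weights as the generated code does: raw float32 arrays,
# concatenated in a fixed order.
from array import array

counts = [200, 8, 3200, 16, 2560, 10]  # conv1 W/b, conv2 W/b, fc W/b (assumed roles)
weights = []
with open("data/mnist.weights", "rb") as f:
    for count in counts:
        buf = array("f")       # 'f' = 4-byte float, matching sizeof(float)
        buf.fromfile(f, count)
        weights.append(buf)
print([len(w) for w in weights])  # [200, 8, 3200, 16, 2560, 10]
```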
- inference code
```cpp
//code_inference
Conv_Add_fused(_3, _0, _1, _2, &param_0);
Relu(_4, _3, &param_1);
MaxPool(_5, _4, &param_2);
Conv_Add_fused(_8, _5, _6, _7, &param_3);
Relu(_9, _8, &param_4);
MaxPool(_10, _9, &param_5);
MatMul_Add_fused(_13, _10, _11, _12, &param_6);
```
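Note the fixed calling convention in the generated calls: the output buffer comes first, then inputs and weights, then a pointer to a per-node parameter struct. A hypothetical sketch of a codegen routine that could emit such a line (not auto-deploy's actual emitter), using the `Node` class from earlier:

```python
# Hypothetical emitter: one C call per fused node, output buffer first,
# then inputs/weights, then a pointer to that node's parameter struct.
def emit_call(node, tensor_ids, param_index):
    args = ", ".join(tensor_ids[t] for t in node.outputs + node.inputs)
    return f"{node.op_type}({args}, &param_{param_index});"

# emit_call(Node("Relu", ["Plus30_Output_0"], ["ReLU32_Output_0"]),
#           {"Plus30_Output_0": "_3", "ReLU32_Output_0": "_4"}, 1)
# -> "Relu(_4, _3, &param_1);"
```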
Compared with a traditional inference framework:

| | inference framework | auto-deploy |
| --- | --- | --- |
| op fusion implementation | manual implementation of op fusion; hard to reuse fusion patterns | easy to reuse op fusion patterns |
| op fusion space | limited number of op fusions | can extend to automatic op fusion with a model, giving a bigger search space |
| op implementations | manual fused-op implementations for multiple backends | auto codegen with AutoKernel for multiple backends |
| deployment code | provides an op library with all common op implementations | lightweight: only generates the ops needed by the assigned neural network |
currently supported:
- graph core code: tensor, node, graph IR
- pass manager: op fusion, remove reshape
- NN demo: MNIST ONNX model (models/mnist)
- ops: conv, add, relu, matmul, reshape
- deployment main.cpp codegen
TODO:
- automatic tensor memory scheduling
- tmfile support
- more NN demos
- ...