A quick overview of high-performance convolutional neural network (CNN) inference engines on mobile devices.
Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks |
---|---|---|---|---|
Intel-Caffe | CPU (Intel optimized) | Caffe | Y | Link |
NCNN | CPU (ARM optimized) | Caffe / pytorch / mxnet / onnx | Y | Link / unofficial Link |
FeatherCNN | CPU (ARM optimized) | Caffe | N | Link / unofficial Link |
FeatherCNNEx | CPU (ARM optimized) | Caffe | N | Link |
Tengine | CPU (ARM A72 optimized) | Caffe / mxnet | Y | Link |
Tensorflowlite | CPU (Android optimized) | Caffe2 / Tensorflow / onnx | Y | Link |
TensorRT | GPU (Volta optimized) | Caffe / Tensorflow / onnx | Y | Link |
TVM | CPU (ARM optimized) / Mali GPU / FPGA | onnx | Y | - |
SNPE | CPU (Qualcomm optimized) / GPU / DSP | Caffe / Caffe2 / Tensorflow / onnx | Y | Link |
MACE | CPU (ARM optimized) / Mali GPU / DSP | Caffe / Tensorflow / onnx | Y | Link |
Easy-MACE | CPU (ARM optimized) / CPU (x86 optimized) | Caffe / Tensorflow / onnx | Y | - |
In-Prestissimo | CPU (ARM optimized) | Caffe | N | Link |
Paddle-Mobile | CPU (ARM optimized) / Mali GPU / FPGA | Paddle / Caffe / onnx | Y | - |
Anakin | CPU (ARM optimized) / GPU / CPU (x86 optimized) | Caffe / Fluid | Y | Link |
Pocket-Tensor | CPU (ARM/x86 optimized) | Keras | N | Link |
ZQCNN | CPU | Caffe / mxnet | Y | Link |
ARM-NEON-to-x86-SSE | CPU (Intel optimized) | Intrinsics-Level | - | - |
Simd | CPU (all platform optimized) | Intrinsics-Level | - | - |
clDNN | Intel® Processor Graphics / Iris™ Pro Graphics | Caffe / Tensorflow / mxnet / onnx | Y | Link |
Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks |
---|---|---|---|---|
FeatherCNNEx | CPU (ARM optimized) | Caffe | N | Link |
ARM32-SGEMM-LIB | CPU (ARM optimized) | GEMM Library | N | Link |
Yolov2-Xilinx-PYNQ | FPGA (Xilinx PYNQ) | Yolov2-only | Y | Link |
Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks |
---|---|---|---|---|
Intel-Caffe | CPU (Intel Skylake) | Caffe | Y | Link |
NCNN | CPU (ARM) | Caffe / pytorch / mxnet / onnx | Y | Link |
Tensorflowlite | CPU (Android) | Caffe2 / Tensorflow / onnx | Y | Link |
TensorRT | GPU (Volta) | Caffe / Tensorflow / onnx | Y | Link |
Gemmlowp | CPU (ARM / x86) | GEMM Library | - | - |
SNPE | DSP (Quantized DLC) | Caffe / Caffe2 / Tensorflow / onnx | Y | Link |
MACE | CPU (ARM optimized) / Mali GPU / DSP | Caffe / Tensorflow / onnx | Y | Link |
In-Prestissimo | CPU (ARM optimized) | Caffe | N | Link |
Paddle-Mobile | CPU (ARM optimized) / Mali GPU / FPGA | Paddle / Caffe / onnx | Y | - |
Anakin | CPU (ARM optimized) / GPU / CPU (x86 optimized) | Caffe / Fluid | Y | Link |
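Several of the engines above rely on low-precision integer GEMM (Gemmlowp is exactly such a library; SNPE's quantized DLC path is similar in spirit). A minimal sketch of the core idea, not taken from any of these projects: int8 inputs with per-tensor zero-points, accumulated in int32. Function name, layout, and parameters are illustrative only.

```c
#include <stdint.h>

/* Naive quantized GEMM sketch: C = (A - za) * (B - zb), where A is m x k
 * and B is k x n, both row-major int8 with zero-points za and zb.
 * Accumulation is done in int32 to avoid overflow; real engines would
 * follow this with a requantization step back to int8. */
void qgemm(int m, int n, int k,
           const int8_t *A, int32_t za,
           const int8_t *B, int32_t zb,
           int32_t *C) {
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            int32_t acc = 0;
            for (int p = 0; p < k; p++) {
                acc += ((int32_t)A[i * k + p] - za)
                     * ((int32_t)B[p * n + j] - zb);
            }
            C[i * n + j] = acc;
        }
    }
}
```

Production libraries vectorize the inner loop with NEON/SSE and tile for the cache, but the arithmetic contract is the same.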
Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks |
---|---|---|---|---|
Gemmbitserial | CPU (ARM / x86) | GEMM Library | - | Link |
Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks |
---|---|---|---|---|
BMXNET | CPU (ARM / x86) / GPU | mxnet | Y | Link |
Espresso | GPU | - | N | Link |
BNN-PYNQ | FPGA (Xilinx PYNQ) | - | N | Link |
FINN | FPGA (Xilinx) | - | N | Link |
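The binarized-network engines above (BMXNET, BNN-PYNQ, FINN) replace multiply-accumulate with bitwise operations plus popcount. A minimal sketch of that trick, assuming ±1 weights and activations packed one bit per element (bit set = +1); the function name and packing convention are my own, not taken from any of these projects.

```c
#include <stdint.h>

/* Dot product of two n-element {-1, +1} vectors (n <= 64), each packed
 * one bit per element. After XOR, a set bit marks a differing pair
 * (contributing -1) and a clear bit an equal pair (contributing +1),
 * so dot = n - 2 * popcount(a ^ b). Binary engines apply the same
 * identity with XNOR; only the sign bookkeeping differs. */
int bin_dot(uint64_t a, uint64_t b, int n) {
    uint64_t mask = (n == 64) ? ~0ULL : ((1ULL << n) - 1);
    return n - 2 * __builtin_popcountll((a ^ b) & mask);
}
```

One 64-bit XOR plus a popcount instruction thus stands in for 64 multiply-adds, which is where the large speedups of binary networks come from.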
Rockchip RK3399 (Cortex-A72 1.8GHz x 2 + Cortex-A53 1.5GHz x 4):
Framework (ms) | 1 Thread | 2 Threads | 3 Threads | 4 Threads |
---|---|---|---|---|
Caffe+OpenBLAS* | 250.57 | 204.40 | 248.65 | 230.20 |
FeatherCNN | 205.76 | 135.17 | 183.34 | 194.67 |
NCNN** | 150.95 | 90.79 | 232.31 | 231.64 |
NCNN-Opt | 122.22 | 67.47 | - | - |
Tengine | 122.10 | 65.42 | - | - |
Tengine-Opt | 115.29 | 63.94 | - | - |
*: optimized for Cortex-A53 instead of Cortex-A72
**: powersave=0
For the 1-thread result the task is pinned to a single Cortex-A72 core; the 2-thread result uses both A72 cores.
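On a big.LITTLE SoC like the RK3399, pinning the workload to the big (A72) cores as described above is typically done with CPU affinity. A minimal Linux sketch, assuming the A72 cores are CPUs 4 and 5 as on the RK3399 (the function name is illustrative; benchmark harnesses may use taskset or per-engine powersave settings instead):

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to a single CPU core.
 * Returns 0 on success, -1 on failure (errno set).
 * On RK3399, CPUs 4-5 are the Cortex-A72 big cores. */
int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 = the calling thread */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Without pinning, the scheduler may migrate benchmark threads onto the slower A53 cores mid-run, which is one reason the 3- and 4-thread numbers above can be worse than the 2-thread ones.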
Framework (ms) | 1 Thread | 2 Threads | 8 Threads |
---|---|---|---|
NCNN* | 340.33 | 211.78 | - |
NCNN-Opt | 332.20 | 206.62 | 196.97 |
Tengine | 402.57 | 226.02 | - |
*: Conv-BN-Scale-fused