forked from Tencent/Forward

a library for high performance deep learning inference on NVIDIA GPUs.

lujq96/Forward

 
 

Forward - A library for high performance deep learning inference on NVIDIA GPUs

[Chinese version]

Forward

Forward is a library for high-performance deep learning inference on NVIDIA GPUs. It parses TensorFlow/PyTorch/Keras models directly into a high-performance engine based on TensorRT. Compared with using TensorRT directly, it is easier to use and easier to extend. So far, Forward supports not only mainstream deep learning models in the CV, NLP and recommendation fields, but also some advanced models such as BERT, GAN, FaceSwap and StyleTransfer.

Features

  • Uses the TensorRT API and customized operators for high-performance deep learning inference.
  • Supports not only mainstream deep learning models in the CV, NLP and recommendation fields, but also advanced models such as BERT, GAN, FaceSwap and StyleTransfer.
  • Supports FLOAT/HALF/INT8 inference modes.
  • Easy to use: load TensorFlow (.pb), PyTorch (.pth) or Keras (.h5) models directly and run inference with TensorRT.
  • Easy to extend: register customized layers by following add_support_op.md.
  • Provides C++ and Python interfaces.

Quick Start

Prerequisites

  • NVIDIA CUDA >= 10.0, cuDNN >= 7 (recommended version: CUDA 10.2)
  • TensorRT >= 6.0.1.5 (recommended version: TensorRT-7.2.1.6)
  • CMake >= 3.10.1
  • GCC >= 5.4.0, ld >= 2.26.1
  • (PyTorch) PyTorch == 1.3.1
  • (TensorFlow) TensorFlow == 1.15.0 (download TensorFlow 1.15.0 and unzip it to source/third_party/tensorflow/lib)
  • (Keras) HDF5
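
The minimum versions above are dotted numbers, so they need a field-by-field comparison rather than a string comparison (as strings, "3.9.6" sorts after "3.10.1"). The following is a minimal sketch of such a check. The helper names and the "installed" versions are hypothetical and not part of Forward; replace the sample values with what your system reports.

```python
def parse_version(text):
    """Turn a dotted version string such as '7.2.1.6' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, compared field by field."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '10.2' compares equal to '10.2.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions copied from the prerequisites list.
MINIMUMS = {"CUDA": "10.0", "TensorRT": "6.0.1.5", "CMake": "3.10.1", "GCC": "5.4.0"}

# Hypothetical installed versions, for illustration only.
installed = {"CUDA": "10.2", "TensorRT": "7.2.1.6", "CMake": "3.9.6", "GCC": "7.5.0"}

report = {name: meets_minimum(installed[name], MINIMUMS[name]) for name in MINIMUMS}
for name, ok in report.items():
    status = "ok" if ok else "too old"
    print(f"{name} {installed[name]} (needs >= {MINIMUMS[name]}): {status}")
```

With these sample values, only the CMake entry is reported as too old, because 3.9.6 is lower than 3.10.1 once the fields are compared as integers.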

Build with CMake

Generate Makefiles (or a Visual Studio project on Windows) and build. Forward can be built for different frameworks, such as Fwd-Torch, Fwd-Python-Torch, Fwd-Tf, Fwd-Python-Tf, Fwd-Keras and Fwd-Python-Keras; the variant is selected through CMake options. For example, Fwd-Python-Tf is built as follows.

mkdir build
cd build

cmake .. \
-DTensorRT_ROOT=/path/to/TensorRT \
-DENABLE_LOGGING=ON \
-DENABLE_PROFILING=ON \
-DENABLE_DYNAMIC_BATCH=ON \
-DBUILD_PYTHON_LIB=ON \
-DENABLE_TORCH=OFF \
-DENABLE_TENSORFLOW=ON \
-DENABLE_KERAS=OFF

make -j
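
When the configure step is scripted, the option list can be assembled programmatically, which avoids shell line-continuation mistakes. The sketch below builds the argument list for a TensorFlow-only build from a dictionary. The function name is hypothetical, and only a subset of the options shown above is included (the TensorRT path and the three framework switches); the path is the same placeholder used in the command above.

```python
import shlex


def cmake_configure_command(source_dir, options):
    """Build the argv list for a CMake configure step from a dict of cache options."""
    args = ["cmake", source_dir]
    for name, value in options.items():
        # CMake boolean switches are conventionally written as ON/OFF.
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        args.append(f"-D{name}={value}")
    return args


options = {
    "TensorRT_ROOT": "/path/to/TensorRT",  # placeholder path
    "ENABLE_TORCH": False,
    "ENABLE_TENSORFLOW": True,
    "ENABLE_KERAS": False,
}

command = cmake_configure_command("..", options)
print(shlex.join(command))  # shell-quoted form, handy for logging
```

The resulting list can be passed to `subprocess.run(command, check=True, cwd="build")` once the build directory exists.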

CMake build arguments

  • TensorRT_ROOT [Required]: path to the TensorRT installation directory that contains the libraries.
  • For more CMake options, refer to CMake Options.

Unit Test

After the build finishes, run unit_test to verify that the project was built successfully.

cd build/bin
./unit_test --gtest_filter=TestTfNodes.*

Use Forward-Python

After a successful build, the Forward-Python library can be found in the build/bin directory. It is named forward.cpython.xxx*.so on Linux or forward.xxx*.pyd on Windows. Copy the library into the working directory of your Python project. For example, the directory can be organized as:

---- workspace
   |
   -- test.py
   |
   -- forward.cpython.xxx*.so
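
Before importing the module, it can help to confirm that the compiled library really sits next to the script. The sketch below looks for files matching the naming patterns described above. The helper name is hypothetical and the glob patterns are an assumption derived from those file names.

```python
from pathlib import Path


def find_forward_library(workspace):
    """Return compiled Forward-Python modules found directly in `workspace`."""
    root = Path(workspace)
    matches = []
    # Linux builds produce a .so file, Windows builds a .pyd file.
    for pattern in ("forward*.so", "forward*.pyd"):
        matches.extend(sorted(root.glob(pattern)))
    return matches


found = find_forward_library(".")
if found:
    print("Forward-Python library found:", ", ".join(p.name for p in found))
else:
    print("No Forward-Python library here; copy it from build/bin first.")
```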

Then, test.py can import Forward to perform high performance deep learning inference.

# test.py

import forward
import numpy as np

# 1. BUILD step: load TensorFlow-Bert model to build Forward engine
builder = forward.TfBuilder()
batch_size = 16
infer_mode = 'float32'  # Infer mode: 'float32' / 'float16' / 'int8_calib' / 'int8'

# dict_type dummy input
dummy_input = {"input_ids" : np.ones([batch_size , 48], dtype='int32'), 
               "input_mask" : np.ones([batch_size , 48], dtype='int32'),
               "segment_ids" : np.ones([batch_size , 48], dtype='int32')}

# build engine
builder.set_mode(infer_mode)  # optional, 'float32' is default.
model_path = 'bert_model.pb'
tf_engine = builder.build(model_path, dummy_input)

need_save = True
if need_save:
    # save engine
    engine_path = 'path/to/out/engine'
    tf_engine.save(engine_path)

    # load saved engine
    tf_engine = forward.TfEngine()
    tf_engine.load(engine_path)

# 2. FORWARD step: do inference with Forward engine
inputs = dummy_input
outputs = tf_engine.forward(inputs) # dict_type outputs

Note: The input names of a model can be inspected with a model viewer such as Netron.
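
Because inputs are passed as a dictionary keyed by input name, a quick comparison against the dummy input used at build time can catch misspelled names or wrong dtypes before inference. This is a minimal sketch using only NumPy; both helper functions are hypothetical and not part of Forward. It assumes that inference inputs should match the dummy input in name, shape and dtype, which may be stricter than necessary when dynamic batching is enabled.

```python
import numpy as np


def input_spec(arrays):
    """Summarize a dict of arrays as {name: (shape, dtype string)}."""
    return {name: (tuple(a.shape), str(a.dtype)) for name, a in arrays.items()}


def mismatches(reference, candidate):
    """Return sorted names that are missing, extra, or differ in shape or dtype."""
    ref, cand = input_spec(reference), input_spec(candidate)
    return sorted(n for n in ref.keys() | cand.keys() if ref.get(n) != cand.get(n))


batch_size, seq_len = 16, 48
names = ["input_ids", "input_mask", "segment_ids"]

# Same layout as the dummy input in test.py.
dummy_input = {n: np.ones([batch_size, seq_len], dtype="int32") for n in names}

# A candidate input with one wrong dtype, for illustration only.
inputs = dict(dummy_input)
inputs["input_mask"] = np.ones([batch_size, seq_len], dtype="int64")

print(mismatches(dummy_input, inputs))
```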

More Usages

FAQ

FAQ

Models & Operators

Models

Operators

Contribution

CONTRIBUTING

License

Apache License v2.0
