Merge pull request #418 from ildoonet/dev/architecture-mobilenet-v2

Dev/architecture mobilenet v2
nrobin · Mar 14, 2019 · fa03679
2 parents dbc89c7 + d1f2d76
Showing 126 changed files with 532 additions and 95,044 deletions.
diff --git a/README.md b/README.md
@@ -1,13 +1,12 @@
 # tf-pose-estimation
 
-'Openpose' for human pose estimation have been implemented using Tensorflow. It also provides several variants that have made some changes to the network structure for **real-time processing on the CPU or low-power embedded devices.**
+'Openpose', a human pose estimation algorithm, has been implemented using Tensorflow. It also provides several variants with changes to the network structure for **real-time processing on the CPU or low-power embedded devices.**
 
-
-**You can even run this on your macbook with descent FPS!**
+**You can even run this on your macbook with a decent FPS!**
 
 Original Repo(Caffe) : https://github.com/CMU-Perceptual-Computing-Lab/openpose
 
-| CMU's Original Model</br> on Macbook Pro 15" | Mobilenet Variant </br>on Macbook Pro 15" | Mobilenet Variant</br>on Jetson TX2 |
+| CMU's Original Model</br> on Macbook Pro 15" | Mobilenet-thin </br>on Macbook Pro 15" | Mobilenet-thin</br>on Jetson TX2 |
 |:---------|:--------------------|:----------------|
 | ![cmu-model](/etcs/openpose_macbook_cmu.gif)     | ![mb-model-macbook](/etcs/openpose_macbook_mobilenet3.gif) | ![mb-model-tx2](/etcs/openpose_tx2_mobilenet3.gif) |
 | **~0.6 FPS** | **~4.2 FPS** @ 368x368 | **~10 FPS** @ 368x368 |
@@ -17,8 +16,9 @@ Implemented features are listed here : [features](./etcs/feature.md)
 
 ## Important Updates
 
-2018.5.21 Post-processing part is implemented in c++. It is required compiling the part. See: https://github.com/ildoonet/tf-pose-estimation/tree/master/src/pafprocess
-2018.2.7 Arguments in run.py script changed. Support dynamic input size.
+- 2019.3.12 Add new models using mobilenet-v2 architecture. See : [experiments.md](./etcs/experiments.md)
+- 2018.5.21 Post-processing part is implemented in c++. This part needs to be compiled. See: https://github.com/ildoonet/tf-pose-estimation/tree/master/src/pafprocess
+- 2018.2.7 Arguments in run.py script changed. Support dynamic input size.
 
 ## Install
 
@@ -29,10 +29,6 @@ You need dependencies below.
 - python3
 - tensorflow 1.4.1+
 - opencv3, protobuf, python3-tk
-
-### Opensources
-
-- slim
 - slidingwindow
   - https://github.com/adamrehn/slidingwindow
   - I copied from the above git repo to modify few things.
@@ -42,8 +38,8 @@ You need dependencies below.
 Clone the repo and install 3rd-party libraries.
 
 ```bash
-$ git clone https://www.github.com/ildoonet/tf-openpose
-$ cd tf-openpose
+$ git clone https://www.github.com/ildoonet/tf-pose-estimation
+$ cd tf-pose-estimation
 $ pip3 install -r requirements.txt
 ```
 
@@ -58,47 +54,23 @@ $ swig -python -c++ pafprocess.i && python3 setup.py build_ext --inplace
 Alternatively, you can install this repo as a shared package using pip.
 
 ```bash
-$ git clone https://www.github.com/ildoonet/tf-openpose
+$ git clone https://www.github.com/ildoonet/tf-pose-estimation
 $ cd tf-pose-estimation
 $ python setup.py install
 ```
 
-#### Test installed package
-![package_install_result](./etcs/imgcat0.gif)
-```bash
-python -c 'import tf_pose; tf_pose.infer(image="./images/p1.jpg")'
-```
-
+## Models & Performances
 
-## Models
-
-I have tried multiple variations of models to find optmized network architecture. Some of them are below and checkpoint files are provided for research purpose. 
-
-- cmu 
-  - the model based VGG pretrained network which described in the original paper.
-  - I converted Weights in Caffe format to use in tensorflow.
-  - [pretrained weight download](https://www.dropbox.com/s/xh5s7sb7remu8tx/openpose_coco.npy?dl=0)
-
-- dsconv
-  - Same architecture as the cmu version except for the **depthwise separable convolution** of mobilenet.
-  - I trained it using 'transfer learning', but it provides not-enough speed and accuracy.
-
-- mobilenet
-  - Based on the mobilenet paper, 12 convolutional layers are used as feature-extraction layers.
-  - To improve on small person, **minor modification** on the architecture have been made.
-  - Three models were learned according to network size parameters.
-    - mobilenet
-      - 368x368 : [checkpoint weight download](https://www.dropbox.com/s/09xivpuboecge56/mobilenet_0.75_0.50_model-388003.zip?dl=0)
-    - mobilenet_fast
-    - mobilenet_accurate
-  - I published models which is not the best ones, but you can test them before you trained a model from the scratch.
+See [experiments.md](./etcs/experiments.md)
 
 ### Download Tensorflow Graph File(pb file)
 
 Before running demo, you should download graph files. You can deploy this graph on your mobile or other platforms.
 
 - cmu (trained in 656x368)
 - mobilenet_thin (trained in 432x368)
+- mobilenet_v2_large (trained in 432x368)
+- mobilenet_v2_small (trained in 432x368)
 
 CMU's model graphs are too large for git, so I uploaded them on an external cloud. You should download them if you want to use cmu's original model. Download scripts are provided in the model folder.
 
@@ -107,16 +79,6 @@ $ cd models/graph/cmu
 $ bash download.sh
 ```
 
-### Inference Time
-
-| Dataset | Model              | Inference Time<br/>Macbook Pro i5 3.1G | Inference Time<br/>Jetson TX2  |
-|---------|--------------------|----------------:|----------------:|
-| Coco    | cmu                | 10.0s @ 368x368 | OOM   @ 368x368<br/> 5.5s  @ 320x240|
-| Coco    | dsconv             | 1.10s @ 368x368 |
-| Coco    | mobilenet_accurate | 0.40s @ 368x368 | 0.18s @ 368x368 |
-| Coco    | mobilenet          | 0.24s @ 368x368 | 0.10s @ 368x368 |
-| Coco    | mobilenet_fast     | 0.16s @ 368x368 | 0.07s @ 368x368 |
-
 ## Demo
 
 ### Test Inference
@@ -166,36 +128,4 @@ See : [etcs/training.md](./etcs/training.md)
 
 ## References
 
-### OpenPose
-
-[1] https://github.com/CMU-Perceptual-Computing-Lab/openpose
-
-[2] Training Codes : https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation
-
-[3] Custom Caffe by Openpose : https://github.com/CMU-Perceptual-Computing-Lab/caffe_train
-
-[4] Keras Openpose : https://github.com/michalfaber/keras_Realtime_Multi-Person_Pose_Estimation
-
-[5] Keras Openpose2 : https://github.com/kevinlin311tw/keras-openpose-reproduce
-
-### Lifting from the deep
-
-[1] Arxiv Paper : https://arxiv.org/abs/1701.00295
-
-[2] https://github.com/DenisTome/Lifting-from-the-Deep-release
-
-### Mobilenet
-
-[1] Original Paper : https://arxiv.org/abs/1704.04861
-
-[2] Pretrained model : https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.md
-
-### Libraries
-
-[1] Tensorpack : https://github.com/ppwwyyxx/tensorpack
-
-### Tensorflow Tips
-
-[1] Freeze graph : https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py
-
-[2] Optimize graph : https://codelabs.developers.google.com/codelabs/tensorflow-for-poets-2
+See : [etcs/reference.md](./etcs/reference.md)
diff --git a/etcs/experiments.md b/etcs/experiments.md
@@ -1,12 +1,59 @@
+# Trained Models & Performances
 
+## Models
 
-## COCO Datasets
+I have tried multiple variations of models to find an optimized network architecture. Some of them are listed below, and checkpoint files are provided for research purposes.
 
-| Set         | Model          | Scale      | Resolution | AP         | AP 50      | AP 75      | AP medium  | AP large   | AR         | AR 50      | AR 75      | AR medium  | AR large   |
-|-------------|----------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|
-| 2014 Val    | Original Paper | 3          | Image      |      0.584 |      0.815 |      0.626 |      0.544 |      0.651 |            |            |            |            |            |
-| | | | | | | | | | | | | |
-| 2014 Val    | CMU            | 1          | Image      |    0.5067 |     0.7660 |     0.5377 |     0.4927 |     0.5309 |     0.5614 |     0.7900 |     0.5903 |     0.5089 |     0.6347 |
-| 2014 Val    | Mobilenet thin | 1          | Image      |    0.2806 |     0.5577 |     0.2474 |     0.2802 |     0.2843 |     0.3214 |     0.5840 |     0.2997 |     0.2946 |     0.3587 |
+- cmu
+  - The VGG-based pretrained network described in the original paper.
+  - I converted the weights from Caffe format for use in Tensorflow.
+  - [pretrained weight download](https://www.dropbox.com/s/xh5s7sb7remu8tx/openpose_coco.npy?dl=0)
+
+- dsconv
+  - Same architecture as the cmu version except for the **depthwise separable convolution** of mobilenet.
+  - I trained it using 'transfer learning', but its speed and accuracy are not good enough.
+
+- mobilenet
+  - Based on the mobilenet paper, 12 convolutional layers are used as feature-extraction layers.
+  - To improve results on small persons, **minor modifications** to the architecture have been made.
+  - Three models were trained with different network size parameters.
+    - mobilenet
+      - 368x368 : [checkpoint weight download](https://www.dropbox.com/s/09xivpuboecge56/mobilenet_0.75_0.50_model-388003.zip?dl=0)
+    - mobilenet_fast
+    - mobilenet_accurate
+  - The published models are not the best ones, but you can test them before you train a model from scratch.
+
+- mobilenet v2
+  - Similar to mobilenet, but using its improved version (v2).
+
+| Name                 | Feature Layers      | Configuration                   |
+|----------------------|---------------------|---------------------------------|
+| cmu                  | VGG16               | OpenPose                        |
+| mobilenet_thin       | Mobilenet           | width=0.75 refine-width=0.75    |
+| mobilenet_v2_large   | Mobilenet v2 (582M) | width=1.40 refine-width=1.00    |
+| mobilenet_v2_small   | Mobilenet v2 (97M)  | width=0.50 refine-width=0.50    |
+
+## Performance on COCO Datasets
 
+| Set         | Model               | Scale | Resolution | AP         | AP 50      | AP 75      | AP medium  | AP large   | AR         | AR 50      | AR 75      | AR medium  | AR large   |
+|-------------|---------------------|-------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|
+| 2014 Val    | Original Paper      | 3     | Image      |      0.584 |      0.815 |      0.626 |      0.544 |      0.651 |            |            |            |            |            |
+| | | | | | | | | | | | | | |
+| 2014 Val    | CMU(openpose)       | 1     | Image      |     0.5067 |     0.7660 |     0.5377 |     0.4927 |     0.5309 |     0.5614 |     0.7900 |     0.5903 |     0.5089 |     0.6347 |
+| 2014 Val    | VGG(openpose, our)  | 1     | Image      |     0.5067 |     0.7660 |     0.5377 |     0.4927 |     0.5309 |     0.5614 |     0.7900 |     0.5903 |     0.5089 |     0.6347 |
+| | | | | | | | | | | | | | |
+| 2014 Val    | Mobilenet thin      | 1     | Image      |     0.2806 |     0.5577 |     0.2474 |     0.2802 |     0.2843 |     0.3214 |     0.5840 |     0.2997 |     0.2946 |     0.3587 |
+| 2014 Val    | Mobilenet-v2 Large  | 1     | Image      |     0.3130 |     0.5846 |     0.2940 |     0.2622 |     0.3850 |     0.3680 |     0.6101 |     0.3637 |     0.2765 |     0.4912 |
+| 2014 Val    | Mobilenet-v2 Small  | 1     | Image      |     0.1730 |     0.4062 |     0.1240 |     0.1501 |     0.2105 |     0.2207 |     0.4505 |     0.1876 |     0.1601 |     0.3020 |
 I also ran keras & caffe models to verify single-scale version's performance, they matched this result.
+
+## Computation Budget & Latency
+
+| Model               | mAP@COCO2014 | GFLOPs | Latency(432x368)<br/>(Macbook 15" 2.9GHz i9, tf 1.12) | Latency(432x368)<br/>(V100 GPU) |
+|---------------------|-------------:|--------|------------------------------------------------------:|-------------------------------:|
+| CMU, VGG(OpenPose)  |              |        | 0.8589s | 0.0570s |
+| Mobilenet thin      | 0.2806       |        | 0.1701s | 0.0217s |
+| Mobilenet-v2 Large  | 0.3130       |        | 0.2066s | 0.0214s |
+| Mobilenet-v2 Small  | 0.1730       |        | 0.1290s | 0.0210s |
+
+An optimized Tensorflow build was used for this experiment. Results may vary between environments, images and other factors.
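The latency figures above depend on how the timing is taken. Below is a minimal sketch of one common approach, assuming a few warm-up calls followed by an average over repeated calls. `measure_latency` and `fake_inference` are hypothetical names used only for illustration and are not part of this repository; the diff does not say how the table values were actually measured.

```python
import time


def measure_latency(fn, warmup=2, repeats=10):
    """Return the mean wall-clock seconds per call of fn()."""
    # Warm-up calls absorb one-time costs (graph setup, caches).
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def fake_inference():
    # Hypothetical stand-in for a real call such as e.inference(image, ...).
    time.sleep(0.01)


latency = measure_latency(fake_inference)
fps = 1.0 / latency  # seconds per frame converts to frames per second
print('latency=%.4fs, fps=%.1f' % (latency, fps))
```

The same conversion relates the tables to each other: a latency of 0.1701s corresponds to roughly 5.9 frames per second for the network forward pass alone, before any pre- or post-processing.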
diff --git a/etcs/reference.md b/etcs/reference.md
@@ -0,0 +1,35 @@
+## Reference
+
+### OpenPose
+
+[1] https://github.com/CMU-Perceptual-Computing-Lab/openpose
+
+[2] Training Codes : https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation
+
+[3] Custom Caffe by Openpose : https://github.com/CMU-Perceptual-Computing-Lab/caffe_train
+
+[4] Keras Openpose : https://github.com/michalfaber/keras_Realtime_Multi-Person_Pose_Estimation
+
+[5] Keras Openpose2 : https://github.com/kevinlin311tw/keras-openpose-reproduce
+
+### Mobilenet
+
+[1] Original Paper : https://arxiv.org/abs/1704.04861
+
+[2] Pretrained model : https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.md
+
+[3] Mobilenet v2 Paper : https://arxiv.org/abs/1801.04381
+
+[4] Pretrained Model(v2) : https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet
+
+### Libraries
+
+[1] Tensorpack : https://github.com/ppwwyyxx/tensorpack
+
+### Tensorflow Tips
+
+[1] Freeze graph : https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py
+
+[2] Optimize graph : https://codelabs.developers.google.com/codelabs/tensorflow-for-poets-2
+
+[3] Calculate FLOPs : https://stackoverflow.com/questions/45085938/tensorflow-is-there-a-way-to-measure-flops-for-a-model
diff --git a/etcs/training.md b/etcs/training.md
@@ -94,16 +94,16 @@ And the optimization can be performed on the frozen model via graph transform pr
 ```bash
 $ bazel build tensorflow/tools/graph_transforms:transform_graph
 $ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
-    --in_graph=... \
-    --out_graph=... \
+    --in_graph=./tmp/graph_frozen.pb \
+    --out_graph=./tmp/graph_opt.pb \
     --inputs='image:0' \
     --outputs='Openpose/concat_stage7:0' \
     --transforms='
     strip_unused_nodes(type=float, shape="1,368,368,3")
-    remove_nodes(op=Identity, op=CheckNumerics)
-    fold_constants(ignoreError=False)
     fold_old_batch_norms
-    fold_batch_norms'
+    fold_batch_norms
+    fold_constants(ignoreError=False)
+    remove_nodes(op=Identity, op=CheckNumerics)'
 ```
 
 Also, it is promising to quantize the neural network to 8 bits for a further speed improvement. In my case, this makes inference less accurate and slower on Intel's CPUs.
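To illustrate why 8-bit quantization costs some accuracy, here is a minimal sketch of a generic linear (affine) quantization round trip in plain Python. This is not TensorFlow's quantization transform; `quantize_8bit`, `dequantize_8bit` and the example values are assumptions made up for illustration.

```python
def quantize_8bit(values):
    """Map floats to integers in [0, 255] with a linear (affine) scheme."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for constant input
    codes = [int(round((v - lo) / scale)) for v in values]
    return codes, scale, lo


def dequantize_8bit(codes, scale, lo):
    """Recover approximate floats from the 8-bit codes."""
    return [c * scale + lo for c in codes]


weights = [-1.0, -0.3337, 0.0, 0.4242, 1.0]  # made-up example values
codes, scale, lo = quantize_8bit(weights)
restored = dequantize_8bit(codes, scale, lo)

# The round trip is lossy: each value can be off by up to half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, max_error)
```

Each restored value can differ from the original by up to half of `scale`, and such small errors accumulate through the layers of a network.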

diff --git a/models/graph/mobilenet_v2_large/graph_opt.pb b/models/graph/mobilenet_v2_large/graph_opt.pb
diff --git a/models/graph/mobilenet_v2_small/graph_opt.pb b/models/graph/mobilenet_v2_small/graph_opt.pb
diff --git a/requirements.txt b/requirements.txt
@@ -9,3 +9,4 @@ scipy
 slidingwindow
 tqdm
 git+https://github.com/ppwwyyxx/tensorpack.git
+numba
diff --git a/run.py b/run.py
@@ -9,7 +9,8 @@
 from tf_pose.estimator import TfPoseEstimator
 from tf_pose.networks import get_graph_path, model_wh
 
-logger = logging.getLogger('TfPoseEstimator')
+logger = logging.getLogger('TfPoseEstimatorRun')
+logger.handlers.clear()
 logger.setLevel(logging.DEBUG)
 ch = logging.StreamHandler()
 ch.setLevel(logging.DEBUG)
@@ -21,10 +22,11 @@
 if __name__ == '__main__':
     parser = argparse.ArgumentParser(description='tf-pose-estimation run')
     parser.add_argument('--image', type=str, default='./images/p1.jpg')
-    parser.add_argument('--model', type=str, default='cmu', help='cmu / mobilenet_thin')
-
+    parser.add_argument('--model', type=str, default='cmu',
+                        help='cmu / mobilenet_thin / mobilenet_v2_large / mobilenet_v2_small')
     parser.add_argument('--resize', type=str, default='0x0',
-                        help='if provided, resize images before they are processed. default=0x0, Recommends : 432x368 or 656x368 or 1312x736 ')
+                        help='if provided, resize images before they are processed. '
+                             'default=0x0, Recommends : 432x368 or 656x368 or 1312x736 ')
     parser.add_argument('--resize-out-ratio', type=float, default=4.0,
                         help='if provided, resize heatmaps before they are post-processed. default=4.0')
 
@@ -41,6 +43,7 @@
     if image is None:
         logger.error('Image can not be read, path=%s' % args.image)
         sys.exit(-1)
+
     t = time.time()
     humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=args.resize_out_ratio)
     elapsed = time.time() - t
@@ -49,36 +52,41 @@
 
     image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
 
-    import matplotlib.pyplot as plt
-
-    fig = plt.figure()
-    a = fig.add_subplot(2, 2, 1)
-    a.set_title('Result')
-    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
-
-    bgimg = cv2.cvtColor(image.astype(np.uint8), cv2.COLOR_BGR2RGB)
-    bgimg = cv2.resize(bgimg, (e.heatMat.shape[1], e.heatMat.shape[0]), interpolation=cv2.INTER_AREA)
-
-    # show network output
-    a = fig.add_subplot(2, 2, 2)
-    plt.imshow(bgimg, alpha=0.5)
-    tmp = np.amax(e.heatMat[:, :, :-1], axis=2)
-    plt.imshow(tmp, cmap=plt.cm.gray, alpha=0.5)
-    plt.colorbar()
-
-    tmp2 = e.pafMat.transpose((2, 0, 1))
-    tmp2_odd = np.amax(np.absolute(tmp2[::2, :, :]), axis=0)
-    tmp2_even = np.amax(np.absolute(tmp2[1::2, :, :]), axis=0)
-
-    a = fig.add_subplot(2, 2, 3)
-    a.set_title('Vectormap-x')
-    # plt.imshow(CocoPose.get_bgimg(inp, target_size=(vectmap.shape[1], vectmap.shape[0])), alpha=0.5)
-    plt.imshow(tmp2_odd, cmap=plt.cm.gray, alpha=0.5)
-    plt.colorbar()
-
-    a = fig.add_subplot(2, 2, 4)
-    a.set_title('Vectormap-y')
-    # plt.imshow(CocoPose.get_bgimg(inp, target_size=(vectmap.shape[1], vectmap.shape[0])), alpha=0.5)
-    plt.imshow(tmp2_even, cmap=plt.cm.gray, alpha=0.5)
-    plt.colorbar()
-    plt.show()
+    try:
+        import matplotlib.pyplot as plt
+
+        fig = plt.figure()
+        a = fig.add_subplot(2, 2, 1)
+        a.set_title('Result')
+        plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
+
+        bgimg = cv2.cvtColor(image.astype(np.uint8), cv2.COLOR_BGR2RGB)
+        bgimg = cv2.resize(bgimg, (e.heatMat.shape[1], e.heatMat.shape[0]), interpolation=cv2.INTER_AREA)
+
+        # show network output
+        a = fig.add_subplot(2, 2, 2)
+        plt.imshow(bgimg, alpha=0.5)
+        tmp = np.amax(e.heatMat[:, :, :-1], axis=2)
+        plt.imshow(tmp, cmap=plt.cm.gray, alpha=0.5)
+        plt.colorbar()
+
+        tmp2 = e.pafMat.transpose((2, 0, 1))
+        tmp2_odd = np.amax(np.absolute(tmp2[::2, :, :]), axis=0)
+        tmp2_even = np.amax(np.absolute(tmp2[1::2, :, :]), axis=0)
+
+        a = fig.add_subplot(2, 2, 3)
+        a.set_title('Vectormap-x')
+        # plt.imshow(CocoPose.get_bgimg(inp, target_size=(vectmap.shape[1], vectmap.shape[0])), alpha=0.5)
+        plt.imshow(tmp2_odd, cmap=plt.cm.gray, alpha=0.5)
+        plt.colorbar()
+
+        a = fig.add_subplot(2, 2, 4)
+        a.set_title('Vectormap-y')
+        # plt.imshow(CocoPose.get_bgimg(inp, target_size=(vectmap.shape[1], vectmap.shape[0])), alpha=0.5)
+        plt.imshow(tmp2_even, cmap=plt.cm.gray, alpha=0.5)
+        plt.colorbar()
+        plt.show()
+    except Exception as err:  # do not reuse `e`, which holds the estimator
+        logger.warning('matplotlib error, %s' % err)
+        cv2.imshow('result', image)
+        cv2.waitKey()
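The `logger.handlers.clear()` call added near the top of run.py presumably guards against duplicated log output. A minimal sketch of the underlying behaviour, using only the standard `logging` module: `logging.getLogger` returns the same object for the same name, so repeating the setup stacks handlers unless they are cleared first. `setup_logger` and the logger names are hypothetical and used only for illustration.

```python
import logging


def setup_logger(name, clear_first):
    """Configure a named logger the way a script's module-level code might."""
    logger = logging.getLogger(name)  # same object for the same name
    if clear_first:
        logger.handlers.clear()  # drop handlers left by an earlier setup
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler())
    return logger


# Running the setup twice without clearing stacks two handlers,
# so every message would be emitted twice.
setup_logger('demo_without_clear', clear_first=False)
stacked = setup_logger('demo_without_clear', clear_first=False)

# With clearing, the logger ends up with exactly one handler.
setup_logger('demo_with_clear', clear_first=True)
single = setup_logger('demo_with_clear', clear_first=True)

print(len(stacked.handlers), len(single.handlers))  # prints "2 1"
```

Renaming the logger from `'TfPoseEstimator'` to `'TfPoseEstimatorRun'` addresses the same problem from the other side, since the script no longer shares a logger name with the estimator.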