Merge pull request #418 from ildoonet/dev/architecture-mobilenet-v2
Dev/architecture mobilenet v2
ildoonet authored Mar 14, 2019
2 parents dbc89c7 + d1f2d76 commit fa03679
Showing 126 changed files with 532 additions and 95,044 deletions.
98 changes: 14 additions & 84 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,12 @@
# tf-pose-estimation

'Openpose' for human pose estimation have been implemented using Tensorflow. It also provides several variants that have made some changes to the network structure for **real-time processing on the CPU or low-power embedded devices.**
'Openpose', a human pose estimation algorithm, has been implemented using Tensorflow. It also provides several variants with changes to the network structure for **real-time processing on the CPU or low-power embedded devices.**


**You can even run this on your macbook with descent FPS!**
**You can even run this on your macbook with a decent FPS!**

Original Repo(Caffe) : https://github.com/CMU-Perceptual-Computing-Lab/openpose

| CMU's Original Model</br> on Macbook Pro 15" | Mobilenet Variant </br>on Macbook Pro 15" | Mobilenet Variant</br>on Jetson TX2 |
| CMU's Original Model</br> on Macbook Pro 15" | Mobilenet-thin </br>on Macbook Pro 15" | Mobilenet-thin</br>on Jetson TX2 |
|:---------|:--------------------|:----------------|
| ![cmu-model](/etcs/openpose_macbook_cmu.gif) | ![mb-model-macbook](/etcs/openpose_macbook_mobilenet3.gif) | ![mb-model-tx2](/etcs/openpose_tx2_mobilenet3.gif) |
| **~0.6 FPS** | **~4.2 FPS** @ 368x368 | **~10 FPS** @ 368x368 |
@@ -17,8 +16,9 @@ Implemented features are listed here : [features](./etcs/feature.md)

## Important Updates

2018.5.21 Post-processing part is implemented in c++. It is required compiling the part. See: https://github.com/ildoonet/tf-pose-estimation/tree/master/src/pafprocess
2018.2.7 Arguments in run.py script changed. Support dynamic input size.
- 2019.3.12 Added new models using the mobilenet-v2 architecture. See : [experiments.md](./etcs/experiments.md)
- 2018.5.21 The post-processing part is now implemented in C++ and must be compiled before use. See: https://github.com/ildoonet/tf-pose-estimation/tree/master/src/pafprocess
- 2018.2.7 Arguments of the run.py script changed. Dynamic input size is now supported.

## Install

@@ -29,10 +29,6 @@ You need dependencies below.
- python3
- tensorflow 1.4.1+
- opencv3, protobuf, python3-tk

### Opensources

- slim
- slidingwindow
- https://github.com/adamrehn/slidingwindow
- I copied from the above git repo to modify few things.
@@ -42,8 +38,8 @@ You need dependencies below.
Clone the repo and install 3rd-party libraries.

```bash
$ git clone https://www.github.com/ildoonet/tf-openpose
$ cd tf-openpose
$ git clone https://www.github.com/ildoonet/tf-pose-estimation
$ cd tf-pose-estimation
$ pip3 install -r requirements.txt
```

@@ -58,47 +54,23 @@ $ swig -python -c++ pafprocess.i && python3 setup.py build_ext --inplace
Alternatively, you can install this repo as a shared package using pip.

```bash
$ git clone https://www.github.com/ildoonet/tf-openpose
$ git clone https://www.github.com/ildoonet/tf-pose-estimation
$ cd tf-pose-estimation
$ python setup.py install
```

#### Test installed package
![package_install_result](./etcs/imgcat0.gif)
```bash
python -c 'import tf_pose; tf_pose.infer(image="./images/p1.jpg")'
```

## Models & Performances

## Models

I have tried multiple variations of models to find optmized network architecture. Some of them are below and checkpoint files are provided for research purpose.

- cmu
- the model based VGG pretrained network which described in the original paper.
- I converted Weights in Caffe format to use in tensorflow.
- [pretrained weight download](https://www.dropbox.com/s/xh5s7sb7remu8tx/openpose_coco.npy?dl=0)

- dsconv
- Same architecture as the cmu version except for the **depthwise separable convolution** of mobilenet.
- I trained it using 'transfer learning', but it provides not-enough speed and accuracy.

- mobilenet
- Based on the mobilenet paper, 12 convolutional layers are used as feature-extraction layers.
- To improve on small person, **minor modification** on the architecture have been made.
- Three models were learned according to network size parameters.
- mobilenet
- 368x368 : [checkpoint weight download](https://www.dropbox.com/s/09xivpuboecge56/mobilenet_0.75_0.50_model-388003.zip?dl=0)
- mobilenet_fast
- mobilenet_accurate
- I published models which is not the best ones, but you can test them before you trained a model from the scratch.
See [experiments.md](./etcs/experiments.md)

### Download Tensorflow Graph File(pb file)

Before running demo, you should download graph files. You can deploy this graph on your mobile or other platforms.

- cmu (trained in 656x368)
- mobilenet_thin (trained in 432x368)
- mobilenet_v2_large (trained in 432x368)
- mobilenet_v2_small (trained in 432x368)

CMU's model graphs are too large for git, so I uploaded them on an external cloud. You should download them if you want to use cmu's original model. Download scripts are provided in the model folder.

@@ -107,16 +79,6 @@
```bash
$ cd models/graph/cmu
$ bash download.sh
```

### Inference Time

| Dataset | Model | Inference Time<br/>Macbook Pro i5 3.1G | Inference Time<br/>Jetson TX2 |
|---------|--------------------|----------------:|----------------:|
| Coco | cmu | 10.0s @ 368x368 | OOM @ 368x368<br/> 5.5s @ 320x240|
| Coco | dsconv | 1.10s @ 368x368 |
| Coco | mobilenet_accurate | 0.40s @ 368x368 | 0.18s @ 368x368 |
| Coco | mobilenet | 0.24s @ 368x368 | 0.10s @ 368x368 |
| Coco | mobilenet_fast | 0.16s @ 368x368 | 0.07s @ 368x368 |

## Demo

### Test Inference
@@ -166,36 +128,4 @@ See : [etcs/training.md](./etcs/training.md)

## References

### OpenPose

[1] https://github.com/CMU-Perceptual-Computing-Lab/openpose

[2] Training Codes : https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation

[3] Custom Caffe by Openpose : https://github.com/CMU-Perceptual-Computing-Lab/caffe_train

[4] Keras Openpose : https://github.com/michalfaber/keras_Realtime_Multi-Person_Pose_Estimation

[5] Keras Openpose2 : https://github.com/kevinlin311tw/keras-openpose-reproduce

### Lifting from the deep

[1] Arxiv Paper : https://arxiv.org/abs/1701.00295

[2] https://github.com/DenisTome/Lifting-from-the-Deep-release

### Mobilenet

[1] Original Paper : https://arxiv.org/abs/1704.04861

[2] Pretrained model : https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.md

### Libraries

[1] Tensorpack : https://github.com/ppwwyyxx/tensorpack

### Tensorflow Tips

[1] Freeze graph : https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py

[2] Optimize graph : https://codelabs.developers.google.com/codelabs/tensorflow-for-poets-2
See : [etcs/reference.md](./etcs/reference.md)
61 changes: 54 additions & 7 deletions etcs/experiments.md
@@ -1,12 +1,59 @@
# Trained Models & Performances

## Models

## COCO Datasets
I have tried multiple variations of models to find an optimized network architecture. Some of them are listed below, and checkpoint files are provided for research purposes.

| Set | Model | Scale | Resolution | AP | AP 50 | AP 75 | AP medium | AP large | AR | AR 50 | AR 75 | AR medium | AR large |
|-------------|----------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|
| 2014 Val | Original Paper | 3 | Image | 0.584 | 0.815 | 0.626 | 0.544 | 0.651 | | | | | |
| | | | | | | | | | | | | |
| 2014 Val | CMU | 1 | Image | 0.5067 | 0.7660 | 0.5377 | 0.4927 | 0.5309 | 0.5614 | 0.7900 | 0.5903 | 0.5089 | 0.6347 |
| 2014 Val | Mobilenet thin | 1 | Image | 0.2806 | 0.5577 | 0.2474 | 0.2802 | 0.2843 | 0.3214 | 0.5840 | 0.2997 | 0.2946 | 0.3587 |
- cmu
  - The model based on the VGG pretrained network described in the original paper.
  - I converted the weights from Caffe format for use in Tensorflow.
  - [pretrained weight download](https://www.dropbox.com/s/xh5s7sb7remu8tx/openpose_coco.npy?dl=0)

- dsconv
  - Same architecture as the cmu version, except that it uses the **depthwise separable convolution** of mobilenet.
  - I trained it using 'transfer learning', but it did not provide enough speed or accuracy.

- mobilenet
  - Based on the mobilenet paper, 12 convolutional layers are used as feature-extraction layers.
  - To improve results on small persons, a **minor modification** to the architecture has been made.
  - Three models were trained with different network size parameters.
    - mobilenet
      - 368x368 : [checkpoint weight download](https://www.dropbox.com/s/09xivpuboecge56/mobilenet_0.75_0.50_model-388003.zip?dl=0)
    - mobilenet_fast
    - mobilenet_accurate
  - The published models are not the best ones, but you can test them before training a model from scratch.

- mobilenet v2
  - Similar to mobilenet, but built on mobilenet v2, the improved version of it.

| Name | Feature Layers | Configuration |
|----------------------|---------------------|---------------------------------|
| cmu | VGG16 | OpenPose |
| mobilenet_thin | Mobilenet | width=0.75 refine-width=0.75 |
| mobilenet_v2_large | Mobilenet v2 (582M) | width=1.40 refine-width=1.00 |
| mobilenet_v2_small | Mobilenet v2 (97M) | width=0.50 refine-width=0.50 |

## Performance on COCO Datasets

| Set | Model | Scale | Resolution | AP | AP 50 | AP 75 | AP medium | AP large | AR | AR 50 | AR 75 | AR medium | AR large |
|-------------|---------------------|-------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|
| 2014 Val | Original Paper | 3 | Image | 0.584 | 0.815 | 0.626 | 0.544 | 0.651 | | | | | |
| | | | | | | | | | | | | |
| 2014 Val | CMU(openpose) | 1 | Image | 0.5067 | 0.7660 | 0.5377 | 0.4927 | 0.5309 | 0.5614 | 0.7900 | 0.5903 | 0.5089 | 0.6347 |
| 2014 Val | VGG(openpose, our) | 1 | Image | 0.5067 | 0.7660 | 0.5377 | 0.4927 | 0.5309 | 0.5614 | 0.7900 | 0.5903 | 0.5089 | 0.6347 |
| | | | | | | | | | | | | |
| 2014 Val | Mobilenet thin | 1 | Image | 0.2806 | 0.5577 | 0.2474 | 0.2802 | 0.2843 | 0.3214 | 0.5840 | 0.2997 | 0.2946 | 0.3587 |
| 2014 Val | Mobilenet-v2 Large | 1 | Image | 0.3130 | 0.5846 | 0.2940 | 0.2622 | 0.3850 | 0.3680 | 0.6101 | 0.3637 | 0.2765 | 0.4912 |
| 2014 Val | Mobilenet-v2 Small | 1 | Image | 0.1730 | 0.4062 | 0.1240 | 0.1501 | 0.2105 | 0.2207 | 0.4505 | 0.1876 | 0.1601 | 0.3020 |

I also ran the keras and caffe models to verify the single-scale version's performance; they matched this result.

## Computation Budget & Latency

| Model | mAP@COCO2014 | GFLOPs | Latency(432x368)<br/>(Macbook 15' 2.9GHz i9, tf 1.12) | Latency(432x368)<br/>(V100 GPU) |
|---------------------|-------------:|--------|------------------------------------------------------:|-------------------------------:|
| CMU, VGG(OpenPose) | | | 0.8589s | 0.0570s |
| Mobilenet thin | 0.2806 | | 0.1701s | 0.0217s |
| Mobilenet-v2 Large | 0.3130 | | 0.2066s | 0.0214s |
| Mobilenet-v2 Small | 0.1730 | | 0.1290s | 0.0210s |

An optimized Tensorflow build was used for this experiment. Results may vary between environments, images and other factors.
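
The latencies in the table above translate directly into frame rates. The following is a minimal sketch of that arithmetic, not code from the repository: `fps` and `speedup` are hypothetical helper names, and the figures are copied from the table. It assumes one image per inference call and ignores capture and drawing time, so the printed frame rates are upper bounds.

```python
# Per-image latencies in seconds at 432x368, copied from the table above.
LATENCY_S = {
    'CMU, VGG(OpenPose)': {'cpu': 0.8589, 'gpu': 0.0570},
    'Mobilenet thin': {'cpu': 0.1701, 'gpu': 0.0217},
    'Mobilenet-v2 Large': {'cpu': 0.2066, 'gpu': 0.0214},
    'Mobilenet-v2 Small': {'cpu': 0.1290, 'gpu': 0.0210},
}


def fps(latency_s):
    """Convert a per-image latency in seconds to frames per second."""
    if latency_s <= 0:
        raise ValueError('latency must be positive')
    return 1.0 / latency_s


def speedup(baseline_s, model_s):
    """How many times faster model_s is than baseline_s."""
    return baseline_s / model_s


if __name__ == '__main__':
    base = LATENCY_S['CMU, VGG(OpenPose)']
    for name, lat in LATENCY_S.items():
        print('%-20s cpu %5.2f FPS (x%.1f vs cmu)  gpu %5.1f FPS' % (
            name, fps(lat['cpu']), speedup(base['cpu'], lat['cpu']), fps(lat['gpu'])))
```

On the V100 the three mobilenet variants are within a millisecond of each other, so the choice between them matters mostly on CPU.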
35 changes: 35 additions & 0 deletions etcs/reference.md
@@ -0,0 +1,35 @@
## Reference

### OpenPose

[1] https://github.com/CMU-Perceptual-Computing-Lab/openpose

[2] Training Codes : https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation

[3] Custom Caffe by Openpose : https://github.com/CMU-Perceptual-Computing-Lab/caffe_train

[4] Keras Openpose : https://github.com/michalfaber/keras_Realtime_Multi-Person_Pose_Estimation

[5] Keras Openpose2 : https://github.com/kevinlin311tw/keras-openpose-reproduce

### Mobilenet

[1] Original Paper : https://arxiv.org/abs/1704.04861

[2] Pretrained model : https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.md

[3] Mobilenet v2 Paper : https://arxiv.org/abs/1801.04381

[4] Pretrained Model(v2) : https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet

### Libraries

[1] Tensorpack : https://github.com/ppwwyyxx/tensorpack

### Tensorflow Tips

[1] Freeze graph : https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py

[2] Optimize graph : https://codelabs.developers.google.com/codelabs/tensorflow-for-poets-2

[3] Calculate FLOPs : https://stackoverflow.com/questions/45085938/tensorflow-is-there-a-way-to-measure-flops-for-a-model
10 changes: 5 additions & 5 deletions etcs/training.md
@@ -94,16 +94,16 @@ And the optimization can be performed on the frozen model via graph transform pr
```bash
$ bazel build tensorflow/tools/graph_transforms:transform_graph
$ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=... \
--out_graph=... \
--in_graph=./tmp/graph_frozen.pb \
--out_graph=./tmp/graph_opt.pb \
--inputs='image:0' \
--outputs='Openpose/concat_stage7:0' \
--transforms='
strip_unused_nodes(type=float, shape="1,368,368,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignoreError=False)
fold_old_batch_norms
fold_batch_norms'
fold_batch_norms
fold_constants(ignoreError=False)
remove_nodes(op=Identity, op=CheckNumerics)'
```

Also, quantizing the neural network to 8 bits is a promising way to get further speed improvement. In my case, it made inference less accurate and slower on Intel's CPUs.
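
The reordered transform list from the graph-transform hunk above can also be assembled programmatically, which keeps the order explicit and easy to change. This is a minimal sketch under stated assumptions: `build_transform_command` is a hypothetical helper that is not part of the repository, and it only builds the argument list; it does not invoke bazel or `transform_graph`.

```python
# Transform order as it appears on the added lines of the hunk above:
# batch norms are folded first, constants after that, Identity nodes last.
TRANSFORMS = [
    'strip_unused_nodes(type=float, shape="1,368,368,3")',
    'fold_old_batch_norms',
    'fold_batch_norms',
    'fold_constants(ignoreError=False)',
    'remove_nodes(op=Identity, op=CheckNumerics)',
]


def build_transform_command(in_graph, out_graph,
                            inputs='image:0',
                            outputs='Openpose/concat_stage7:0',
                            transforms=TRANSFORMS):
    """Return the transform_graph invocation as an argv list (hypothetical helper)."""
    return [
        'bazel-bin/tensorflow/tools/graph_transforms/transform_graph',
        '--in_graph=' + in_graph,
        '--out_graph=' + out_graph,
        '--inputs=' + inputs,
        '--outputs=' + outputs,
        # The tool takes all transforms as one whitespace-separated string.
        '--transforms=' + ' '.join(transforms),
    ]


if __name__ == '__main__':
    cmd = build_transform_command('./tmp/graph_frozen.pb', './tmp/graph_opt.pb')
    print('\n'.join(cmd))
```

Passing the list to `subprocess.run(cmd)` without a shell avoids the quoting that the multi-line `--transforms='...'` argument needs on the command line.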
Binary file added models/graph/mobilenet_v2_large/graph_opt.pb
Binary file not shown.
Binary file added models/graph/mobilenet_v2_small/graph_opt.pb
Binary file not shown.
1 change: 1 addition & 0 deletions requirements.txt
@@ -9,3 +9,4 @@ scipy
slidingwindow
tqdm
git+https://github.com/ppwwyyxx/tensorpack.git
numba
82 changes: 45 additions & 37 deletions run.py
@@ -9,7 +9,8 @@
from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh

logger = logging.getLogger('TfPoseEstimator')
logger = logging.getLogger('TfPoseEstimatorRun')
logger.handlers.clear()
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
@@ -21,10 +22,11 @@
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='tf-pose-estimation run')
parser.add_argument('--image', type=str, default='./images/p1.jpg')
parser.add_argument('--model', type=str, default='cmu', help='cmu / mobilenet_thin')

parser.add_argument('--model', type=str, default='cmu',
help='cmu / mobilenet_thin / mobilenet_v2_large / mobilenet_v2_small')
parser.add_argument('--resize', type=str, default='0x0',
help='if provided, resize images before they are processed. default=0x0, Recommends : 432x368 or 656x368 or 1312x736 ')
help='if provided, resize images before they are processed. '
'default=0x0, Recommends : 432x368 or 656x368 or 1312x736 ')
parser.add_argument('--resize-out-ratio', type=float, default=4.0,
help='if provided, resize heatmaps before they are post-processed. default=4.0')

@@ -41,6 +43,7 @@
if image is None:
logger.error('Image can not be read, path=%s' % args.image)
sys.exit(-1)

t = time.time()
humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=args.resize_out_ratio)
elapsed = time.time() - t
@@ -49,36 +52,41 @@

image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)

import matplotlib.pyplot as plt

fig = plt.figure()
a = fig.add_subplot(2, 2, 1)
a.set_title('Result')
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

bgimg = cv2.cvtColor(image.astype(np.uint8), cv2.COLOR_BGR2RGB)
bgimg = cv2.resize(bgimg, (e.heatMat.shape[1], e.heatMat.shape[0]), interpolation=cv2.INTER_AREA)

# show network output
a = fig.add_subplot(2, 2, 2)
plt.imshow(bgimg, alpha=0.5)
tmp = np.amax(e.heatMat[:, :, :-1], axis=2)
plt.imshow(tmp, cmap=plt.cm.gray, alpha=0.5)
plt.colorbar()

tmp2 = e.pafMat.transpose((2, 0, 1))
tmp2_odd = np.amax(np.absolute(tmp2[::2, :, :]), axis=0)
tmp2_even = np.amax(np.absolute(tmp2[1::2, :, :]), axis=0)

a = fig.add_subplot(2, 2, 3)
a.set_title('Vectormap-x')
# plt.imshow(CocoPose.get_bgimg(inp, target_size=(vectmap.shape[1], vectmap.shape[0])), alpha=0.5)
plt.imshow(tmp2_odd, cmap=plt.cm.gray, alpha=0.5)
plt.colorbar()

a = fig.add_subplot(2, 2, 4)
a.set_title('Vectormap-y')
# plt.imshow(CocoPose.get_bgimg(inp, target_size=(vectmap.shape[1], vectmap.shape[0])), alpha=0.5)
plt.imshow(tmp2_even, cmap=plt.cm.gray, alpha=0.5)
plt.colorbar()
plt.show()
try:
import matplotlib.pyplot as plt

fig = plt.figure()
a = fig.add_subplot(2, 2, 1)
a.set_title('Result')
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

bgimg = cv2.cvtColor(image.astype(np.uint8), cv2.COLOR_BGR2RGB)
bgimg = cv2.resize(bgimg, (e.heatMat.shape[1], e.heatMat.shape[0]), interpolation=cv2.INTER_AREA)

# show network output
a = fig.add_subplot(2, 2, 2)
plt.imshow(bgimg, alpha=0.5)
tmp = np.amax(e.heatMat[:, :, :-1], axis=2)
plt.imshow(tmp, cmap=plt.cm.gray, alpha=0.5)
plt.colorbar()

tmp2 = e.pafMat.transpose((2, 0, 1))
tmp2_odd = np.amax(np.absolute(tmp2[::2, :, :]), axis=0)
tmp2_even = np.amax(np.absolute(tmp2[1::2, :, :]), axis=0)

a = fig.add_subplot(2, 2, 3)
a.set_title('Vectormap-x')
# plt.imshow(CocoPose.get_bgimg(inp, target_size=(vectmap.shape[1], vectmap.shape[0])), alpha=0.5)
plt.imshow(tmp2_odd, cmap=plt.cm.gray, alpha=0.5)
plt.colorbar()

a = fig.add_subplot(2, 2, 4)
a.set_title('Vectormap-y')
# plt.imshow(CocoPose.get_bgimg(inp, target_size=(vectmap.shape[1], vectmap.shape[0])), alpha=0.5)
plt.imshow(tmp2_even, cmap=plt.cm.gray, alpha=0.5)
plt.colorbar()
plt.show()
except Exception as ex:
logger.warning('matplotlib error, %s' % ex)
cv2.imshow('result', image)
cv2.waitKey()
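
The `logger.handlers.clear()` line added near the top of this diff presumably guards against duplicated log output when the setup code runs more than once in the same interpreter, for example in a notebook. The sketch below illustrates that effect with the standard `logging` module only; `configure_logger` and the logger names are hypothetical and not part of run.py.

```python
import logging


def configure_logger(name, clear_first):
    """Attach a DEBUG stream handler, optionally dropping existing handlers first."""
    logger = logging.getLogger(name)
    if clear_first:
        # Same call as in the diff: remove handlers left by earlier setup runs.
        logger.handlers.clear()
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    logger.addHandler(ch)
    return logger


if __name__ == '__main__':
    # Run the setup twice for each logger, as a re-executed script would.
    for _ in range(2):
        dup = configure_logger('demo.no_clear', clear_first=False)
    for _ in range(2):
        ok = configure_logger('demo.clear', clear_first=True)
    # Without clearing, each message would be emitted once per handler.
    print(len(dup.handlers), len(ok.handlers))  # prints "2 1"
```

`logging.getLogger` returns the same object for the same name, which is why handlers accumulate across repeated setup calls unless they are cleared.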