ROMP is a one-stage network for multi-person 3D mesh recovery from a single image.
Monocular, One-stage, Regression of Multiple 3D People,
Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, Tao Mei,
arXiv paper (arXiv 2008.12272)
Contact: [email protected]. Feel free to contact me for related questions or discussions!
- Simple: Simultaneously predicting the body center locations and the corresponding 3D body mesh parameters for all people at each pixel.
- Fast: The ROMP ResNet-50 model runs at over 30 FPS on a 1070Ti GPU.
- Strong: ROMP achieves superior performance on multiple challenging multi-person/occlusion benchmarks, including 3DPW, CMU Panoptic, and 3DOH50K.
- Easy to use: We provide a user-friendly testing API and webcam demos.
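To make the "Simple" point more concrete, here is a minimal NumPy sketch of the general idea of reading one parameter vector per detected body center. It is an illustration under stated assumptions, not the ROMP implementation: the function name, the threshold value, the 3x3 peak test, and the array shapes are all hypothetical.

```python
import numpy as np

def sample_params_at_centers(center_map, param_map, threshold=0.25):
    """Find peaks in a center heatmap and read the parameter vector at each peak.

    center_map: (H, W) heatmap of body-center confidence (assumed layout).
    param_map:  (C, H, W) map holding one C-dim parameter vector per pixel (assumed layout).
    """
    h, w = center_map.shape
    # Pad with -inf so border pixels can be compared against all 8 neighbours.
    padded = np.pad(center_map, 1, mode="constant", constant_values=-np.inf)
    # Stack the 3x3 neighbourhood of every pixel and take its maximum.
    neighbourhood = np.stack([
        padded[dy:dy + h, dx:dx + w]
        for dy in range(3) for dx in range(3)
    ])
    # A pixel is a center if it is a local maximum and confident enough.
    is_peak = (center_map >= neighbourhood.max(axis=0)) & (center_map > threshold)
    rows, cols = np.nonzero(is_peak)
    # Gather one C-dimensional vector per detected center -> shape (N, C).
    params = param_map[:, rows, cols].T
    return np.stack([rows, cols], axis=1), params

# Toy input: two "people" on an 8x8 grid, 4 parameters per pixel.
center_map = np.zeros((8, 8))
center_map[2, 3] = 0.9
center_map[6, 5] = 0.7
param_map = np.arange(4 * 8 * 8, dtype=np.float64).reshape(4, 8, 8)

centers, params = sample_params_at_centers(center_map, param_map)
print(centers.shape, params.shape)
```

Because every pixel carries its own parameter vector, the number of people does not need to be known in advance: it is simply the number of peaks found in the center map.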
2021/7/15: Adding support for an elegant context manager to run code in a notebook. See Colab demo for the details.
2021/4/19: Adding support for textured SMPL mesh using vedo. See visualization.md for the details.
2021/3/30: 1.0 version. Rebuilding the code. Release the ResNet-50 version and evaluation on 3DPW.
2020/11/26: Optimization for person-person occlusion. Small changes for video support.
2020/9/11: Real-time webcam demo using local/remote server. Please refer to config_guide.md for details.
2020/9/4: Google Colab demo. Saving a npy file per image. Please refer to config_guide.md for details.
Before installation, you can take a few minutes to try the prepared Google Colab demo.
It allows you to run the project in the cloud, free of charge.
Please refer to bug.md for known bugs. You are welcome to submit issues for related bugs.
Please refer to install.md for installation.
Currently, the released code is used to reproduce the demo results. Only 1-2 GB of GPU memory is needed.
To do this, you just need to run:
cd ROMP/src
sh run.sh
# If the shell script fails, run the following command instead:
CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/single_image.yml
Results will be saved in ROMP/demo/images_results.
You can also run the code on arbitrary internet images by putting them under ROMP/demo/images.
Please refer to config_guide.md for saving the estimated mesh/Center maps/parameters dict.
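If the parameters are saved as a Python dict inside a .npy file, they can be read back with NumPy as sketched below. This is a self-contained round trip under assumptions: the file name and the keys (`person_0`, `params`) are hypothetical placeholders, and the real layout is whatever config_guide.md and the ROMP code define.

```python
import os
import tempfile

import numpy as np

# Hypothetical result dict; the real keys are defined by the ROMP code, not here.
results = {"person_0": {"params": np.zeros(10, dtype=np.float32)}}

path = os.path.join(tempfile.mkdtemp(), "example_result.npy")
np.save(path, results)  # a dict is stored as a pickled 0-d object array

# allow_pickle=True is required to read object arrays; .item() unwraps the dict.
loaded = np.load(path, allow_pickle=True).item()
print(sorted(loaded.keys()))
```

Note that `allow_pickle=True` executes pickled data on load, so it should only be used on files from a trusted source.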
You can also run the code on random internet videos.
To do this, first change input_video_path in src/configs/video.yml to /path/to/your/video. For example, set
video_or_frame: True
input_video_path: '../demo/videos/sample_video.mp4' # None
output_dir: '../demo/videos/sample_video_results/'
then run
cd ROMP/src
CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/video.yml
Results will be saved to ../demo/videos/sample_video_results.
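The video settings shown above follow a simple naming pattern (`<name>.mp4` goes to `<name>_results/`). As a convenience, the sketch below derives the same three entries for any input path. The helper name `video_config_entries` is hypothetical and not part of ROMP; it only prints text to paste into src/configs/video.yml.

```python
from pathlib import PurePosixPath

def video_config_entries(video_path):
    """Build the three video.yml entries for one input video (illustrative helper)."""
    video = PurePosixPath(video_path)
    # Mirror the example in the README: <folder>/<name>.mp4 -> <folder>/<name>_results/
    output_dir = str(video.parent / (video.stem + "_results")) + "/"
    return {
        "video_or_frame": True,
        "input_video_path": str(video),
        "output_dir": output_dir,
    }

entries = video_config_entries("../demo/videos/sample_video.mp4")
for key, value in entries.items():
    # Quote strings as in the README example; booleans are written bare.
    if isinstance(value, str):
        print(f"{key}: {value!r}")
    else:
        print(f"{key}: {value}")
```

The printed lines still need to be copied into the config file by hand; the sketch does not modify video.yml.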
You can also batch process a directory of videos. Please refer to batch_videos.md for more info.
# Linux-style paths:
python lib/utils/batch_videos.py --input=/home/user/Animations/mocap/cleaned --output=/home/user/Animations/mocap/cleaned/processed --extension mp4 --run_conversion --yaml_template=configs/video-batch.yml
# Windows paths (note the --windows flag):
python lib/utils/batch_videos.py --input=M:/Animations/mocap/cleaned --output=M:/Animations/mocap/cleaned/processed --extension mp4 --windows --run_conversion --yaml_template=configs/video-batch.yml
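Before launching a batch run, it can help to check which files in the input directory match the chosen extension. The sketch below is a hypothetical pre-check, not the logic of batch_videos.py; it assumes a non-recursive, case-insensitive match on the file suffix.

```python
import tempfile
from pathlib import Path

def find_videos(input_dir, extension="mp4"):
    """Return the sorted files directly inside input_dir that have the given extension."""
    suffix = "." + extension.lower().lstrip(".")
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )

# Toy directory with two videos and one unrelated file.
demo_dir = Path(tempfile.mkdtemp())
for name in ("a.mp4", "b.MP4", "notes.txt"):
    (demo_dir / name).touch()

videos = find_videos(demo_dir, "mp4")
print([p.name for p in videos])
```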
We also provide the webcam demo code, which can run in real time on a 1070Ti GPU / remote server.
Currently, limited by the visualization pipeline, the webcam visualization code only supports a single-person mesh.
To do this you just need to run:
cd ROMP/src
CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/webcam.yml
# or try to use the model with ResNet-50 as backbone.
CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/webcam_resnet.yml
Press Up/Down to end the demo. Please refer to config_guide.md for running the webcam demo on a remote server, or for setting the mesh color or camera id.
Please refer to expert.md to export the results to fbx files for Blender usage. Currently, this function only supports single-person videos. Therefore, please test it with ../demo/videos/sample_video2_results/sample_video2.mp4, whose results will be saved to ../demo/videos/sample_video2_results.
Please refer to evaluation.md for evaluation on benchmarks.
The code will be gradually open sourced according to the following schedule:
- demo code for internet images / videos / webcam
- runtime optimization
- benchmark evaluation
- training
Please consider citing
@InProceedings{ROMP,
author = {Sun, Yu and Bao, Qian and Liu, Wu and Fu, Yili and Black, Michael J. and Mei, Tao},
title = {Monocular, One-stage, Regression of Multiple 3D People},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021}
}
We thank Peng Cheng for his constructive comments on Center map training.
Thanks to Marco Musy for his help in the textured SMPL visualization.
Thanks to Gavin Gray for adding support for an elegant context manager to run code in a notebook via a pull request.
Thanks to VLT Media for adding support for running on Windows & batch_videos.py.
Here are some great resources we benefit from:
- SMPL models and layer are borrowed from MPII SMPL-X model.
- Webcam pipeline is borrowed from minimal-hand.
- Some functions are borrowed from HMR-pytorch.
- Some functions for data augmentation are borrowed from SPIN.
- Synthetic occlusion is borrowed from synthetic-occlusion.
- The evaluation code of the 3DPW dataset is borrowed from 3dpw-eval.
- For fair comparison, the GT annotations of the 3DPW dataset are borrowed from VIBE.
- 3D mesh visualization is supported by vedo and Open3D.