- [2024-02-29] HuggingFace demo is online!
- [2023-10-23] Support visualization through SMPL-X mesh overlay and add inference docker.
- [2023-10-02] arXiv preprint is online!
- [2023-09-28] Homepage and Video are online!
- [2023-07-19] Pretrained models are released.
- [2023-06-15] Training and testing code is released.
conda create -n smplerx python=3.8 -y
conda activate smplerx
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch -y
pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.12.0/index.html
pip install -r requirements.txt
# install mmpose
cd main/transformer_utils
pip install -v -e .
cd ../..
docker pull wcwcw/smplerx_inference:v0.2
docker run --gpus all -v <vid_input_folder>:/smplerx_inference/vid_input \
-v <vid_output_folder>:/smplerx_inference/vid_output \
wcwcw/smplerx_inference:v0.2 --vid <video_name>.mp4
# Currently, any customization needs to be applied to /smplerx_inference/smplerx/inference_docker.py
- We recently released a Docker image for inference on Docker Hub.
- This Docker image uses SMPLer-X-H32 as the inference baseline and was tested on an RTX 3090 under WSL2 (Ubuntu 20.04).
Model | Backbone | #Datasets | #Inst. | #Params | MPE | Download | FPS |
---|---|---|---|---|---|---|---|
SMPLer-X-S32 | ViT-S | 32 | 4.5M | 32M | 82.6 | model | 36.17 |
SMPLer-X-B32 | ViT-B | 32 | 4.5M | 103M | 74.3 | model | 33.09 |
SMPLer-X-L32 | ViT-L | 32 | 4.5M | 327M | 66.2 | model | 24.44 |
SMPLer-X-H32 | ViT-H | 32 | 4.5M | 662M | 63.0 | model | 17.47 |
- MPE (Mean Primary Error): the average of the primary errors on five benchmarks (AGORA, EgoBody, UBody, 3DPW, and EHF)
- FPS (Frames Per Second): the average inference speed on a single Tesla V100 GPU, batch size = 1
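As a minimal sketch of the MPE definition above, the helper below averages one primary error per benchmark. The function name and the input values are illustrative assumptions; the numbers are placeholders, not reported results.

```python
from statistics import mean

# The five benchmarks listed in the MPE definition above.
BENCHMARKS = ("AGORA", "EgoBody", "UBody", "3DPW", "EHF")


def mean_primary_error(primary_errors):
    """Average the per-benchmark primary errors into a single MPE value.

    `primary_errors` maps each benchmark name to its primary error.
    """
    missing = [name for name in BENCHMARKS if name not in primary_errors]
    if missing:
        raise ValueError(f"missing benchmarks: {missing}")
    return mean(primary_errors[name] for name in BENCHMARKS)


# Placeholder values for illustration only; these are not reported results.
example = {"AGORA": 10.0, "EgoBody": 20.0, "UBody": 30.0, "3DPW": 40.0, "EHF": 50.0}
print(mean_primary_error(example))
```

Requiring all five benchmarks keeps an MPE computed from a partial evaluation from being compared against the table by mistake.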
- download all datasets
- process all datasets into HumanData format, except the following:
  - AGORA, MSCOCO, MPII, Human3.6M, UBody.
  - follow OSX in preparing these 5 datasets.
- follow OSX in preparing pretrained ViTPose models. Download the ViTPose pretrained weights for ViT-small and ViT-huge from here.
- download SMPL-X and SMPL body models.
- download mmdet pretrained model and config for inference.
The file structure should be like:
SMPLer-X/
├── common/
│ └── utils/
│ └── human_model_files/ # body model
│ ├── smpl/
│ │ ├──SMPL_NEUTRAL.pkl
│ │ ├──SMPL_MALE.pkl
│ │ └──SMPL_FEMALE.pkl
│ └── smplx/
│ ├──MANO_SMPLX_vertex_ids.pkl
│ ├──SMPL-X__FLAME_vertex_ids.npy
│ ├──SMPLX_NEUTRAL.pkl
│ ├──SMPLX_to_J14.pkl
│ ├──SMPLX_NEUTRAL.npz
│ ├──SMPLX_MALE.npz
│ └──SMPLX_FEMALE.npz
├── data/
├── main/
├── demo/
│ ├── videos/
│ ├── images/
│ └── results/
├── pretrained_models/ # pretrained ViT-Pose, SMPLer_X and mmdet models
│ ├── mmdet/
│ │ ├──faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
│ │ └──mmdet_faster_rcnn_r50_fpn_coco.py
│ ├── smpler_x_s32.pth.tar
│ ├── smpler_x_b32.pth.tar
│ ├── smpler_x_l32.pth.tar
│ ├── smpler_x_h32.pth.tar
│ ├── vitpose_small.pth
│ ├── vitpose_base.pth
│ ├── vitpose_large.pth
│ └── vitpose_huge.pth
└── dataset/
├── AGORA/
├── ARCTIC/
├── BEDLAM/
├── Behave/
├── CHI3D/
├── CrowdPose/
├── EgoBody/
├── EHF/
├── FIT3D/
├── GTA_Human2/
├── Human36M/
├── HumanSC3D/
├── InstaVariety/
├── LSPET/
├── MPII/
├── MPI_INF_3DHP/
├── MSCOCO/
├── MTP/
├── MuCo/
├── OCHuman/
├── PoseTrack/
├── PROX/
├── PW3D/
├── RenBody/
├── RICH/
├── SPEC/
├── SSP3D/
├── SynBody/
├── Talkshow/
├── UBody/
├── UP3D/
└── preprocessed_datasets/ # HumanData files
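Before running inference or training, it can help to confirm that the key files from the tree above are in place. The script below is a hypothetical helper, not part of the repository; the list of paths is a subset copied from the tree and can be extended.

```python
from pathlib import Path

# Relative paths taken from the tree above; extend the list as needed.
REQUIRED_PATHS = [
    "common/utils/human_model_files/smpl/SMPL_NEUTRAL.pkl",
    "common/utils/human_model_files/smplx/SMPLX_NEUTRAL.pkl",
    "common/utils/human_model_files/smplx/SMPLX_NEUTRAL.npz",
    "pretrained_models/mmdet/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth",
    "pretrained_models/mmdet/mmdet_faster_rcnn_r50_fpn_coco.py",
    "demo/videos",
    "demo/results",
]


def find_missing(repo_root, required=REQUIRED_PATHS):
    """Return the required paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    missing = find_missing(".")  # run from the SMPLer-X/ directory
    for rel in missing:
        print(f"missing: {rel}")
    print("layout OK" if not missing else f"{len(missing)} path(s) missing")
```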
- Place the video for inference under SMPLer-X/demo/videos
- Prepare the pretrained models to be used for inference under SMPLer-X/pretrained_models
- Prepare the mmdet pretrained model and config under SMPLer-X/pretrained_models
- Inference output will be saved in SMPLer-X/demo/results
cd main
sh slurm_inference.sh {VIDEO_FILE} {FORMAT} {FPS} {PRETRAINED_CKPT}
# To run inference on test_video.mp4 (24 FPS) with smpler_x_h32
sh slurm_inference.sh test_video mp4 24 smpler_x_h32
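The inference script takes the video name and its extension as separate arguments, which is easy to get wrong when scripting over many files. The sketch below uses a hypothetical helper (not part of the repository) to derive the argument list from a video path.

```python
from pathlib import Path


def inference_args(video_path, fps, ckpt_name):
    """Build the argument list for slurm_inference.sh from a video path.

    The script takes the file name and its extension as separate arguments,
    so "test_video.mp4" becomes "test_video" and "mp4".
    """
    video = Path(video_path)
    if not video.suffix:
        raise ValueError(f"{video_path!r} has no file extension")
    return [
        "sh", "slurm_inference.sh",
        video.stem, video.suffix.lstrip("."), str(fps), ckpt_name,
    ]


# Prints: sh slurm_inference.sh test_video mp4 24 smpler_x_h32
print(" ".join(inference_args("demo/videos/test_video.mp4", 24, "smpler_x_h32")))
```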
We provide a lightweight visualization script for mesh overlay based on pyrender.
- Use ffmpeg to split video into images
- The visualization script takes inference results (see above) as the input.
ffmpeg -i {VIDEO_FILE} -f image2 -vf fps=30 \
{SMPLERX INFERENCE DIR}/{VIDEO NAME (no extension)}/orig_img/%06d.jpg \
-hide_banner -loglevel error
cd main && python render.py \
--data_path {SMPLERX INFERENCE DIR} --seq {VIDEO NAME} \
--image_path {SMPLERX INFERENCE DIR}/{VIDEO NAME} \
--render_biggest_person False
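The frame-splitting step can also be scripted. The sketch below only builds the ffmpeg argument list shown above; the helper name is hypothetical, and using demo/results as the inference directory is an assumption based on where inference output is saved. The default of 30 mirrors the command above; it presumably should match the frame rate used for inference.

```python
import shlex
from pathlib import Path


def ffmpeg_split_command(video_file, inference_dir, fps=30):
    """Build the ffmpeg call that writes frames to <inference_dir>/<video name>/orig_img/."""
    video = Path(video_file)
    out_dir = Path(inference_dir) / video.stem / "orig_img"
    return [
        "ffmpeg", "-i", str(video),
        "-f", "image2", "-vf", f"fps={fps}",
        str(out_dir / "%06d.jpg"),
        "-hide_banner", "-loglevel", "error",
    ]


cmd = ffmpeg_split_command("demo/videos/test_video.mp4", "demo/results")
print(shlex.join(cmd))
# To run it, create the orig_img directory first (ffmpeg does not create
# missing output directories), then call subprocess.run(cmd, check=True).
```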
cd main
sh slurm_train.sh {JOB_NAME} {NUM_GPU} {CONFIG_FILE}
# For training SMPLer-X-H32 with 16 GPUs
sh slurm_train.sh smpler_x_h32 16 config_smpler_x_h32.py
- CONFIG_FILE is the file name under SMPLer-X/main/config
- Logs and checkpoints will be saved to SMPLer-X/output/train_{JOB_NAME}_{DATE_TIME}
# To evaluate the model ../output/{TRAIN_OUTPUT_DIR}/model_dump/snapshot_{CKPT_ID}.pth.tar
# with config ../output/{TRAIN_OUTPUT_DIR}/code/config_base.py
cd main
sh slurm_test.sh {JOB_NAME} {NUM_GPU} {TRAIN_OUTPUT_DIR} {CKPT_ID}
- NUM_GPU = 1 is recommended for testing
- Logs and results will be saved to SMPLer-X/output/test_{JOB_NAME}_ep{CKPT_ID}_{TEST_DATASET}
- RuntimeError: Subtraction, the '-' operator, with a bool tensor is not supported. If you are trying to invert a mask, use the '~' or 'logical_not()' operator instead.
  - Follow this post and modify torchgeometry.
- KeyError: 'SinePositionalEncoding is already registered in position encoding', or any other similar KeyError due to duplicate module registration.
  - Manually add force=True to the respective module registration under main/transformer_utils/mmpose/models/utils, e.g. @POSITIONAL_ENCODING.register_module(force=True) in this file.
- How do I animate my virtual characters with SMPLer-X output (like that in the demo video)?
  - We are working on that, please stay tuned! Currently, this repo supports SMPL-X estimation and a simple visualization (overlay of SMPL-X vertices).
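For the duplicate-registration KeyError above, the toy registry below illustrates why the error appears and what force=True changes. It is a simplified stand-in written for this example, not the mmcv or mmpose implementation; only the decorator name and the force argument mirror the usage quoted above.

```python
class Registry:
    """Toy stand-in for a module registry; NOT the mmcv implementation."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, force=False):
        def decorator(cls):
            # A second registration under the same name is rejected unless forced.
            if cls.__name__ in self._modules and not force:
                raise KeyError(f"{cls.__name__} is already registered in {self.name}")
            self._modules[cls.__name__] = cls
            return cls
        return decorator


POSITIONAL_ENCODING = Registry("position encoding")


@POSITIONAL_ENCODING.register_module()
class SinePositionalEncoding:
    pass


# Registering the same name a second time fails without force=True ...
try:
    @POSITIONAL_ENCODING.register_module()
    class SinePositionalEncoding:  # noqa: F811
        pass
except KeyError as err:
    print("KeyError:", err)


# ... and succeeds with force=True, replacing the earlier entry.
@POSITIONAL_ENCODING.register_module(force=True)
class SinePositionalEncoding:  # noqa: F811
    pass
```

In this sketch, force=True simply overwrites the existing entry, which is why it silences the error when the same module ends up being registered twice.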