rtmlib is a super lightweight library for pose estimation with RTMPose models, WITHOUT any dependencies like mmcv, mmpose, mmdet, etc.
Basically, rtmlib only requires these dependencies:
- numpy
- opencv-python
- opencv-contrib-python
- onnxruntime
Optionally, you can use other common backends like opencv, onnxruntime, openvino, tensorrt to accelerate the inference process.
- For openvino users, please add the path <your python path>\envs\<your env name>\Lib\site-packages\openvino\libs to your PATH environment variable.
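To check whether that directory is already registered, a small diagnostic can help. The following is a minimal sketch, assuming a Windows-style layout exactly as quoted above; `openvino_libs_dir` and `is_on_path` are hypothetical helpers (not part of rtmlib or openvino), and the example values `C:\Miniconda3` and `rtmlib` are placeholders for illustration only.

```python
import os
from pathlib import PureWindowsPath


def openvino_libs_dir(python_path, env_name):
    """Build the OpenVINO libs directory following the layout quoted above."""
    return PureWindowsPath(python_path, "envs", env_name,
                           "Lib", "site-packages", "openvino", "libs")


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the ';'-separated PATH entries.

    PureWindowsPath compares case-insensitively, which matches how Windows
    treats directory names.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    entries = [PureWindowsPath(p) for p in path_value.split(";") if p]
    return PureWindowsPath(directory) in entries


# Placeholder values, for illustration only.
libs = openvino_libs_dir(r"C:\Miniconda3", "rtmlib")
print(libs, "on PATH:", is_on_path(libs))
```

This only reports whether the entry is present; it does not modify the environment.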
- install from pypi:
pip install rtmlib -i https://pypi.org/simple
- install from source code:
git clone https://github.com/Tau-J/rtmlib.git
cd rtmlib
pip install -r requirements.txt
pip install -e .
# [optional]
# pip install onnxruntime-gpu
# pip install openvino
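After installing, it can be useful to see which of the backends mentioned above are importable in the current environment. The snippet below is a rough sketch using only the standard library; `available_backends` is a hypothetical helper, and the mapping assumes the usual import names of these packages (for example, opencv-python is imported as `cv2`, and onnxruntime-gpu is imported as `onnxruntime`).

```python
from importlib.util import find_spec

# Backend name -> Python import name (assumed from the packages' usual names).
BACKEND_MODULES = {
    "opencv": "cv2",
    "onnxruntime": "onnxruntime",
    "openvino": "openvino",
    "tensorrt": "tensorrt",
}


def available_backends(modules=BACKEND_MODULES):
    """Report which backend packages can be found, without importing them."""
    return {name: find_spec(mod) is not None for name, mod in modules.items()}


for name, found in available_backends().items():
    print(f"{name:12s} {'installed' if found else 'missing'}")
```

A package being importable does not guarantee that a given device (such as cuda) is usable; it only confirms the Python package is present.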
Run webui.py:
# Please make sure you have installed gradio
# pip install gradio
python webui.py
Here is also a simple demo to show how to use rtmlib to conduct pose estimation on a single image.
import cv2
from rtmlib import Wholebody, draw_skeleton
device = 'cpu' # cpu, cuda
backend = 'onnxruntime' # opencv, onnxruntime, openvino
img = cv2.imread('./demo.jpg')
openpose_skeleton = False # True for openpose-style, False for mmpose-style
wholebody = Wholebody(to_openpose=openpose_skeleton,
                      mode='balanced',  # 'performance', 'lightweight', 'balanced'. Default: 'balanced'
                      backend=backend, device=device)
keypoints, scores = wholebody(img)
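# visualize on a copy so the original image stays untouched
img_show = img.copy()
# if you want to use a black background instead of the original image
# (requires `import numpy as np`):
# img_show = np.zeros(img_show.shape, dtype=np.uint8)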
img_show = draw_skeleton(img_show, keypoints, scores, kpt_thr=0.5)
cv2.imshow('img', img_show)
cv2.waitKey()
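The returned keypoints and scores can also be post-processed directly, for example to keep only confident keypoints in the spirit of the `kpt_thr` argument. This is a minimal sketch, assuming `keypoints` is indexed as (person, keypoint, xy) and `scores` as (person, keypoint); that layout is an assumption, and `visible_keypoints` is a hypothetical helper, not part of rtmlib. It iterates generically, so it should work with nested lists as well as NumPy arrays.

```python
def visible_keypoints(keypoints, scores, kpt_thr=0.5):
    """Return, per person, (index, x, y) for keypoints with score > kpt_thr.

    Assumed layout: keypoints[person][keypoint] -> (x, y),
                    scores[person][keypoint]    -> confidence.
    """
    result = []
    for person_kpts, person_scores in zip(keypoints, scores):
        kept = [(i, float(x), float(y))
                for i, ((x, y), s) in enumerate(zip(person_kpts, person_scores))
                if s > kpt_thr]
        result.append(kept)
    return result


# Toy data: one person with three keypoints.
toy_keypoints = [[(10, 20), (30, 40), (50, 60)]]
toy_scores = [[0.9, 0.2, 0.5]]
print(visible_keypoints(toy_keypoints, toy_scores, kpt_thr=0.5))
```

The comparison is strict here, so a keypoint whose score equals the threshold is dropped; whether `draw_skeleton` treats the boundary the same way is not confirmed.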
- Solutions (High-level APIs)
- Models (Low-level APIs)
- Visualization
- Support MMPose-style skeleton visualization
- Support OpenPose-style skeleton visualization
- Support WholeBody
- Support Hand
- Support Face
- Support Animal
- Support ONNXRuntime backend
- Support auto download and cache models
- Lightweight models
- Support 3 modes: performance, lightweight, balanced to select
- Support alias to choose model
- Support naive PoseTracker
- Support OpenVINO backend
- Support TensorRT backend
- Gradio interface
- Compatible with Controlnet
- Support RTMO
By default, rtmlib automatically downloads and applies the models with the best performance. You can also specify the model you want to use by passing the onnx_model argument.
More models can be found in RTMPose Model Zoo.
Person
Notes:
- Models trained on HumanArt can detect both real humans and cartoon characters.
- Models trained on COCO can only detect real humans.
ONNX Model | Input Size | Description |
---|---|---|
YOLOX-l | 640x640 | trained on COCO val2017 |
YOLOX-nano | 416x416 | trained on HumanArt |
YOLOX-tiny | 416x416 | trained on HumanArt |
YOLOX-s | 640x640 | trained on HumanArt |
YOLOX-m | 640x640 | trained on HumanArt |
YOLOX-l | 640x640 | trained on HumanArt |
YOLOX-x | 640x640 | trained on HumanArt |
Body
ONNX Model | Input Size | Description |
---|---|---|
RTMPose-t | 256x192 | Body 17 Keypoints |
RTMPose-s | 256x192 | Body 17 Keypoints |
RTMPose-m | 256x192 | Body 17 Keypoints |
RTMPose-l | 384x288 | Body 17 Keypoints |
RTMPose-x | 384x288 | Body 17 Keypoints |
RTMO-s | 640x640 | Body 17 Keypoints |
RTMO-m | 640x640 | Body 17 Keypoints |
RTMO-l | 640x640 | Body 17 Keypoints |
WholeBody
ONNX Model | Input Size | Description |
---|---|---|
RTMW-m | 256x192 | Wholebody 133 Keypoints |
RTMW-l | 256x192 | Wholebody 133 Keypoints |
RTMW-l | 384x288 | Wholebody 133 Keypoints |
RTMW-x | 384x288 | Wholebody 133 Keypoints |
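When a model from these tables is loaded manually, its input size usually has to be supplied alongside it. The sketch below turns a label such as 256x192 into a tuple. It assumes the labels follow the MMPose height x width naming and that the consumer wants (width, height); both points are assumptions that should be checked against the model and API actually used. `parse_input_size` is a hypothetical helper, not part of rtmlib.

```python
def parse_input_size(label):
    """Convert an 'HxW' label such as '256x192' into a (width, height) tuple.

    Assumption: the label is height x width, as in MMPose config names.
    """
    height, width = (int(v) for v in label.lower().split("x"))
    return (width, height)


for label in ("256x192", "384x288", "640x640"):
    print(label, "->", parse_input_size(label))
```

For square detector inputs such as 640x640 the ordering makes no difference; it only matters for the rectangular pose models.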
[Figure: example skeleton visualizations in MMPose-style and OpenPose-style]
@misc{rtmlib,
title={rtmlib},
author={Jiang, Tao},
year={2023},
howpublished = {\url{https://github.com/Tau-J/rtmlib}},
}
@misc{jiang2023,
doi = {10.48550/ARXIV.2303.07399},
url = {https://arxiv.org/abs/2303.07399},
author = {Jiang, Tao and Lu, Peng and Zhang, Li and Ma, Ningsheng and Han, Rui and Lyu, Chengqi and Li, Yining and Chen, Kai},
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose},
publisher = {arXiv},
year = {2023},
copyright = {Creative Commons Attribution 4.0 International}
}
@misc{lu2023rtmo,
title={{RTMO}: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation},
author={Peng Lu and Tao Jiang and Yining Li and Xiangtai Li and Kai Chen and Wenming Yang},
year={2023},
eprint={2312.07526},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Our code is based on these repos: