Merge branch 'main' of https://github.com/IDEA-Research/OSX

IDEA-Research · Apr 15, 2023 · 77b252a
2 parents 96472bc + dbc0972
commit 77b252a
Showing 6 changed files with 25 additions and 23 deletions.
diff --git a/README.md b/README.md
@@ -87,6 +87,8 @@ ${ROOT}
 |   |   |-- human_model_files
 |   |   |   |-- smpl
 |   |   |   |   |-- SMPL_NEUTRAL.pkl
+|   |   |   |   |-- SMPL_MALE.pkl
+|   |   |   |   |-- SMPL_FEMALE.pkl
 |   |   |   |-- smplx
 |   |   |   |   |-- MANO_SMPLX_vertex_ids.pkl
 |   |   |   |   |-- SMPL-X__FLAME_vertex_ids.npy
@@ -108,7 +110,7 @@ ${ROOT}
 * `tool` contains pre-processing codes of AGORA and pytorch model editing codes.
 * `output` contains log, trained models, visualized outputs, and test result.  
 * `common` contains kernel codes for Hand4Whole.  
-* `human_model_files` contains `smpl`, `smplx`, `mano`, and `flame` 3D model files. Download the files from [[smpl]](https://smpl.is.tue.mpg.de/) [[smplx]](https://smpl-x.is.tue.mpg.de/) [[SMPLX_to_J14.pkl]](https://github.com/vchoutas/expose#preparing-the-data) [[mano]](https://mano.is.tue.mpg.de/) [[flame]](https://flame.is.tue.mpg.de/).
+* `human_model_files` contains `smpl`, `smplx`, `mano`, and `flame` 3D model files. Download the files from [[smpl]](https://smpl.is.tue.mpg.de/) [[smplx]](https://smpl-x.is.tue.mpg.de/) [[SMPLX_to_J14.pkl]](https://github.com/vchoutas/expose#preparing-the-data) [[mano]](https://mano.is.tue.mpg.de/) [[flame]](https://flame.is.tue.mpg.de/). If you have problems about the model preparation, please refer to this [issue](https://github.com/IDEA-Research/OSX/issues/9), where I provide the link for each files.
 ### (2) Data  
 You need to follow directory structure of the `dataset` as below.  
 ```  
@@ -157,7 +159,7 @@ ${ROOT}
 * Download Human3.6M parsed data and SMPL-X parameters [[data](https://drive.google.com/drive/folders/1r0B9I3XxIIW_jsXjYinDpL6NFcxTZart?usp=sharing)][[SMPL-X parameters from NeuralAnnot](https://drive.google.com/drive/folders/19ifIQtAB3ll8d37-kerL1eQWp31mOwJM?usp=sharing)]
 * Download MPII parsed data and SMPL-X parameters [[data](https://drive.google.com/drive/folders/1rrL_RxhwQgwhq5BU1iIRPwl285B_KTpU?usp=sharing)][[SMPL-X parameters from NeuralAnnot](https://drive.google.com/file/d/1alkKvhkqQGqriKst83uS-kUG7v6SkM7W/view?usp=sharing)]
 * Download MPI-INF-3DHP parsed data and SMPL-X parameters [[data](https://drive.google.com/drive/folders/1wQbHEXPv-WH1sNOLwdfMVB7OWsiJkq2M?usp=sharing)][[SMPL-X parameters from NeuralAnnot](https://drive.google.com/file/d/1ADOJlaqaBDjZ3IEgrgLTQwNf6iHd-rGH/view?usp=sharing)]
-* Download MSCOCO data and SMPL-X parameters [[data](https://github.com/jin-s13/COCO-WholeBody)][[SMPL-X parameters from NeuralAnnot](https://drive.google.com/file/d/1RVJiI2Y1TjiAPcRnDZk5CX5L7Vehfinm/view?usp=sharing)]
+* Download MSCOCO data and SMPL-X parameters [[data](https://github.com/jin-s13/COCO-WholeBody)][[SMPL-X parameters](https://drive.google.com/file/d/1UVyfqrOtkbhI3MgpBYXd1YXbkD8aJtL9/view?usp=share_link)]
 * Download 3DPW parsed data [[data](https://drive.google.com/drive/folders/1HByTBsdg_A_o-d89qd55glTl44ya3dOs?usp=sharing)]
 * All annotation files follow [MSCOCO format](http://cocodataset.org/#format-data). If you want to add your own dataset, you have to convert it to [MSCOCO format](http://cocodataset.org/#format-data).  
 
@@ -197,7 +199,7 @@ python test.py --gpu 0,1,2,3 --exp_name output/train_setting1/ --pretrained_mode
 # test on AGORA-val
 python test.py --gpu 0,1,2,3 --exp_name output/train_setting1/ --pretrained_model_path ../output/train_setting1/model_dump/snapshot_13.pth --testset AGORA
 ```
-To speed up, you can use a light-weight version OSX by change the encoder setting by adding `--encoder_setting osx_b` or change the decoder settiing by adding `--decoder_setting wo_face_decoder`
+To speed up, you can use a light-weight version OSX by change the encoder setting by adding `--encoder_setting osx_b` or change the decoder setting by adding `--decoder_setting wo_face_decoder`
 #### (3) Train on AGORA and Test on AGORA-test
 
 In the `main` folder, run  
@@ -271,7 +273,7 @@ You can zip the `predictions` folder into `predictions.zip` and submit it to the
 
 * `RuntimeError: Subtraction, the '-' operator, with a bool tensor is not supported. If you are trying to invert a mask, use the '~' or 'logical_not()' operator instead.`: Go to [here](https://github.com/mks0601/I2L-MeshNet_RELEASE/issues/6#issuecomment-675152527)
 
-* `TypeError: startswith first arg must be bytes or a tuple of bytes, not str.`: Go to [here](https://github.com/mcfletch/pyopengl/issues/27)
+* `TypeError: startswith first arg must be bytes or a tuple of bytes, not str.`: Go to [here](https://github.com/mcfletch/pyopengl/issues/27). It seems that this solution only works for RTX3090. If it works for V100 or A100 in your case, please tell me in the issue :)
 
 ### Acknowledgement
 
@@ -280,10 +282,10 @@ This repo is mainly based on [Hand4Whole](https://github.com/mks0601/Hand4Whole_
 ## Reference  
 
 ```  
-@article{lin2023osx,
-  author    = {Lin, Jing and Zeng, Ailing and Wang, Haoqian and Zhang, Lei and Li, Yu},
-  title     = {One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer},
-  journal   = {CVPR},
-  year      = {2023},
+@article{lin2023one,
+  title={One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer},
+  author={Lin, Jing and Zeng, Ailing and Wang, Haoqian and Zhang, Lei and Li, Yu},
+  journal={arXiv preprint arXiv:2303.16160},
+  year={2023}
 }
 ```
diff --git a/main/OSX.py b/main/OSX.py
@@ -72,7 +72,7 @@ def get_coord(self, root_pose, body_pose, lhand_pose, rhand_pose, jaw_pose, shap
 
         # project 3D coordinates to 2D space
         if mode == 'train' and len(cfg.trainset_3d) == 1 and cfg.trainset_3d[0] == 'AGORA' and len(
-                cfg.trainset_2d) == 0:  # prevent gradients from backpropagating to SMPLX paraemter regression module
+                cfg.trainset_2d) == 0:  # prevent gradients from backpropagating to SMPLX parameter regression module
             x = (joint_cam[:, :, 0].detach() + cam_trans[:, None, 0]) / (
                     joint_cam[:, :, 2].detach() + cam_trans[:, None, 2] + 1e-4) * cfg.focal[0] + cfg.princpt[0]
             y = (joint_cam[:, :, 1].detach() + cam_trans[:, None, 1]) / (
@@ -91,14 +91,14 @@ def get_coord(self, root_pose, body_pose, lhand_pose, rhand_pose, jaw_pose, shap
         joint_cam = joint_cam - root_cam
         mesh_cam = mesh_cam + cam_trans[:, None, :]  # for rendering
 
-        # left hand root (left wrist)-relative 3D coordinatese
+        # left hand root (left wrist)-relative 3D coordinates
         lhand_idx = smpl_x.joint_part['lhand']
         lhand_cam = joint_cam[:, lhand_idx, :]
         lwrist_cam = joint_cam[:, smpl_x.lwrist_idx, None, :]
         lhand_cam = lhand_cam - lwrist_cam
         joint_cam = torch.cat((joint_cam[:, :lhand_idx[0], :], lhand_cam, joint_cam[:, lhand_idx[-1] + 1:, :]), 1)
 
-        # right hand root (right wrist)-relative 3D coordinatese
+        # right hand root (right wrist)-relative 3D coordinates
         rhand_idx = smpl_x.joint_part['rhand']
         rhand_cam = joint_cam[:, rhand_idx, :]
         rwrist_cam = joint_cam[:, smpl_x.rwrist_idx, None, :]
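
Note on the hunk above: the wrist-relative step can be checked in isolation on a toy joint array. The sketch below is a minimal NumPy re-statement of those lines, not the repository's code. The helper name `make_hand_relative`, the toy joint layout, and the use of NumPy instead of PyTorch are assumptions made for illustration. Like the original `torch.cat` call, it assumes the hand indices form one contiguous, increasing block.

```python
import numpy as np


def make_hand_relative(joint_cam, hand_idx, wrist_idx):
    """Express one hand's joints relative to its wrist.

    joint_cam: array of shape (batch, num_joints, 3)
    hand_idx:  contiguous, increasing joint indices of one hand (hypothetical layout)
    wrist_idx: index of the wrist joint that acts as the hand root
    """
    hand_cam = joint_cam[:, hand_idx, :]
    wrist_cam = joint_cam[:, wrist_idx, None, :]  # keep a joint axis for broadcasting
    hand_cam = hand_cam - wrist_cam
    # Re-assemble: joints before the hand block, the shifted hand, joints after it.
    return np.concatenate(
        (joint_cam[:, :hand_idx[0], :], hand_cam, joint_cam[:, hand_idx[-1] + 1:, :]),
        axis=1,
    )


# Toy data: batch of 2, 6 joints; joints 2..4 play the role of a hand, joint 1 the wrist.
toy_joint_cam = np.arange(2 * 6 * 3, dtype=np.float32).reshape(2, 6, 3)
toy_hand_idx = [2, 3, 4]
toy_wrist_idx = 1
toy_out = make_hand_relative(toy_joint_cam, toy_hand_idx, toy_wrist_idx)
```

Only the hand block changes. All other joints, including the wrist itself, keep their previous values, and the joint count is preserved.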

diff --git a/main/OSX_WoDecoder.py b/main/OSX_WoDecoder.py
@@ -68,7 +68,7 @@ def get_coord(self, root_pose, body_pose, lhand_pose, rhand_pose, jaw_pose, shap
 
         # project 3D coordinates to 2D space
         if mode == 'train' and len(cfg.trainset_3d) == 1 and cfg.trainset_3d[0] == 'AGORA' and len(
-                cfg.trainset_2d) == 0:  # prevent gradients from backpropagating to SMPLX paraemter regression module
+                cfg.trainset_2d) == 0:  # prevent gradients from backpropagating to SMPLX parameter regression module
             x = (joint_cam[:, :, 0].detach() + cam_trans[:, None, 0]) / (
                     joint_cam[:, :, 2].detach() + cam_trans[:, None, 2] + 1e-4) * cfg.focal[0] + cfg.princpt[0]
             y = (joint_cam[:, :, 1].detach() + cam_trans[:, None, 1]) / (
@@ -87,14 +87,14 @@ def get_coord(self, root_pose, body_pose, lhand_pose, rhand_pose, jaw_pose, shap
         joint_cam = joint_cam - root_cam
         mesh_cam = mesh_cam + cam_trans[:, None, :]  # for rendering
 
-        # left hand root (left wrist)-relative 3D coordinatese
+        # left hand root (left wrist)-relative 3D coordinates
         lhand_idx = smpl_x.joint_part['lhand']
         lhand_cam = joint_cam[:, lhand_idx, :]
         lwrist_cam = joint_cam[:, smpl_x.lwrist_idx, None, :]
         lhand_cam = lhand_cam - lwrist_cam
         joint_cam = torch.cat((joint_cam[:, :lhand_idx[0], :], lhand_cam, joint_cam[:, lhand_idx[-1] + 1:, :]), 1)
 
-        # right hand root (right wrist)-relative 3D coordinatese
+        # right hand root (right wrist)-relative 3D coordinates
         rhand_idx = smpl_x.joint_part['rhand']
         rhand_cam = joint_cam[:, rhand_idx, :]
         rwrist_cam = joint_cam[:, smpl_x.rwrist_idx, None, :]
@@ -392,4 +392,4 @@ def get_model(mode):
     encoder = vit.backbone
     model = Model(encoder, body_position_net, body_rotation_net, box_net, hand_position_net, hand_roi_net, hand_rotation_net,
                   face_regressor)
-    return model
+    return model
diff --git a/main/OSX_WoFaceDecoder.py b/main/OSX_WoFaceDecoder.py
@@ -69,7 +69,7 @@ def get_coord(self, root_pose, body_pose, lhand_pose, rhand_pose, jaw_pose, shap
 
         # project 3D coordinates to 2D space
         if mode == 'train' and len(cfg.trainset_3d) == 1 and cfg.trainset_3d[0] == 'AGORA' and len(
-                cfg.trainset_2d) == 0:  # prevent gradients from backpropagating to SMPLX paraemter regression module
+                cfg.trainset_2d) == 0:  # prevent gradients from backpropagating to SMPLX parameter regression module
             x = (joint_cam[:, :, 0].detach() + cam_trans[:, None, 0]) / (
                     joint_cam[:, :, 2].detach() + cam_trans[:, None, 2] + 1e-4) * cfg.focal[0] + cfg.princpt[0]
             y = (joint_cam[:, :, 1].detach() + cam_trans[:, None, 1]) / (
@@ -88,14 +88,14 @@ def get_coord(self, root_pose, body_pose, lhand_pose, rhand_pose, jaw_pose, shap
         joint_cam = joint_cam - root_cam
         mesh_cam = mesh_cam + cam_trans[:, None, :]  # for rendering
 
-        # left hand root (left wrist)-relative 3D coordinatese
+        # left hand root (left wrist)-relative 3D coordinates
         lhand_idx = smpl_x.joint_part['lhand']
         lhand_cam = joint_cam[:, lhand_idx, :]
         lwrist_cam = joint_cam[:, smpl_x.lwrist_idx, None, :]
         lhand_cam = lhand_cam - lwrist_cam
         joint_cam = torch.cat((joint_cam[:, :lhand_idx[0], :], lhand_cam, joint_cam[:, lhand_idx[-1] + 1:, :]), 1)
 
-        # right hand root (right wrist)-relative 3D coordinatese
+        # right hand root (right wrist)-relative 3D coordinates
         rhand_idx = smpl_x.joint_part['rhand']
         rhand_cam = joint_cam[:, rhand_idx, :]
         rwrist_cam = joint_cam[:, smpl_x.rwrist_idx, None, :]
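
Note on the projection branch in the `get_coord` hunks: the visible `x` line is a pinhole projection with a small `1e-4` term added to the depth. The sketch below restates it in NumPy for a quick sanity check. The `y` expression is truncated in the diff, so the `focal[1]` / `princpt[1]` form used here is an assumption based on the standard pinhole model, and the function name and the focal and principal-point values are hypothetical. The original also calls `.detach()` on `joint_cam` in the AGORA-only training case, which its comment says prevents gradients from reaching the SMPLX parameter regression module. NumPy has no autograd, so that part is omitted.

```python
import numpy as np


def project_to_image(joint_cam, cam_trans, focal, princpt, eps=1e-4):
    """Pinhole projection of camera-space joints to pixel coordinates.

    joint_cam: (batch, num_joints, 3) root-relative 3D joints
    cam_trans: (batch, 3) camera translation added to every joint
    focal, princpt: (fx, fy) and (cx, cy); the values used below are made up
    eps: small constant added to the depth, as in the diff
    """
    z = joint_cam[:, :, 2] + cam_trans[:, None, 2] + eps
    x = (joint_cam[:, :, 0] + cam_trans[:, None, 0]) / z * focal[0] + princpt[0]
    # Assumed by symmetry with x; the corresponding source line is cut off in the hunk.
    y = (joint_cam[:, :, 1] + cam_trans[:, None, 1]) / z * focal[1] + princpt[1]
    return np.stack((x, y), axis=2)


# One sample, two joints: one at the root, one offset by (1, 2, 0).
demo_joints = np.array([[[0.0, 0.0, 0.0], [1.0, 2.0, 0.0]]])
demo_trans = np.array([[0.0, 0.0, 2.0 - 1e-4]])  # total depth becomes about 2.0
demo_focal = (1000.0, 1000.0)   # hypothetical intrinsics
demo_princpt = (100.0, 50.0)    # hypothetical principal point
demo_2d = project_to_image(demo_joints, demo_trans, demo_focal, demo_princpt)
```

A joint on the optical axis lands on the principal point, and lateral offsets scale with `focal / depth`.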

diff --git a/main/test.py b/main/test.py
@@ -17,7 +17,7 @@ def parse_args():
     args = parser.parse_args()
 
     if not args.gpu_ids:
-        assert 0, "Please set propoer gpu ids"
+        assert 0, "Please set proper gpu ids"
 
     if '-' in args.gpu_ids:
         gpus = args.gpu_ids.split('-')
@@ -70,4 +70,4 @@ def main():
     tester._print_eval_result(eval_result)
 
 if __name__ == "__main__":
-    main()
+    main()
diff --git a/main/train.py b/main/train.py
@@ -19,7 +19,7 @@ def parse_args():
     args = parser.parse_args()
 
     if not args.gpu_ids:
-        assert 0, "Please set propoer gpu ids"
+        assert 0, "Please set proper gpu ids"
 
     if not args.lr:
         assert 0, "Please set learning rate"
@@ -97,4 +97,4 @@ def main():
             }, epoch)
 
 if __name__ == "__main__":
-    main()
+    main()