
Generalizable Hand-Object Interaction (HOI) Denoising

The PyTorch implementation of the paper GeneOH Diffusion, presenting a generalizable HOI denoising model designed to curate high-quality interaction data.

(Teaser video: teaser_github_trimed.mp4)

This repository contains:

  • Pre-trained models and example usage (on three datasets);
  • Evaluation processes for two of our test datasets.

We will add the data and the evaluation process for the remaining test datasets, as well as the training procedure. These updates are expected to be completed before May 2024.

Getting started

This code was tested on Ubuntu 20.04.5 LTS and requires:

  • Python 3.8.13
  • anaconda3 or miniconda3
  • A CUDA-capable GPU (one is enough)

1. Setup environment

Create a virtual environment

conda create -n geneoh-diffusion python==3.8.13
conda activate geneoh-diffusion

Install PyTorch 2.2.0 with CUDA 12.1

pip3 install torch==2.2.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Install torch_cluster

cd whls
pip install torch_cluster-1.6.3+pt22cu121-cp38-cp38-linux_x86_64.whl
cd ..

Install remaining dependencies

pip install -r requirements.txt --no-cache

Important: Install manopth

cd manopth
pip install -e .
cd ..

Please note that the MANO layer used in this project deviates slightly from the official release. It is essential to install the manopth package shipped with this repository; otherwise the model may produce abnormal denoised results.
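For reference, here is a minimal sketch of how the layer is typically instantiated and called; the mano_root path and the parameter values are illustrative assumptions, not values prescribed by this repository:

import torch
from manopth.manolayer import ManoLayer

# Illustrative setup; mano_root must point at the folder holding the MANO
# model .pkl files, and ncomps/side are assumptions for this sketch.
mano_layer = ManoLayer(mano_root='mano/models', use_pca=True,
                       ncomps=24, side='right', flat_hand_mean=True)

pose = torch.zeros(1, 3 + 24)   # 3 global-rotation + 24 PCA pose coefficients
shape = torch.zeros(1, 10)      # MANO shape coefficients
hand_verts, hand_joints = mano_layer(pose, shape)  # (1, 778, 3) and (1, 21, 3)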

2. Download pre-trained models

Download models from this link and place them in the ./ckpts folder.

3. Get data

1. GRAB

2. TACO

Besides the test datasets mentioned in the paper, we have also evaluated our model on the recent TACO dataset. Data samples for testing are included in the folder ./data/taco/source_data. More data will be incorporated soon.
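The internal layout of these .pkl files is not documented here, but a quick inspection sketch like the following can reveal what a source sequence contains:

import pickle

# Inspect a TACO source sequence; the pickled object's structure is not
# specified by this README, so we only print what is actually found.
with open('data/taco/source_data/20231104_017.pkl', 'rb') as f:
    data = pickle.load(f)
print(type(data))
if isinstance(data, dict):
    for key, value in data.items():
        print(key, getattr(value, 'shape', type(value)))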

Usage

GRAB

Example

Here's an example of cleaning an input trajectory (sequence 14 of GRAB's test split) perturbed with Gaussian noise.

The noisy input trajectory is constructed by adding Gaussian noise to the trajectory data/grab/source_data/14.npy. Two different denoised samples are shown below.
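Conceptually, the perturbation just adds i.i.d. Gaussian noise to the stored trajectory. A minimal sketch follows; the noise scale and the assumption that 14.npy stores a plain float array are ours, and the actual perturbation is applied inside the provided scripts (controlled by the pert_type argument):

import numpy as np

# Hypothetical illustration of the Gaussian perturbation; the real one is
# applied by the provided scripts. Assumes 14.npy holds a float array.
traj = np.load('data/grab/source_data/14.npy', allow_pickle=True)
noise_std = 0.01  # assumed scale, in the data's own units
noisy_traj = traj + noise_std * np.random.randn(*traj.shape)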

Input Result 1 Result 2

To reproduce the above result, follow the steps below:

  1. Denoising
    bash scripts/val_examples/predict_grab_rndseed_14.sh
    #### After completing the above command ####
    bash scripts/val_examples/predict_grab_rndseed_spatial_14.sh
    Ten random seeds will be utilized for prediction. The predicted results will be saved in the folder ./data/grab/result.
  2. Mesh reconstruction
    bash scripts/val_examples/reconstruct_grab_14.sh
    Results will be saved in the same folder as the previous step.
  3. Extracting results and visualization
    python visualize/vis_grab_example_14.py
    Adjust the camera pose in the viewer on the first frame; figures capturing all frames will then be saved under the project root. Use your favorite tool to compose them into a video (see the sketch after this list).
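If you do not have a tool at hand, the following sketch composes the captured frames with imageio; the frame filename pattern, output name, and frame rate are assumptions (and writing .mp4 requires the imageio-ffmpeg plugin):

import glob
import imageio.v2 as imageio

# Assumed frame naming; adjust the glob pattern to the actual filenames.
frame_paths = sorted(glob.glob('*.png'))
frames = [imageio.imread(p) for p in frame_paths]
imageio.mimsave('grab_example_14.mp4', frames, fps=30)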

Evaluate on the test split

  1. Update the data and experiment paths in the .sh scripts
       ################# [Edit here] Set to your paths #################
       #### Data and exp folders ####
       export seq_root="data/grab/GRAB_processed/test"
       export grab_path="data/grab/GRAB_extracted"
       export save_dir="exp/grab/eval_save"
       export grab_processed_dir="data/grab/GRAB_processed"
  2. Denoising
    bash scripts/val/predict_grab_rndseed.sh
    #### After completing the above command ####
    bash scripts/val/predict_grab_rndseed_spatial.sh
  3. Mesh reconstruction
    To use the script scripts/val/reconstruct_grab.sh to reconstruct a single sequence, set single_seq_path and test_tag in the script before running it.
    bash scripts/val/reconstruct_grab.sh

Denoising a full sequence

The GRAB evaluation setting denoises the first 60 frames of each sequence. To denoise a full sequence, divide the input into several overlapping 60-frame clips, denoise each clip independently, and then reconstruct the mesh sequence jointly, as sketched below.
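A minimal sketch of the clip-splitting idea; the clip length, stride, and array layout are illustrative, and the provided fullseq scripts handle this internally:

import numpy as np

def split_into_clips(traj, clip_len=60, stride=30):
    """Split a (T, ...) trajectory into overlapping fixed-length clips."""
    starts = list(range(0, max(len(traj) - clip_len, 0) + 1, stride))
    if starts[-1] + clip_len < len(traj):  # make sure the tail is covered
        starts.append(len(traj) - clip_len)
    return [traj[s:s + clip_len] for s in starts]

# e.g. a 150-frame sequence yields clips starting at frames 0, 30, 60, 90
clips = split_into_clips(np.zeros((150, 64)), clip_len=60, stride=30)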

For example, taking data/grab/source_data/14.npy, the following scripts will add artificial Gaussian noise to it and denoise the full sequence:

##### Denoising #####
bash scripts/val/predict_grab_fullseq_rndseed.sh
##### Denoising (spatial) #####
bash scripts/val/predict_grab_fullseq_rndseed_spatial.sh
##### Reconstructing #####
bash scripts/val/reconstruct_grab_fullseq.sh

The single_seq_path parameter in each script specifies the sequence to denoise.

GRAB (Beta)

Example

The noisy input trajectory is constructed by adding noise drawn from a Beta distribution to the trajectory data/grab/source_data/14.npy. Two different denoised samples are shown below.
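For intuition, Beta-distributed noise is bounded and asymmetric, unlike the Gaussian case. A hedged sketch follows; the shape parameters, centering, and scale are assumptions, and the scripts apply the actual perturbation when pert_type is beta:

import numpy as np

# Hypothetical Beta perturbation, centered at the Beta(8, 2) mean of 0.8;
# the real perturbation is applied inside the provided scripts.
traj = np.load('data/grab/source_data/14.npy', allow_pickle=True)
beta_noise = np.random.beta(8.0, 2.0, size=traj.shape) - 0.8
noisy_traj = traj + 0.05 * beta_noise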

Input Result 1 Result 2

To reproduce this result, use the scripts located in the scripts/val_examples directory. Note that the pert_type argument in each .sh file should be set to beta.

Evaluate on the test split

To run the evaluation process on all GRAB test sequences, follow the same steps as outlined in the previous section. Note that the pert_type argument in each .sh file should be set to beta.

Denoising a full sequence

Follow the same steps as outlined in the previous section. Don't forget to set the pert_type argument in each .sh file to beta.

TACO

Here's an example of cleaning an input noisy trajectory data/taco/source_data/20231104_017.pkl.

Below are the input, the result, and an overlaid video.

Input Result Overlaid

To reproduce the above result, follow the steps below:

  1. Denoising
    bash scripts/val_examples/predict_taco_rndseed_spatial_20231104_017.sh
    Ten random seeds will be utilized for prediction, and the predicted results will be saved in the folder ./data/taco/result.
  2. Mesh reconstruction
    bash scripts/val_examples/reconstruct_taco_20231104_017.sh
    Results will be saved in the same folder as mentioned in the previous step.
  3. Extracting results and visualization
    python visualize/vis_taco_example_20231104_017.py
    Adjust the camera pose in the viewer based on the first frame. Figures of all frames will be captured and saved in the root folder of the project. Finally, use your preferred tool to compile these figures into a video.

TODOs

  • Example usage, evaluation process and pre-trained models
  • Evaluation process on HOI4D, ARCTIC
  • Data: HOI4D, ARCTIC, and more examples on TACO
  • Training procedure

Bibtex

If you find this code useful in your research, please cite:

@inproceedings{liu2024geneoh,
   title={GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion},
   author={Liu, Xueyi and Yi, Li},
   booktitle={The Twelfth International Conference on Learning Representations},
   year={2024}
}

Acknowledgments

This code stands on the shoulders of giants. We would like to thank the following projects that our code builds on: motion-diffusion-model and guided-diffusion.

License

This code is distributed under an MIT LICENSE.
