Visual Representation Learning with Stochastic Frame Prediction (ICML 2024)

Huiwon Jang1 · Dongyoung Kim1 · Junsu Kim1
Jinwoo Shin1 · Pieter Abbeel2 · Younggyo Seo1,3
1KAIST   2UC Berkeley   3Dyson Robot Learning Lab

1. Environment setup

  • Torch versions >2.0 may work, but we recommend the conda install with the versions below.
conda create -n rsp python=3.9.12 -y
conda activate rsp
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
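To verify the environment afterwards, a small helper like the following can compare installed versions against the recommended ones (a convenience sketch, not part of the repo):

```python
def parse_version(v):
    """Turn a version string like '1.13.1+cu117' into a comparable tuple (1, 13, 1)."""
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

# Versions recommended in the install command above.
RECOMMENDED = {"torch": "1.13.1", "torchvision": "0.14.1", "torchaudio": "0.13.1"}

def check(installed, recommended=RECOMMENDED):
    """Return the names of packages whose installed version differs."""
    return [name for name, want in recommended.items()
            if parse_version(installed.get(name, "0")) != parse_version(want)]
```

For example, `check({"torch": "2.1.0", "torchvision": "0.14.1", "torchaudio": "0.13.1"})` reports `["torch"]`.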

2. Dataset

Dataset download

sh data_preprocessing/download.sh
sh data_preprocessing/extract.sh
  • We assume the root directory for the data is $DATA_ROOT = /data/kinetics400.
  • To change the root directory, update root_dl in download.sh and extract.sh.

Dataset pre-processing

  • We resize the videos to 256x256 for efficient loading during training.
python data_preprocessing/make_256scale.py --datadir $DATA_ROOT
  • We additionally provide code to filter out broken (unreadable) videos.
python data_preprocessing/make_labels.py --datadir $DATA_ROOT --filedir train2
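The filtering step can be imagined roughly as follows — a minimal sketch that assumes unreadable files can be spotted by size; the real make_labels.py may instead try to decode each video:

```python
import os

def list_valid_videos(datadir, min_bytes=1024):
    """Collect (class_name, path) pairs for videos that look readable.
    Hypothetical stand-in for the checks in make_labels.py: here a file
    is kept if it is an .mp4 of at least min_bytes."""
    valid = []
    for class_name in sorted(os.listdir(datadir)):
        class_dir = os.path.join(datadir, class_name)
        if not os.path.isdir(class_dir):
            continue
        for fname in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, fname)
            if fname.endswith(".mp4") and os.path.getsize(path) >= min_bytes:
                valid.append((class_name, path))
    return valid
```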

Kinetics-400

/data/kinetics400
|-- train2
    |-- abseiling
        |-- xx.mp4
        |-- ...
    |-- air_drumming
        |-- xx.mp4
        |-- ...
    |-- ...
|-- labels
    |-- label_full_1.0.pickle
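For illustration, a labels file of the shape above could be produced like this — a hedged sketch that maps each video path to an integer class id; the repo's actual pickle schema may differ:

```python
import os
import pickle

def make_label_file(data_root, split="train2", out_name="label_full_1.0.pickle"):
    """Assign each class folder an integer id (alphabetical order) and map
    every video's relative path to its class id, then pickle the mapping
    under data_root/labels/. Illustrative only."""
    split_dir = os.path.join(data_root, split)
    classes = sorted(d for d in os.listdir(split_dir)
                     if os.path.isdir(os.path.join(split_dir, d)))
    class_to_idx = {c: i for i, c in enumerate(classes)}
    labels = {}
    for c in classes:
        for fname in sorted(os.listdir(os.path.join(split_dir, c))):
            labels[os.path.join(split, c, fname)] = class_to_idx[c]
    os.makedirs(os.path.join(data_root, "labels"), exist_ok=True)
    with open(os.path.join(data_root, "labels", out_name), "wb") as f:
        pickle.dump(labels, f)
    return labels
```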

3. Pre-training RSP on Kinetics-400

  • To reproduce our results, the effective batch size must satisfy [N_NODE] x [BATCH_SIZE_PER_GPU] x [ACCUM_ITER] = 1536.
  • Default: [DATA_PATH]=/data/kinetics400
python -m torch.distributed.launch --nproc_per_node=[N_NODE] main_pretrain_rsp.py \
    --batch_size [BATCH_SIZE_PER_GPU] \
    --accum_iter [ACCUM_ITER] \
    --model rsp_vit_small_patch16 \
    --epochs 400 \
    --warmup_epochs 40 \
    --data_path [DATA_PATH] \
    --log_dir [LOG_DIR] \
    --output_dir [LOG_DIR] \
    --norm_pix_loss \
    --repeated_sampling 2
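The batch-size constraint can be checked with a few lines of Python; the specific GPU/accumulation splits below are illustrative, not the authors' exact configuration:

```python
def effective_batch_size(n_gpus, batch_per_gpu, accum_iter):
    """Effective batch size seen by the optimizer: per-GPU batch, times
    number of GPU processes, times gradient-accumulation steps."""
    return n_gpus * batch_per_gpu * accum_iter

# Two hypothetical configurations that both reach 1536:
assert effective_batch_size(8, 96, 2) == 1536   # 8 GPUs, accumulate over 2 steps
assert effective_batch_size(4, 96, 4) == 1536   # 4 GPUs, accumulate over 4 steps
```

Gradient accumulation lets smaller clusters match the same effective batch size at the cost of more steps per update.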

4. Evaluation

We provide the following checkpoints:

  • ViT-S/16 400 epochs: [link]
  • ViT-B/16 400 epochs: [link]

4.1. Video Label Propagation

The evaluation code is mainly built upon DINO.

1. DAVIS 2017 video object segmentation

  • Step 1: Dataset preparation

We note that the default root path is [DATA_ROOT]=/data. Additionally, we resize DAVIS frames from 480x(variable width) to 480x880 so that each frame divides evenly into patches for evaluation.

sh data_preprocessing/eval/davis_download.sh
python data_preprocessing/eval/davis_preprocessing.py --data_root [DATA_ROOT]
[DATA_ROOT]/DAVIS_480_880
|-- Annotations/480p
    |-- bear
        |-- 00000.png
        |-- ...
    |-- ...
|-- ImageSets/2017/val.txt
|-- JPEGImages/480p
    |-- bear
        |-- 00000.jpg
        |-- ...
    |-- ...
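A quick check of why 480x880 is a convenient resolution for the ViT-S/16 model used below:

```python
# Both sides of 480x880 divide evenly by the ViT-S/16 patch size,
# so every frame maps to a whole number of 16x16 patches.
PATCH = 16
H, W = 480, 880
assert H % PATCH == 0 and W % PATCH == 0
grid = (H // PATCH, W // PATCH)  # patches per frame: (30, 55)
```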
  • Step 2: Video object segmentation
python eval_video_segmentation_davis.py \
    --finetune [LOG_DIR]/checkpoint-199.pth \
    --output_dir [LOG_DIR]/davis_seg \
    --data_path [DATA_ROOT]/DAVIS_480_880 \
    --topk 7 --size_mask_neighborhood 30 --n_last_frames 30 \
    --model vit_small
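Conceptually, label propagation gives each query patch the labels of its top-k most similar reference patches (cf. --topk 7 above). The toy sketch below illustrates the idea with plain lists; names like `sim` and `ref_labels` are hypothetical, and the repo's script additionally restricts matches to a spatial neighborhood (--size_mask_neighborhood) over dense ViT features:

```python
def propagate_labels(sim, ref_labels, topk=7):
    """Toy nearest-neighbor label propagation (illustrative only).
    sim[q][r]: similarity between query patch q and reference patch r.
    ref_labels[r]: label of reference patch r.
    Each query takes a similarity-weighted vote over its top-k references."""
    out = []
    for row in sim:
        ranked = sorted(range(len(row)), key=lambda r: row[r], reverse=True)[:topk]
        votes = {}
        for r in ranked:
            votes[ref_labels[r]] = votes.get(ref_labels[r], 0.0) + row[r]
        out.append(max(votes, key=votes.get))  # label with the largest vote mass
    return out

# Query 0 matches reference 0 (label 1); query 1 matches references 1 and 2 (label 2).
print(propagate_labels([[0.9, 0.1, 0.2], [0.1, 0.8, 0.7]], [1, 2, 2], topk=2))
```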
  • Step 3: Evaluate the obtained segmentations
git clone https://github.com/davisvideochallenge/davis2017-evaluation
python ./davis2017-evaluation/evaluation_method.py \
    --task semi-supervised \
    --results_path [LOG_DIR]/davis_seg \
    --davis_path [DATA_ROOT]/DAVIS_480_880

4.2. Vision-based Robot Learning

1. CortexBench

We provide the evaluation code at https://github.com/huiwon-jang/RSP/tree/eval_cortexbench.

TODOs

  • [ ] Evaluation codes: JHMDB, VIP, RLBench, Franka Kitchen

Note

This code may not exactly reproduce the results reported in the paper due to potential human error while preparing and cleaning it for release. If you have difficulty reproducing our findings, please let us know. We also plan to run sanity-check experiments in the near future.
