Unsupervised Multi-object Segmentation by Predicting Probable Motion Patterns

karazijal/probable-motion

Project Page | arXiv | In NeurIPS 2022

Visual Geometry Group, University of Oxford

Abstract

We propose a new approach to learn to segment multiple image objects without manual supervision. The method can extract objects from still images, but uses videos for supervision. While prior works have considered motion for segmentation, a key insight is that, while motion can be used to identify objects, not all objects are necessarily in motion: the absence of motion does not imply the absence of objects. Hence, our model learns to predict image regions that are likely to contain motion patterns characteristic of objects moving rigidly. It does not predict specific motion, which cannot be done unambiguously from a still image, but a distribution of possible motions, which includes the possibility that an object does not move at all. We demonstrate the advantage of this approach over its deterministic counterpart and show state-of-the-art unsupervised object segmentation performance on simulated and real-world benchmarks, surpassing methods that use motion even at test time. As our approach is applicable to a variety of network architectures that segment the scenes, we also apply it to existing image reconstruction-based models showing drastic improvement.

Getting Started

This repository builds on Mask2Former.

Requirements

Create and name a conda environment of your choosing, e.g. ppmp:

conda create -n ppmp python=3.9
conda activate ppmp

Then install the requirements using this one-liner:

conda install -y pytorch=1.12.1 torchvision=0.13.1 cudatoolkit=11.3 -c pytorch && \
conda install -y kornia jupyter tensorboard timm einops scikit-learn scikit-image openexr-python tqdm -c conda-forge && \
conda install -y gcc_linux-64=7 gxx_linux-64=7 fontconfig && \
yes | pip install cvbase opencv-python filelock && \
yes | python -m pip install 'git+https://github.com/facebookresearch/detectron2.git' && \
cd mask2former/modeling/pixel_decoder/ops && \
sh make.sh

Data Preparation

Datasets should be placed under data/<dataset_name>, like data/movi_a or data/moving_clevrtex.

Moving CLEVR/ClevrTex

For MovingClevrTex, download the tar files and place them under data/moving_clevrtex/tar (see instructions here). The dataloader is set up to build an index into the tar files and read the required information on the fly.
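The idea of indexing into tar files can be illustrated with a minimal sketch. This is not the repository's dataloader: the function names `build_tar_index` and `read_member` are hypothetical, and the sketch assumes uncompressed tar archives so that member data can be read directly by byte offset.

```python
import tarfile


def build_tar_index(tar_path):
    """Map each regular file in the archive to (data offset, size in bytes).

    Hypothetical helper: assumes an uncompressed tar ("r:" mode), so the
    recorded offsets are valid positions in the file on disk.
    """
    index = {}
    with tarfile.open(tar_path, "r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(tar_path, index, name):
    """Read one member's bytes using the index, without re-scanning the tar."""
    offset, size = index[name]
    with open(tar_path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Building the index costs one pass over each archive; afterwards a single frame can be fetched with one seek and one read, which avoids extracting the dataset to disk.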

MOVi

For MOVi datasets, the files should be extracted to data/<dataset_name>/<train or validation>/<seq name>/ using <seq name>_rgb_<frame num>.jpg for RGB, <seq name>_ano_<frame num>.png for masks, and <seq name>_fwd_<frame num>.npz or <seq name>_bwd_<frame num>.npz for forward or backward optical flow, respectively. For example:

data/movi_a/train/movi_a_5995/movi_a_5995_ano_017.png
data/movi_a/train/movi_a_5995/movi_a_5995_rgb_017.jpg
data/movi_a/train/movi_a_5995/movi_a_5995_fwd_017.npz
data/movi_a/train/movi_a_5995/movi_a_5995_bwd_017.npz
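The naming scheme above can be sketched as a small path helper. The function `movi_frame_paths` is hypothetical and not part of the repository; the three-digit zero padding of the frame number is an assumption inferred from the example paths.

```python
from pathlib import Path


def movi_frame_paths(root, dataset, split, seq, frame):
    """Return the expected files for one frame of one MOVi sequence.

    Hypothetical helper. Assumes frame numbers are zero-padded to three
    digits, as in the example paths (e.g. 017).
    """
    seq_dir = Path(root) / dataset / split / seq
    num = f"{frame:03d}"
    return {
        "rgb": seq_dir / f"{seq}_rgb_{num}.jpg",  # RGB image
        "ano": seq_dir / f"{seq}_ano_{num}.png",  # segmentation mask
        "fwd": seq_dir / f"{seq}_fwd_{num}.npz",  # forward optical flow
        "bwd": seq_dir / f"{seq}_bwd_{num}.npz",  # backward optical flow
    }


paths = movi_frame_paths("data", "movi_a", "train", "movi_a_5995", 17)
missing = [kind for kind, path in paths.items() if not path.exists()]
```

Checking `missing` for a few sequences after extraction is a quick way to confirm that the layout matches what the dataloader expects.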

See this notebook for details on how to (down)load and normalise the Kubric datasets.

KITTI

For KITTI, RAFT flow is required. We followed the processing from here, with appropriate file path changes for the KITTI dataset structure.

Running

Experiments are controlled through a mix of config files and command line arguments. See the config files and config.py for a list of all available options. For example, for the MOVi C dataset:

python main.py --config config_sacnn.yaml UNSUPVIDSEG.DATASET MOVi_C

or, for MOVi D:

# Note the switch to 24 object queries (slots)
python main.py --config config_sacnn.yaml UNSUPVIDSEG.DATASET MOVi_D MODEL.MASK_FORMER.NUM_OBJECT_QUERIES 24  
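The trailing arguments in the commands above are alternating KEY VALUE pairs, with dots separating nesting levels. As a rough illustration of how such overrides map onto a nested config, here is a minimal sketch; `apply_overrides` is a hypothetical function working on plain dictionaries, not the repository's actual config handling in config.py.

```python
import ast


def apply_overrides(cfg, opts):
    """Apply alternating KEY VALUE overrides to a nested dict.

    Hypothetical illustration only: dotted keys select nested entries,
    and values are parsed as Python literals when possible.
    """
    if len(opts) % 2 != 0:
        raise ValueError("overrides must be KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for part in parents:
            node = node.setdefault(part, {})
        try:
            value = ast.literal_eval(raw)  # "24" becomes the integer 24
        except (ValueError, SyntaxError):
            value = raw  # bare words such as MOVi_D stay strings
        node[leaf] = value
    return cfg


cfg = apply_overrides(
    {},
    ["UNSUPVIDSEG.DATASET", "MOVi_D",
     "MODEL.MASK_FORMER.NUM_OBJECT_QUERIES", "24"],
)
```

Here `cfg["MODEL"]["MASK_FORMER"]["NUM_OBJECT_QUERIES"]` ends up as the integer 24 and `cfg["UNSUPVIDSEG"]["DATASET"]` as the string `"MOVi_D"`.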

Checkpoints

See here for available checkpoints.

Citation

@inproceedings{karazija22unsupervised,
    author = {Karazija, Laurynas and Choudhury, Subhabrata and Laina, Iro and Rupprecht, Christian and Vedaldi, Andrea},
    booktitle = {Advances in Neural Information Processing Systems},
    title = {{U}nsupervised {M}ulti-object {S}egmentation by {P}redicting {P}robable {M}otion {P}atterns},
    volume = {35},
    year = {2022}
}
