
Monograph-Image-Retrieval

The goal of this project was to use recent advances in monocular depth, instance segmentation, scene graphs, and graph neural networks to push the state-of-the-art in indoor image retrieval.

Repository Overview

- data
    - hypersim_train_graphs_GT    # example graphs for training
    - hypersim_test_graphs_GT     # example graphs for testing
- docker                          # docker build and docker run files
- src
    - Monograph
        - configs                 # general configs
        - dataloader              # dataloaders for different purposes
        - GNN                     # load graphs and train GCN model
        - generate_scene_graph    # generate scene graphs from ground truth data
        - preds_to_graphs         # generate scene graphs from predicted data
        - rgb-to-depth            # rgb to depth prediction
        - rgb-to-semantics        # rgb to semantic training and prediction

The list of packages used in this project can be found in requirements.txt.

Setting up Infrastructure using Singularity on ETHZ HPC Euler

We use a Docker container so that all users have the same working environment. You can either use Singularity on the ETHZ HPC cluster Euler or run the Docker container directly on your local machine.

To use Singularity on Euler, your NETHZ username must be added to the ID-HPC-SINGULARITY group.

Request a compute node with access to Singularity. This step can take some time.

srun --pty --mem-per-cpu=4G --gpus=1 bash

Load the eth_proxy module to connect to the internet from a compute node:

module load eth_proxy

Navigate to $SCRATCH

cd $SCRATCH

Pull the Docker image with Singularity.

singularity pull docker://rupalsaxena/3dvision_grp11

Check that the Singularity image file is available:

ls 

Running ls above should return: 3dvision_grp11_latest.sif

If this file is available, you are good to go. Otherwise, contact the maintainer of this repository's Docker container.

Run the container as follows, replacing euler_username with your Euler username:

singularity run --bind /cluster/home/euler_username:/cluster/home/euler_username --bind /cluster/project/infk/courses/252-0579-00L/group11_2023:/cluster/project/infk/courses/252-0579-00L/group11_2023 3dvision_grp11_latest.sif 

Keep this repository under /cluster/home/euler_username so that it is mounted automatically inside the container.

Once you run this, you are inside the Singularity container.

  1. Navigate to /cluster/home/euler_username to see this repository inside the container.
  2. Navigate to /cluster/project/infk/courses/252-0579-00L/group11_2023 to see the data inside the container.

If you can see both of the folders mentioned above, congratulations: your infrastructure is ready!

Pipeline overview

Shown below is the pipeline overview of this project.

[Pipeline overview diagram]

We will see how to run each element of this pipeline one by one.

Hypersim Data

The Hypersim dataset, developed by Apple Inc., consists of a collection of photorealistic synthetic images depicting indoor scenes, accompanied by per-pixel ground truth labels. The data includes RGB images, semantic instance labels, and depth maps.

Download the data from the link: HypersimData
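
For orientation, Hypersim stores each modality as one HDF5 file per frame. Below is a minimal sketch of loading a single frame with h5py; the directory layout follows the public Hypersim release, and the scene path is a hypothetical placeholder to adjust to your download location.

import h5py
import numpy as np

scene = "path/to/ai_001_001"  # hypothetical location of one downloaded scene

# RGB color image
with h5py.File(f"{scene}/images/scene_cam_00_final_hdf5/frame.0000.color.hdf5", "r") as f:
    rgb = np.array(f["dataset"])        # HxWx3 float RGB
# per-pixel depth in meters
with h5py.File(f"{scene}/images/scene_cam_00_geometry_hdf5/frame.0000.depth_meters.hdf5", "r") as f:
    depth = np.array(f["dataset"])      # HxW depth map
# per-pixel semantic instance ids
with h5py.File(f"{scene}/images/scene_cam_00_geometry_hdf5/frame.0000.semantic_instance.hdf5", "r") as f:
    instances = np.array(f["dataset"])  # HxW integer instance ids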

RGB to Semantic Instances

STEP 1: Save RGB and ground-truth semantic instances in PyTorch format using the dataloader.

# navigate to hypersim dataloader
cd src/Monograph/dataloader/hypersim_pytorch

In your IDE, update the config parameters, especially PURPOSE="Semantic". Then run the following command to save the Hypersim data in PyTorch format:

# save hypersim data in torch format
python3 save_hypersim_dataset.py

The data will be saved in PyTorch format at the output path you provided.
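
Concretely, "torch format" here just means tensor pairs serialized with torch.save. A hypothetical sketch with dummy arrays standing in for real Hypersim frames (the names are illustrative, not the ones used by save_hypersim_dataset.py):

import numpy as np
import torch

# dummy (RGB, semantic) pairs standing in for Hypersim frames
frames = [(np.random.rand(768, 1024, 3).astype(np.float32),
           np.random.randint(0, 40, (768, 1024))) for _ in range(2)]

samples = [(torch.from_numpy(rgb).permute(2, 0, 1),  # CxHxW image tensor
            torch.from_numpy(sem).long())            # HxW label tensor
           for rgb, sem in frames]
torch.save(samples, "hypersim_semantic.pt")          # reload later with torch.load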

STEP 2: Train a DeepLabv3-ResNet50 model using transfer learning.

# navigate to rgb-to-semantics directory
cd ../../rgb-to-semantics 

Update the input path and the trained-model output path in config.py, then train the model:

python3 transfer_learning.py

Once training is over, check the provided output path for the trained model.
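
The recipe behind transfer_learning.py is standard transfer learning: load a DeepLabv3-ResNet50 pretrained on COCO from torchvision, swap the classifier head for one matching the Hypersim label set, and fine-tune with a cross-entropy loss. A rough sketch with a hypothetical class count and hyperparameters:

import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 41  # assumption: an NYU40-style label set plus background

model = deeplabv3_resnet50(pretrained=True)
# replace the final 1x1 conv so the head predicts our classes
model.classifier[4] = nn.Conv2d(256, NUM_CLASSES, kernel_size=1)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(2, 3, 256, 256)                    # dummy RGB batch
targets = torch.randint(0, NUM_CLASSES, (2, 256, 256))  # dummy label batch

logits = model(images)["out"]                           # BxCxHxW class logits
loss = criterion(logits, targets)
loss.backward()
optimizer.step()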

STEP 3: Save the predicted semantic data. Update MODELPATH, TESTDATAPATH, and main_path in test_model.py. Finally, run the file to get the predicted semantic instances:

python3 test_model.py

Check the predicted semantic images in the main_path provided in the config.py file.
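
The prediction step reloads the fine-tuned weights, runs the network in evaluation mode, and takes an argmax over the class logits. A minimal sketch (illustrative, not the exact contents of test_model.py; the weight filename is a hypothetical MODELPATH):

import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 41                                    # must match training
model = deeplabv3_resnet50(pretrained=False, aux_loss=True)
model.classifier[4] = nn.Conv2d(256, NUM_CLASSES, kernel_size=1)  # same head as training
model.load_state_dict(torch.load("trained_deeplabv3.pt"))
model.eval()

image = torch.randn(1, 3, 256, 256)                 # stand-in for one test RGB frame
with torch.no_grad():
    pred = model(image)["out"].argmax(dim=1)        # 1xHxW predicted class ids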

RGB to Depth

Predict depth from RGB using the pretrained AdelaiDepth network.

Navigate to the rgb-to-depth directory:

cd ../rgb-to-depth 

Update the image_dir and image_dir_out folders in test_depth.py before running it. Once updated, run the following:

python3 test_depth.py 
cd ../../../

The predicted depth data should be available at the image_dir_out path mentioned above.
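
test_depth.py wraps the pretrained network; the general shape of single-image depth inference is sketched below with a stand-in convolution in place of the real model (this is not AdelaiDepth's actual API):

import numpy as np
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, kernel_size=3, padding=1)   # stand-in for the depth network
model.eval()

rgb = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)  # dummy RGB frame
x = torch.from_numpy(rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
with torch.no_grad():
    depth = model(x).squeeze().numpy()              # HxW relative depth map
np.save("frame0000_depth.npy", depth)               # hypothetical image_dir_out file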

Generate Scene Graph

Method 1: Graphs from Ground Truth Images

To generate graphs from the ground-truth depth and ground-truth semantic instances of the Hypersim data, do the following:

Step 1: Update the HYPERSIM_DATAPATH and HYPERSIM_GRAPHS paths in the src/Monograph/generate_scene_graph/config.py file.

Step 2: Run the graph generation using the following commands:

cd src/Monograph
python3 main_save_graphs.py

Method 2: Graphs from Predicted Images

To generate graphs from the predicted depth and predicted semantic instances of the Hypersim data, do the following:

Step 1: Update the src/Monograph/preds_to_graphs/config.py file. Make sure to give the correct paths for the Hypersim RGB data, the predicted semantics, the predicted depth, and the output.

Step 2: Generate graphs from the predicted data:

cd src/Monograph/preds_to_graphs
python3 preds_to_graphs.py
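
In both methods the graph construction follows the same idea: every semantic instance becomes a node, its mean 3D position is recovered from the depth map, and instances whose positions lie closer than a threshold are connected by an edge. A hedged sketch of that idea (the real code in generate_scene_graph and preds_to_graphs back-projects pixels with the camera intrinsics instead of the crude proxy used here):

import numpy as np

def build_graph(instances, depth, threshold=2.0):
    """Nodes are instance ids; edges connect instances with nearby centroids."""
    h, w = depth.shape
    nodes, centroids = [], []
    for inst_id in np.unique(instances):
        mask = instances == inst_id
        ys, xs = np.nonzero(mask)
        # crude 3D proxy: normalized image position plus mean depth
        centroids.append(np.array([xs.mean() / w, ys.mean() / h, depth[mask].mean()]))
        nodes.append(int(inst_id))
    edges = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
             if np.linalg.norm(centroids[i] - centroids[j]) < threshold]
    return nodes, edges

instances = np.random.randint(0, 5, (120, 160))  # dummy instance map
depth = np.random.rand(120, 160) * 5.0           # dummy depth in meters
nodes, edges = build_graph(instances, depth)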

Train a GCN with Triplet Loss using Generated Scene Graphs

To train the model on graphs generated from the ground truth, run the following commands:

# navigate to GNN directory
cd src/Monograph/GNN

# run ground truth graph GCN training with threshold 2
python3 LoadAndTrain.py 2 ../../../data/hypersim_train_graphs_GT/
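
LoadAndTrain.py takes the distance threshold and the graph directory as command-line arguments. Under the hood the model is a graph convolutional encoder trained with a triplet loss, so that graphs of nearby views embed close together and graphs of distant views embed far apart. A hedged sketch using torch_geometric (layer sizes and the random graphs are illustrative):

import torch
import torch.nn as nn
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_mean_pool

class GraphEncoder(nn.Module):
    def __init__(self, in_dim=3, hidden=64, out_dim=32):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, out_dim)

    def forward(self, data):
        x = self.conv1(data.x, data.edge_index).relu()
        x = self.conv2(x, data.edge_index)
        return global_mean_pool(x, data.batch)   # one embedding per graph

def random_graph(n=6):
    return Data(x=torch.randn(n, 3),
                edge_index=torch.randint(0, n, (2, 12)),
                batch=torch.zeros(n, dtype=torch.long))

encoder = GraphEncoder()
criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

anchor, positive, negative = (encoder(random_graph()) for _ in range(3))
loss = criterion(anchor, positive, negative)
loss.backward()
optimizer.step()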

Proximity Matching

To perform image retrieval on the example test data with different thresholds, run the following commands:

# navigate to GNN directory
cd src/Monograph/GNN

# run proximity matching
python3 PipelineFeatures.py
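
Proximity matching itself reduces to nearest-neighbor search over the learned graph embeddings. A hedged sketch of the idea (illustrative; see PipelineFeatures.py for the actual implementation, and note the embedding dimension and threshold here are assumptions):

import torch

database = torch.randn(100, 32)   # embeddings of the database graphs
query = torch.randn(32)           # embedding of the query graph

dists = torch.cdist(query.unsqueeze(0), database).squeeze(0)
threshold = 1.0                   # hypothetical distance threshold
matches = torch.nonzero(dists < threshold).flatten()  # all graphs within threshold
best = dists.argmin()             # single best match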
