The goal of this project was to use recent advances in monocular depth, instance segmentation, scene graphs, and graph neural networks to push the state-of-the-art in indoor image retrieval.
- data
  - hypersim_train_graphs_GT   # example graphs for training
  - hypersim_test_graphs_GT    # example graphs for testing
- docker   # Docker build and run files
- src
  - Monograph
    - configs                 # general configs
    - dataloader              # dataloaders for different purposes
    - GNN                     # load graphs and train the GCN model
    - generate_scene_graph    # generate scene graphs from ground-truth data
    - preds_to_graphs         # generate scene graphs from predicted data
    - rgb-to-depth            # RGB-to-depth prediction
    - rgb-to-semantics        # RGB-to-semantics training and prediction
The list of packages used in this project can be found in requirements.txt.
We use a Docker container so that all users have the same working environment. You can either use Singularity on the ETHZ Euler HPC cluster or run the Docker container directly on your local machine.
To use Singularity on Euler, your NETHZ username must be added to the ID-HPC-SINGULARITY group.
Request a compute node with Singularity support. This step may take some time.
srun --pty --mem-per-cpu=4G --gpus=1 bash
Load the eth_proxy module to connect to the internet from a compute node:
module load eth_proxy
Navigate to $SCRATCH
cd $SCRATCH
Pull the Docker image with Singularity.
singularity pull docker://rupalsaxena/3dvision_grp11
Check that the Singularity image file is available:
ls
Running ls above should return:
3dvision_grp11_latest.sif
If this file is present, you are good to go. Otherwise, contact the maintainer of this repo's Docker container.
Run the container as follows (replace euler_username with your Euler username):
singularity run --bind /cluster/home/euler_username:/cluster/home/euler_username --bind /cluster/project/infk/courses/252-0579-00L/group11_2023:/cluster/project/infk/courses/252-0579-00L/group11_2023 3dvision_grp11_latest.sif
Keep this repo under /cluster/home/euler_username so that it is mounted automatically inside the container.
Once you run this, you are inside the Singularity container.
- Navigate to /cluster/home/euler_username to see this repo inside the container.
- Navigate to /cluster/project/infk/courses/252-0579-00L/group11_2023 to see the data inside the container.
If you can see both the folders mentioned above, congratulations, your infrastructure is ready!
Shown below is the pipeline overview of this project.
We will see how to run each element of this pipeline one by one.
The Hypersim dataset, developed by Apple Inc., is a collection of photorealistic synthetic images of indoor scenes with per-pixel ground-truth labels. The data consists of RGB images with corresponding semantic instance and depth maps.
Download the data from the link: HypersimData
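Hypersim frames ship as HDF5 files. As a quick sanity check after downloading, the sketch below reads one RGB frame and its depth map; the scene, camera, and frame names are placeholders, and the "dataset" key follows the standard Hypersim release layout.

```python
# Sketch: inspect one downloaded Hypersim frame with h5py.
# Scene/camera/frame names are placeholders -- adjust to your download.
import h5py

images = "path/to/hypersim/ai_001_001/images"  # placeholder scene directory
with h5py.File(f"{images}/scene_cam_00_final_hdf5/frame.0000.color.hdf5", "r") as f:
    rgb = f["dataset"][:]    # (H, W, 3) float32 color image
with h5py.File(f"{images}/scene_cam_00_geometry_hdf5/frame.0000.depth_meters.hdf5", "r") as f:
    depth = f["dataset"][:]  # (H, W) float32 depth in meters
print(rgb.shape, depth.shape)
```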
STEP 1: Save RGB and ground-truth semantic instances in torch format using the dataloader.
# navigate to hypersim dataloader
cd src/Monograph/dataloader/hypersim_pytorch
In your IDE, update the config parameters, especially PURPOSE="Semantic". Then run the following command to save the Hypersim data in PyTorch format:
# save hypersim data in torch format
python3 save_hypersim_dataset.py
The data will be saved in torch format at the output path you provided.
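To double-check the export, you can load the saved file back. This is only a hypothetical sanity check: the filename and the structure of the saved object (assumed here to be a sequence of (rgb, semantic) tensor pairs) depend on your dataloader config.

```python
# Hypothetical sanity check on the file written by save_hypersim_dataset.py.
# The assumed structure (a sequence of (rgb, semantic) pairs) may differ
# from what your config produces -- adapt accordingly.
import torch

dataset = torch.load("path/to/output/hypersim_semantic.pt")  # placeholder path
rgb, semantic = dataset[0]
print(len(dataset), rgb.shape, semantic.shape)
```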
STEP 2: Train a DeepLabv3 ResNet-50 model using transfer learning.
# navigate to rgb-to-semantics directory
cd ../../rgb-to-semantics
Update the input path and the trained-model output path in config.py, then train the model:
python3 transfer_learning.py
Once training is over, check the provided output path for the trained model.
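For intuition, transfer learning here means starting from a pretrained segmentation network and retraining its classification head for the Hypersim classes. Below is a minimal sketch with torchvision; NUM_CLASSES is a placeholder, and the repo's actual setup lives in transfer_learning.py.

```python
# Minimal transfer-learning sketch (an assumption about the setup, not the
# repo's exact code): take torchvision's pretrained DeepLabv3-ResNet50 and
# replace its final classifier layer to match the target class count.
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 40  # placeholder; use the class count from config.py
model = deeplabv3_resnet50(pretrained=True)
model.classifier[4] = torch.nn.Conv2d(256, NUM_CLASSES, kernel_size=1)

# Only the new head (and optionally later backbone layers) needs training.
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()
```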
STEP 3: Save predicted semantic data. Update MODELPATH, TESTDATAPATH, and main_path in test_model.py, then run the file to get predicted semantic instances:
python3 test_model.py
Check the predicted semantic images in the main_path you provided.
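Conceptually, the prediction step loads the trained model and takes a per-pixel argmax over the class scores. A hedged sketch, assuming the whole model object was saved with torch.save (MODELPATH and the input batch are placeholders):

```python
# Hypothetical sketch of the prediction step, assuming the trained model
# object was saved with torch.save(model, MODELPATH).
import torch

model = torch.load("MODELPATH", map_location="cpu")
model.eval()

rgb_batch = torch.rand(1, 3, 480, 640)            # stand-in for real images
with torch.no_grad():
    pred = model(rgb_batch)["out"].argmax(dim=1)  # (N, H, W) predicted labels
```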
Predict depth from RGB using the pretrained AdelaiDepth network.
Navigate to the rgb-to-depth directory:
cd ../rgb-to-depth
Update the image_dir and image_dir_out paths in test_depth.py. Once updated, run the following:
python3 test_depth.py
cd ../../../
The predicted depth data should be available at the image_dir_out path mentioned above.
Method 1: Graphs from Ground Truth Images
To generate graphs from the ground-truth depth and semantic instances of the Hypersim data, proceed as follows:
Step 1: Update the HYPERSIM_DATAPATH and HYPERSIM_GRAPHS paths in src/Monograph/generate_scene_graph/config.py.
Step 2: Run graph generation using following commands:
cd src/Monograph
python3 main_save_graphs.py
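As a rough mental model (a simplified illustration, not the repo's exact logic, which lives in generate_scene_graph/), a scene graph treats each object instance as a node with a centroid derived from the instance and depth maps, and connects nodes whose centroids fall within a distance threshold:

```python
# Simplified illustration of scene-graph construction from a semantic-instance
# map and a depth map. The node features and edge criterion are assumptions.
import numpy as np

def build_graph(instances: np.ndarray, depth: np.ndarray, threshold: float = 2.0):
    nodes, edges = [], []
    ids = [i for i in np.unique(instances) if i >= 0]  # skip unlabeled pixels
    for obj_id in ids:
        ys, xs = np.nonzero(instances == obj_id)
        # Crude centroid: mean pixel position plus mean depth of the instance.
        centroid = np.array([xs.mean(), ys.mean(), depth[ys, xs].mean()])
        nodes.append((obj_id, centroid))
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if np.linalg.norm(nodes[i][1] - nodes[j][1]) < threshold:
                edges.append((nodes[i][0], nodes[j][0]))
    return nodes, edges
```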
Method 2: Graphs from Predicted Images
To generate graphs from the predicted depth and semantic instances of the Hypersim data, proceed as follows:
Step 1: Update the src/Monograph/preds_to_graphs/config.py file. Make sure to set the correct paths for the Hypersim RGB data, the predicted semantics, the predicted depth, and the output.
Step 2: Generate graphs from predicted data
cd src/Monograph/preds_to_graphs
python3 preds_to_graphs.py
To train the model on graphs generated from ground truth, run the following commands:
# navigate to GNN directory
cd src/Monograph/GNN
# run ground truth graph GCN training with threshold 2
python3 LoadAndTrain.py 2 ../../../data/hypersim_train_graphs_GT/
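The first argument is the threshold referenced in the comment above; the second is the graph directory. For orientation, a minimal GCN encoder of the kind trained here might look as follows in PyTorch Geometric (an assumption about the stack; see GNN/LoadAndTrain.py for the actual architecture and training loop):

```python
# Sketch of a two-layer GCN that pools node features into one embedding per
# graph -- a plausible shape for this pipeline, not the repo's exact model.
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class GraphEncoder(torch.nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return global_mean_pool(x, batch)  # one embedding per graph
```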
To perform image retrieval on the example test data with different thresholds, run the following commands:
# navigate to GNN directory
cd src/Monograph/GNN
# run proximity matching
python3 PipelineFeatures.py
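Proximity matching retrieves, for each query graph, the stored graph whose embedding is nearest to the query's. A hypothetical sketch of the core idea (PipelineFeatures.py implements the full pipeline):

```python
# Sketch: nearest-neighbor retrieval over graph embeddings produced by the
# trained encoder. The embedding source and distance metric are assumptions.
import torch

def retrieve(query_emb: torch.Tensor, db_embs: torch.Tensor) -> int:
    # query_emb: (D,), db_embs: (N, D); returns the index of the closest graph.
    dists = torch.cdist(query_emb.unsqueeze(0), db_embs)  # (1, N) Euclidean
    return int(dists.argmin())
```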