Paper accepted by ECCV 2024!
Osmosis: RGBD Diffusion Prior for Underwater Image Restoration
Opher Bar Nathan | Deborah Levy | Tali Treibitz | Dan Rosenbaum
This repository contains the official PyTorch implementation of Osmosis: RGBD Diffusion Prior for Underwater Image Restoration, ECCV 2024.
This code is based on guided-diffusion and DPS.
Underwater image restoration is a challenging task because of water effects that increase dramatically with distance. This is worsened by lack of ground truth data of clean scenes without water. Diffusion priors have emerged as strong image restoration priors. However, they are often trained with a dataset of the desired restored output, which is not available in our case. We also observe that using only color data is insufficient, and therefore augment the prior with a depth channel. We train an unconditional diffusion model prior on the joint space of color and depth, using standard RGBD datasets of natural outdoor scenes in air. Using this prior together with a novel guidance method based on the underwater image formation model, we generate posterior samples of clean images, removing the water effects. Even though our prior did not see any underwater images during training, our method outperforms state-of-the-art baselines for image restoration on very challenging scenes.
In the course of this research, an unconditional Denoising Diffusion Probabilistic Model (DDPM) was trained on RGBD (color image and depth map) data. The training follows improved-diffusion and guided-diffusion. To adapt the model to RGBD data (instead of RGB), we modified the UNet input layer to accept 4 channels and the output layer to produce 8 channels.
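The channel counts described above can be sketched in a few lines. This is an illustrative sketch, not the repository's code: the helper names `unet_channels` and `split_output` are hypothetical, and it assumes the learned-variance convention of improved-diffusion, where the network emits a second set of values for each data channel.

```python
from typing import List, Tuple

RGBD_CHANNELS = ["R", "G", "B", "D"]  # color image plus depth map


def unet_channels(learn_sigma: bool = True) -> Tuple[int, int]:
    """Return (input_channels, output_channels) for an RGBD UNet (hypothetical helper)."""
    n_in = len(RGBD_CHANNELS)
    # With a learned variance the network emits two values per data channel.
    n_out = 2 * n_in if learn_sigma else n_in
    return n_in, n_out


def split_output(values: List[float]) -> Tuple[List[float], List[float]]:
    """Split one pixel's output values into (prediction, variance) halves."""
    half = len(values) // 2
    return values[:half], values[half:]


print(unet_channels(learn_sigma=True))   # (4, 8)
print(unet_channels(learn_sigma=False))  # (4, 4)
```

These numbers correspond to the input_channels and output_channels fields of the configuration file.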
The new prior is trained using 4 outdoor RGBD datasets: DIODE (only outdoor scenes), HRWSI, KITTI and ReDWeb-S.
The trained RGBD prior, named "osmosis_outdoor.pt", can be downloaded from the provided link.
The method is specifically designed for underwater scenes.
Consequently, real underwater images are supplied, along with simulated data used for quantitative analysis.
The algorithm also extends to other tasks such as dehazing, so a set of hazy images is included.
Underwater images - real data - link
This directory contains the real world underwater images showcased in both the paper and the appendix.
This folder contains two similar datasets.
Both datasets contain identical images, with the Low-Resolution Set serving as a cropped and resized version of the High-Resolution Set images.
Our method accepts input images of any resolution, but it standardizes the resolution by resizing them to 256 pixels on the smaller side and subsequently center cropping them.
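The resize and crop geometry can be worked out without any image library. The sketch below is a minimal illustration and not the repository's preprocessing code: the function name is hypothetical, and it assumes the center crop is a 256x256 square, which this README does not state explicitly.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]


def resize_and_center_crop_box(width: int, height: int,
                               size: int = 256) -> Tuple[Tuple[int, int], Box]:
    """Return the resized (width, height) and the (left, top, right, bottom) crop box."""
    scale = size / min(width, height)
    # The smaller side becomes exactly `size`; the other side keeps the aspect ratio.
    new_w = max(size, round(width * scale))
    new_h = max(size, round(height * scale))
    # Center the square crop inside the resized image.
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return (new_w, new_h), (left, top, left + size, top + size)


print(resize_and_center_crop_box(1024, 512))  # ((512, 256), (128, 0, 384, 256))
```

Under this assumption, a wide image loses content at its left and right edges, so the subject should sit near the center of the frame.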
The underwater images are sourced from three datasets: SQUID, SeaThru, SeaThru-Nerf and additional scenes captured by Dr. Derya Akkaynak, Matan Yuval and Deborah Levy.
The images are linear (they did not undergo any non-linear processing) and have been white balanced.
Underwater images - Simulated data with Ground Truth - Link
As part of this study, underwater scenes were simulated to facilitate quantitative comparisons.
Images are sourced from the indoor dataset NYUv2, each accompanied by its corresponding depth map. This dataset comprises a total of 1449 images.
Each simulation includes 3 folders:
- input - the simulated images
- gt_rgb - Ground Truth color images
- gt_depth - Ground Truth depth maps
Hazed images - link
We present preliminary results of applying this method to the dehazing task and therefore provide several images captured in hazy conditions.
In case you would like to try this method on your own data:
- Place all images in the same folder.
- In the configuration file, set the field data: root: <path> to the folder path.
- Specify the dataset name in the data: name: <dataset_name> field; the results will be saved into a folder with the same name.
- If there is ground truth data, indicate its path in the data: gt_rgb: <path> and data: gt_depth: <path> fields, and set the flag data: ground_truth: True (similar to the configuration in osmosis_simulation_sample_config.yaml).
- If your data is not simulated or does not consist of linear images, setting the flag degamma_input: True often produces improved results.
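To give an intuition for what degamma_input does, here is a rough sketch of linearizing gamma-encoded pixel values. It assumes a plain power-law curve with gamma 2.2; the conversion actually used in this repository may differ, and the function name `degamma` is chosen only for illustration.

```python
from typing import List, Sequence


def degamma(pixel: Sequence[float], gamma: float = 2.2) -> List[float]:
    """Approximately linearize gamma-encoded values in [0, 1] (assumed power law)."""
    # Clamp first so out-of-range values cannot produce invalid results.
    return [min(max(v, 0.0), 1.0) ** gamma for v in pixel]


linear = degamma([0.0, 0.5, 1.0])
print(linear)  # mid-tones become darker; 0 and 1 are unchanged
```

The physical image formation model operates on linear intensities, which is one plausible reason linearizing processed images helps.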
See the environment file: link
git clone https://github.com/osmosis-diffusion/osmosis-diffusion-code
cd osmosis-diffusion-code
If such a directory does not exist, create a new directory named ./models.
From the link, download the checkpoint "osmosis_outdoor.pt" into the ./models/ directory.
If such a directory does not exist, create a new directory named ./data.
Download the relevant dataset into the ./data/ directory.
For this section there are two options:
- Setting of local environment
- Build Docker image
Install dependencies
conda create -n osmosis python=3.8
conda activate osmosis
See the dependencies in the environment.yml file - link
Before executing the following commands, ensure that the Docker engine, GPU driver, and appropriate CUDA are installed.
If using the Docker image, ensure that the data paths, model path, and results path are in the working directory.
Navigate to the osmosis-diffusion-code directory (where the project was cloned), and run the following commands in the command line:
Build a Docker image
docker build -t osmosis_docker .
Run docker image on Windows:
docker run -v %cd%:/home/osmosis-diffusion-code --gpus all -it --rm osmosis_docker
Run docker image on Linux:
docker run -v $(pwd):/home/osmosis-diffusion-code --gpus all -it --rm osmosis_docker
The configuration file structure is thoroughly outlined in this section, enabling users to modify configurations and fine-tune parameters for experimental purposes.
By default, results are saved in the directory ./results/<operator name>/<dataset name>/<date>/<run#>.
Additionally, both a log file and configuration file are stored in the same path.
To execute inference from the command line, navigate to the running directory and specify the Python file to run along with two arguments:
- The first argument is the required configuration file (-c <path to config file>).
- The second argument is the device ID (GPU) to run the inference on; the default is 0 (-d <device id>).
python <script_name>.py -c <path to config file> -d <device id>
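The two arguments can be parsed as in the following sketch. It is not copied from the repository's scripts: the long option names --config and --device are hypothetical, since this README only documents -c and -d.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the configuration path and GPU device ID from the command line."""
    parser = argparse.ArgumentParser(description="Run sampling with a YAML config.")
    # Long option names are assumptions; only -c and -d appear in the README.
    parser.add_argument("-c", "--config", required=True,
                        help="path to the configuration file")
    parser.add_argument("-d", "--device", type=int, default=0,
                        help="GPU device ID (default: 0)")
    return parser.parse_args(argv)


args = parse_args(["-c", "./configs/osmosis_sample_config.yaml"])
print(args.config, args.device)  # ./configs/osmosis_sample_config.yaml 0
```

Omitting -d falls back to device 0, matching the default described above.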
There are several examples below.
There are 4 possible configurations:
Relevant for real underwater images.
python osmosis_sampling.py -c ./configs/osmosis_sample_config.yaml
On the left is an underwater image, serving as the input to our method. In the middle is the restored RGB image, and on the right is the depth estimation, where blue represents close distances and yellow farther distances.
Applicable to simulated underwater images, where both the Ground Truth RGB image and depth map are provided.
python osmosis_sampling.py -c ./configs/osmosis_simulation_sample_config.yaml
The first row is the same as above. On the left is a simulated underwater image, serving as the input to our method. In the middle is the restored RGB image, and on the right is the depth estimation, where blue represents close distances and yellow farther distances. In the second row, there is the ground truth RGB image and the depth map.
Relevant for images captured in hazy environments.
python osmosis_sampling.py -c ./configs/osmosis_haze_sample_config.yaml
On the left is a hazy image, serving as the input to our method. In the middle is the restored RGB image, and on the right is the depth estimation, where blue represents close distances and yellow farther distances.
In this scenario, there is no guidance provided for the sampling process, resulting in the production of an RGB image and its corresponding depth map.
The absence of guidance implies no constraints on achieving a visually coherent image.
python RGBD_prior_sampling.py -c ./configs/RGBD_sample_config.yaml
Each pair of images (RGB image and depth map) is generated from the prior without any guidance on the sampling process. Here, black indicates close distances, and white represents farther distances.
In this section the structure and the relevant fields in the configuration file are explained.
save_dir: ./results # saving directory path
degamma_input: False # set to True for images that are NOT linear or NOT simulated, otherwise False
manual_seed: 0 # manual seed for the diffusion sampling process
rgb_guidance: False # relevant only for the RGB guidance and producing depth map for the input image
save_singles: True # save single results images - 1)reference image (input), 2)restored RGB image and 3)depth estimation image
save_grids: True # save grid of the results, next to each other
record_process: True # record the sampling process
record_every: 200 # in case "record_process: True" - record every <value> steps (in this case - 200)
# change unet input and output - for RGBD
change_input_output_channels: True
input_channels: 4 # RGBD
output_channels: 8 # RGBD * 2 - learning sigma = True, if False 4
sample_pattern: # the diffusion sampling pattern for the
pattern: pcgs # original, pcgs - from gibbsDDRM
# relevant only for "pattern: pcgs"
# update phi's
update_start: 0.7 # optimizing phi's (<value>*T)
update_end: 0
n_iter: 20 # for each t step, the number of optimization steps for the phi's
unet_model: # unet model configurations
model_path: ./models/osmosis_outdoor.pt # pretrained model path
pretrain_model: osmosis # pretrained model name
conditioning:
method: osmosis # conditioning method - osmosis, ps
params:
loss_weight: depth # none, depth # if "none" so the rest has no meaning
weight_function: gamma,1.4,1.4,1 # function,original- [0,1], gamma=((x+value[0])*value[1])^value[2]
scale: 7,7,7,0.9 # guidance scale for each channel (RGBD)
gradient_clip: True,0.005 # gradient clipping value (if True)
# specify the loss and its weight/scale; if not specified, no auxiliary loss is used
# see the paper for details on the losses
aux_loss:
aux_loss:
avrg_loss: 0.5 # scale of average loss
val_loss: 20 # scale of value loss
data:
name: osmosis # dataset name
root: .\data\underwater\high_res # path of the dataset
ground_truth: False # if the dataset includes ground truth - True, else - False
gt_rgb: .\data\simulation_1\gt_rgb # dataset ground truth paths - comment when no GT data
gt_depth: .\data\simulation_1\gt_depth # dataset ground truth paths - comment when no GT data
measurement:
operator:
name: underwater_physical_revised # underwater_physical_revised, haze_physical, noise (for check prior)
optimizer: sgd # water parameters optimizer - options are adam, sgd
depth_type: gamma # original- [0,1], gamma=((x+value[0])*value[1])^value[2]
value: 1.4,1.4,1
phi_a: 1.1,0.95,0.95 # initialized values
phi_a_eta: 1e-5 # step size for the optimization
phi_a_learn_flag: True # optimization flag - if False, there is no optimization for this parameter
phi_b: 0.95, 0.8, 0.8 # initialized values
phi_b_eta: 1e-5 # step size for the optimization
phi_b_learn_flag: True # optimization flag - if False, there is no optimization for this parameter
phi_inf: 0.14, 0.29, 0.49 # initialized values
phi_inf_eta: 1e-5 # step size for the optimization
phi_inf_learn_flag: True # optimization flag - if False, there is no optimization for this parameter
noise: # added noise
name: clean # clean - osmosis, gaussian - ps
# sigma: 0 # comment in case of "clean" uncomment in case of "gaussian"
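To illustrate how the water parameters in the measurement section might enter the guidance, the sketch below applies a two-term underwater image formation model (attenuated direct signal plus backscatter) to a single pixel. Treating phi_a as the attenuation coefficient, phi_b as the backscatter coefficient and phi_inf as the veiling light at infinite distance is an assumption based on the parameter names; see the paper for the exact model. The depth transform follows the formula written in the config comments, and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def gamma_depth(x: float, value: Sequence[float] = (1.4, 1.4, 1.0)) -> float:
    """Depth transform from the config comment: ((x + value[0]) * value[1]) ** value[2]."""
    return ((x + value[0]) * value[1]) ** value[2]


def underwater_pixel(clean_rgb: Sequence[float], depth: float,
                     phi_a: Sequence[float], phi_b: Sequence[float],
                     phi_inf: Sequence[float]) -> List[float]:
    """Apply per-channel attenuation and backscatter to one clean RGB pixel."""
    out = []
    for j, a, b, inf in zip(clean_rgb, phi_a, phi_b, phi_inf):
        direct = j * math.exp(-a * depth)                 # attenuated scene radiance
        backscatter = inf * (1.0 - math.exp(-b * depth))  # light scattered by the water
        out.append(direct + backscatter)
    return out


# Initial values taken from the configuration above.
phi_a, phi_b, phi_inf = (1.1, 0.95, 0.95), (0.95, 0.8, 0.8), (0.14, 0.29, 0.49)
print(underwater_pixel([0.6, 0.6, 0.6], gamma_depth(0.0), phi_a, phi_b, phi_inf))
```

Under this model the observed color equals the clean color at zero distance and converges to phi_inf as distance grows, which is why water effects increase dramatically with distance.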
The results are saved in the directory specified by the save_dir: <path> parameter in the configuration file.
Subdirectories are created within the specified <path> based on measurement: operator: name: <operator_name>, data: name: <data_name>, the current date and the run number, to prevent overwriting existing files.
Individual images are stored in the single_images directory, while grid results (consisting of the input image, restored RGB, and predicted depth displayed side by side) and the recorded process are saved under grid_results.
The path for single images will be: <path>/<operator_name>/<data_name>/<today_date>/<run_number>/single_images/.
For example: <path>/underwater_physical_revised/simulation/21-6-24/run2/single_images/.
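The directory layout described above could be produced by logic along the following lines. This is a sketch under assumptions, not the repository's implementation: the helper name `next_run_dir` is hypothetical, the date format is inferred from the example "21-6-24", and run numbering is assumed to start at 1.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def next_run_dir(save_dir: str, operator_name: str, data_name: str,
                 today: Optional[date] = None) -> Path:
    """Create <save_dir>/<operator>/<data>/<date>/run<N> using the first unused N."""
    today = today or date.today()
    # Day-month-two-digit-year, guessed from the README example "21-6-24".
    stamp = f"{today.day}-{today.month}-{today.year % 100:02d}"
    day_dir = Path(save_dir) / operator_name / data_name / stamp
    run = 1
    while (day_dir / f"run{run}").exists():
        run += 1  # never overwrite an earlier run
    run_dir = day_dir / f"run{run}"
    (run_dir / "single_images").mkdir(parents=True)
    (run_dir / "grid_results").mkdir()
    return run_dir
```

Calling the helper twice on the same day with the same operator and dataset yields run1 and then run2, so earlier results are left untouched.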
To see what the sampling process looks like, set the following fields to True and specify a value in the record_every: <value> field.
save_grids: True
record_process: True
record_every: 150
An example (the process runs from left to right and is sampled every 150 time steps):
The first row shows the predicted image at each recorded step, and the second row shows the depth map at that step, where blue represents close distances and yellow farther distances.
If you find this project useful, please consider citing:
@article{nathan2024osmosis,
title={Osmosis: RGBD Diffusion Prior for Underwater Image Restoration},
author={Bar Nathan, Opher and Levy, Deborah and Treibitz, Tali and Rosenbaum, Dan},
journal={arXiv preprint arXiv:2403.14837},
year={2024}
}