Commit 4387068: "Add files via upload" (initial commit)
Authored by JuliaWolleb on Dec 28, 2021 · 1 parent eca2941
Showing 54 changed files with 4,592 additions and 2 deletions.

README.md (78 additions, 2 deletions)
# Diffusion Models for Implicit Image Segmentation Ensembles

We provide the official PyTorch implementation of the paper [Diffusion Models for Implicit Image Segmentation Ensembles](https://arxiv.org/abs/2112.03145) by Julia Wolleb, Robin Sandkühler, Florentin Bieder, Philippe Valmaggia, and Philippe C. Cattin.

The implementation of Denoising Diffusion Probabilistic Models presented in the paper is based on [openai/improved-diffusion](https://github.com/openai/improved-diffusion).

## Paper Abstract

Diffusion models have shown impressive performance for generative modelling of images. In this paper, we present a novel semantic segmentation method based on diffusion models. By modifying the training and sampling scheme, we show that diffusion models can perform lesion segmentation of medical images. To generate an image-specific segmentation, we train the model on the ground-truth segmentation and use the image as a prior during training and in every step of the sampling process. Given the stochastic sampling process, we can generate a distribution of segmentation masks. This property allows us to compute pixel-wise uncertainty maps of the segmentation and to form an implicit ensemble of segmentations that increases segmentation performance. We evaluate our method on the BRATS2020 dataset for brain tumor segmentation. Compared to state-of-the-art segmentation models, our approach yields good segmentation results and, additionally, detailed uncertainty maps.
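The ensembling idea in the abstract can be sketched independently of the model: given several sampled masks for the same image, the pixel-wise mean acts as a soft prediction and the pixel-wise spread as an uncertainty map. A minimal sketch with synthetic masks (in the paper, the masks come from repeated DDPM sampling runs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for n_ensemble sampled binary segmentation masks of one image.
n_ensemble, h, w = 5, 224, 224
masks = rng.random((n_ensemble, h, w)) > 0.5

mean_mask = masks.mean(axis=0)    # soft prediction in [0, 1]
uncertainty = masks.std(axis=0)   # pixel-wise disagreement of the ensemble
final_seg = mean_mask >= 0.5      # majority vote over the ensemble
```

The variance (or standard deviation) map highlights exactly those pixels on which the sampled masks disagree, which is what the paper visualizes as the uncertainty map.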


## Data

We evaluated our method on the [BRATS2020 dataset](https://www.med.upenn.edu/cbica/brats2020/data.html).
For our dataloader, which can be found in the file *guided_diffusion/bratsloader.py*, the 2D slices need to be stored in the following structure:

```
data
└───training
│ └───slice0001
│ │ t1.nii.gz
│ │ t2.nii.gz
│ │ flair.nii.gz
│ │ t1ce.nii.gz
│ │ seg.nii.gz
│ └───slice0002
│ │ ...
└───testing
│ └───slice1000
│ │ t1.nii.gz
│ │ t2.nii.gz
│ │ flair.nii.gz
│ │ t1ce.nii.gz
│ └───slice1001
│ │ ...
```

A mini-example can be found in the folder *data*.
If you want to apply our code to another dataset, make sure the loaded image has the ground-truth segmentation attached as its last channel.
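For a custom dataset, the contract is simply that each loaded sample is a tensor whose last channel is the ground-truth mask. A minimal sketch (the dataset class and in-memory layout here are hypothetical, not part of the repository):

```python
import torch
from torch.utils.data import Dataset


class MyDataset(Dataset):
    """Hypothetical loader: image channels first, segmentation as last channel."""

    def __init__(self, images, masks):
        self.images = images  # e.g. list of (C, H, W) tensors
        self.masks = masks    # matching list of (H, W) binary masks

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = self.images[idx]
        seg = self.masks[idx][None]          # add channel dim -> (1, H, W)
        return torch.cat([img, seg], dim=0)  # segmentation is the last channel


# toy check with one 4-channel image
ds = MyDataset([torch.zeros(4, 224, 224)], [torch.ones(224, 224)])
sample = ds[0]
```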


## Usage

We set the flags as follows:
```
MODEL_FLAGS="--image_size 256 --num_channels 128 --class_cond False --num_res_blocks 2 --num_heads 1 --learn_sigma True --use_scale_shift_norm False --attention_resolutions 16"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear --rescale_learned_sigmas False --rescale_timesteps False"
TRAIN_FLAGS="--lr 1e-4 --batch_size 10"
```

To train the segmentation model, run

```
python3 scripts/segmentation_train.py --data_dir path/to/BRATS/training $TRAIN_FLAGS $MODEL_FLAGS $DIFFUSION_FLAGS
```
The trained model will be saved in the *results* folder.
To sample an ensemble of 5 segmentation masks with the DDPM approach, run:

```
python scripts/segmentation_sample.py --data_dir path/to/BRATS/testing --model_path ./results/savedmodel.pt --num_ensemble=5 $MODEL_FLAGS $DIFFUSION_FLAGS
```

The generated segmentation masks will be stored in the *results* folder.
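To evaluate the sampled masks against a ground-truth segmentation, a Dice score is the usual metric for BRATS. A generic sketch (this helper is not part of the repository):

```python
import numpy as np


def dice_score(pred, target, eps=1e-8):
    """Dice coefficient between two binary masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)


# two overlapping 4x4 squares: intersection 3x3 = 9, Dice = 18/32 = 0.5625
a = np.zeros((8, 8)); a[2:6, 2:6] = 1
b = np.zeros((8, 8)); b[3:7, 3:7] = 1
print(dice_score(a, b))  # 0.5625 (up to the eps term)
```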

## Citation

If you use this code, please cite:

```
@misc{wolleb2021diffusion,
      title={Diffusion Models for Implicit Image Segmentation Ensembles},
      author={Julia Wolleb and Robin Sandkühler and Florentin Bieder and Philippe Valmaggia and Philippe C. Cattin},
      year={2021},
      eprint={2112.03145},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
guided_diffusion/bratsloader.py (68 additions)

```python
import os

import nibabel
import torch


class BRATSDataset(torch.utils.data.Dataset):
    def __init__(self, directory, test_flag=True):
        '''
        The directory is expected to contain a folder structure in which
        every leaf folder (one per 2D slice) holds files named like
            brats_train_001_XXX_123_w.nii.gz
        where XXX is one of t1, t1ce, t2, flair, seg.
        The five files of a folder belong to the same image;
        seg contains the ground-truth segmentation.
        '''
        super().__init__()
        self.directory = os.path.expanduser(directory)
        self.test_flag = test_flag
        if test_flag:
            self.seqtypes = ['t1', 't1ce', 't2', 'flair']
        else:
            self.seqtypes = ['t1', 't1ce', 't2', 'flair', 'seg']
        self.seqtypes_set = set(self.seqtypes)
        self.database = []
        for root, dirs, files in os.walk(self.directory):
            # if there are no subdirs, we have reached a leaf folder with data
            if not dirs:
                files.sort()
                datapoint = dict()
                # extract all files as channels
                for f in files:
                    seqtype = f.split('_')[3]
                    datapoint[seqtype] = os.path.join(root, f)
                assert set(datapoint.keys()) == self.seqtypes_set, \
                    f'datapoint is incomplete, keys are {datapoint.keys()}'
                self.database.append(datapoint)

    def __getitem__(self, x):
        out = []
        filedict = self.database[x]
        for seqtype in self.seqtypes:
            path = filedict[seqtype]
            nib_img = nibabel.load(path)
            out.append(torch.tensor(nib_img.get_fdata()))
        out = torch.stack(out)
        if self.test_flag:
            image = out[..., 8:-8, 8:-8]  # crop to a size of (224, 224)
            return (image, path)
        else:
            image = out[:-1, ...]
            label = out[-1, ...][None, ...]
            image = image[..., 8:-8, 8:-8]  # crop to a size of (224, 224)
            label = label[..., 8:-8, 8:-8]
            label = torch.where(label > 0, 1, 0).float()  # merge all tumor classes into one
            return (image, label)

    def __len__(self):
        return len(self.database)
```

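The crop-and-binarize logic at the heart of `__getitem__` can be checked on a synthetic tensor, without any NIfTI files on disk (sizes follow the loader: 240x240 BRATS slices cropped to 224x224):

```python
import torch

# Synthetic stand-in for the stacked channels: 4 modalities + 1 segmentation.
out = torch.randn(5, 240, 240)
out[-1] = torch.randint(0, 4, (240, 240)).float()  # fake multi-class segmentation

image = out[:-1, ...][..., 8:-8, 8:-8]              # (4, 224, 224)
label = out[-1, ...][None, ...][..., 8:-8, 8:-8]    # (1, 224, 224)
label = torch.where(label > 0, 1, 0).float()        # merge all tumor classes into one
```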
guided_diffusion/dist_util.py (87 additions)

```python
"""
Helpers for distributed training.
"""

import io
import os
import socket

import blobfile as bf
# from mpi4py import MPI  # MPI support is disabled in this version
import torch as th
import torch.distributed as dist

# Change this to reflect your cluster layout.
# The GPU for a given rank is (rank % GPUS_PER_NODE).
GPUS_PER_NODE = 8

SETUP_RETRY_COUNT = 3


def setup_dist():
    """
    Set up a distributed process group. With MPI disabled, the rank and
    world size are fixed to a single process on GPU 0.
    """
    if dist.is_initialized():
        return
    os.environ["CUDA_VISIBLE_DEVICES"] = '0'

    backend = "gloo" if not th.cuda.is_available() else "nccl"

    os.environ["MASTER_ADDR"] = '127.0.1.1'  # comm.bcast(hostname, root=0)
    os.environ["RANK"] = '0'                 # str(comm.rank)
    os.environ["WORLD_SIZE"] = '1'           # str(comm.size)
    os.environ["MASTER_PORT"] = str(_find_free_port())
    dist.init_process_group(backend=backend, init_method="env://")


def dev():
    """
    Get the device to use for torch.distributed.
    """
    if th.cuda.is_available():
        return th.device("cuda")
    return th.device("cpu")


def load_state_dict(path, **kwargs):
    """
    Load a PyTorch checkpoint. With MPI disabled, the single process
    (rank 0) reads the file and deserializes it directly.
    """
    with bf.BlobFile(path, "rb") as f:
        data = f.read()
    return th.load(io.BytesIO(data), **kwargs)


def sync_params(params):
    """
    Synchronize a sequence of Tensors across ranks from rank 0.
    """
    for p in params:
        with th.no_grad():
            dist.broadcast(p, 0)


def _find_free_port():
    """Ask the OS for an unused TCP port by binding to port 0."""
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(("", 0))
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        return s.getsockname()[1]
    finally:
        s.close()
```
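The port selection in `setup_dist` relies on a standard trick: binding a socket to port 0 makes the OS assign a free ephemeral port, whose number can then be handed to `MASTER_PORT`. The trick in isolation:

```python
import socket


def find_free_port():
    """Ask the OS for an unused TCP port by binding to port 0."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("", 0))  # port 0 -> OS picks a free port
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        return s.getsockname()[1]
    finally:
        s.close()


port = find_free_port()
```

Note the small race inherent to this pattern: the port is free when queried but released before the process group binds it, so another process could grab it in between. For a single-node, single-process setup this is rarely a problem.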