
DSMIL: Dual-stream multiple instance learning networks for tumor detection in Whole Slide Image

This is the PyTorch implementation of the multiple instance learning model described in the paper Dual-stream Multiple Instance Learning Network for Whole Slide Image Classification with Self-supervised Contrastive Learning.

Installation

Install Anaconda/Miniconda.
Install the required packages:

  $ conda env create --name dsmil --file env.yml
  $ conda activate dsmil

Features preparation

The MIL benchmark datasets can be downloaded via:

  $ python download.py --dataset=mil

If you are processing WSI data, you will need a pretrained embedder to precompute the features of each patch.

  • Your WSIs must be cropped into patches first. OpenSlide is a C library with a Python API that provides a simple interface for reading WSI data. We refer users to the OpenSlide Python API documentation for details on using this tool; a minimal tiling sketch is given after this list.
  • To train your embedder, we refer users to the PyTorch implementation of SimCLR. Feed your WSI patches to the SimCLR framework, with the "input_shape" argument in the configuration file (config.yml) set to the size of your WSI patches.
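
As a rough starting point, the sketch below tiles a WSI into non-overlapping patches with OpenSlide. It is not part of this repository; the patch size, magnification level, output format, and background filter are assumptions you should adapt to your data.

  import os
  import openslide

  def crop_wsi(slide_path, out_dir, patch_size=224, level=0):
      # Tile one WSI into non-overlapping RGB patches saved as JPEG files.
      slide = openslide.OpenSlide(slide_path)
      os.makedirs(out_dir, exist_ok=True)
      width, height = slide.level_dimensions[level]
      scale = int(slide.level_downsamples[level])
      for y in range(0, height - patch_size + 1, patch_size):
          for x in range(0, width - patch_size + 1, patch_size):
              # read_region takes level-0 coordinates for the top-left corner
              patch = slide.read_region((x * scale, y * scale), level,
                                        (patch_size, patch_size)).convert('RGB')
              # Crude background filter: skip patches where even the darkest
              # pixel is close to white.
              if patch.convert('L').getextrema()[0] > 220:
                  continue
              patch.save(os.path.join(out_dir, '{}_{}.jpeg'.format(x, y)))
      slide.close()

  crop_wsi('slide_001.svs', 'patches/slide_001')  # hypothetical paths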

Otherwise, precomputed features for the TCGA Lung Cancer dataset can be downloaded via:

  $ python download.py --dataset=tcga

This dataset requires 20GB of free disk space.

Training on default datasets

To train DSMIL on a standard MIL benchmark dataset:

  $ python train_mil.py

To switch between MIL benchmark datasets, use the option:

[--datasets]      # musk1, musk2, elephant, fox, tiger

Other options are available for the learning rate (default 0.0002), the number of cross-validation folds (5), the weight decay (5e-3), and the number of epochs (40).
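
For example, to train on the Musk1 dataset (only the --datasets option is shown here; check the script's argument parser for the exact names of the other options):

  $ python train_mil.py --datasets=musk1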

To train DSMIL on the TCGA Lung Cancer dataset:

  $ python train_tcga.py

Training on your own datasets

You can modify train_tcga.py to work with your own datasets; a minimal sketch of the expected file layout follows the list below. You will need to:

  1. For each bag, generate a .csv file in which each row contains the feature vector of one instance. Name the file "bagID.csv" and put it into a folder named "dataset-name".
  2. Generate a "dataset-name.csv" file with two columns, where the first column contains the bag IDs and the second column contains the class labels.
  3. Replace the corresponding file path in the script with the path of the "dataset-name.csv" file, and change the data directory path in the dataloader to the path of the folder "dataset-name".
  4. Configure the number of classes when creating the DSMIL model.
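
The sketch below illustrates this layout with randomly generated features. The dataset name, bag IDs, and feature dimensions are hypothetical, and you should check how the script parses the CSV files (e.g. whether headers are expected) before adopting it.

  import os
  import numpy as np
  import pandas as pd

  dataset_name = 'my-dataset'  # hypothetical dataset name
  os.makedirs(dataset_name, exist_ok=True)

  # bags: bag ID -> (instance features [num_instances x feature_dim], class label)
  bags = {'bag_0001': (np.random.randn(50, 512), 0),
          'bag_0002': (np.random.randn(35, 512), 1)}

  index_rows = []
  for bag_id, (features, label) in bags.items():
      # Each row of "bagID.csv" holds the feature vector of one instance.
      pd.DataFrame(features).to_csv(os.path.join(dataset_name, bag_id + '.csv'),
                                    index=False, header=False)
      index_rows.append([bag_id, label])

  # "dataset-name.csv": first column is the bag ID, second column is the label.
  pd.DataFrame(index_rows).to_csv(dataset_name + '.csv', index=False, header=False)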

Citation

If you use the code or results in your research, please use the following BibTeX entry.

@article{li2020dualstream,
  author =   {Bin Li and Yin Li and Kevin W. Eliceiri},
  title =    {Dual-stream Multiple Instance Learning Network for Whole Slide Image Classification with Self-supervised Contrastive Learning},
  journal =  {arXiv preprint arXiv:2011.08939},
  year =     {2020}
}