deep-imcyto is a bioinformatics analysis pipeline for segmentation and other principal tasks in imaging mass cytometry (IMC) data analysis. It is an update and extension of nf-core/imcyto, a bioinformatics analysis pipeline developed by van Maldegem et al. for IMC image segmentation and extraction of single cell expression data. deep-imcyto provides highly accurate cell segmentation of IMC images based on a U-net++ deep learning model, as well as facilities for QC and manual review of image processing steps, which can be invaluable during IMC experimental design.
deep-imcyto is implemented in Nextflow, a workflow tool for running tasks portably across multiple compute infrastructures. It comes with Docker containers, making installation trivial and results highly reproducible.
deep-imcyto is a constituent component of the TRACERx-PHLEX pipeline for highly multiplexed imaging. Other components include TYPEx, for detailed cell phenotyping, and Spatial-PHLEX, for single cell spatial data analysis.
deep-imcyto has three modes of operation: QC, Simple segmentation and Multiplexed Consensus Cell Segmentation, summarised in the diagram below.
deep-imcyto's QC mode is designed to provide quick access to individual channels in IMC data for quality control and/or review by splitting .mcd files into constituent channel images by imaged ROI. If a particular preprocessing option is selected (e.g. spillover correction, hotpixel removal, or the application of a custom set of preprocessing steps specified as a CellProfiler .cppipe file), then this preprocessing is performed and its result is produced as an output of the QC run for manual review.
- Simple: In simple segmentation mode, an approximation of whole cell segmentation is performed in which accurately predicted nuclei are dilated by a user-defined number of pixels.
- Multiplexed consensus cell segmentation (MCCS): In MCCS mode, a more accurate whole cell segmentation is performed following the multiplexed consensus cell segmentation principles, using nuclear predictions and progressive masking of specific marker channels (see [LINK TO PAPER] and [LINK TO READTHEDOCS]). MCCS procedures are provided to deep-imcyto as a CellProfiler pipeline, which Nextflow then executes with as much parallelism as possible.
- Clone the deep-imcyto repository.
- Download both the deep-imcyto trained nucleus model weights and the example test dataset from our Zenodo repository (https://doi.org/10.5281/zenodo.7573269).
- Unzip these .zip archives to an appropriate location (total space required ~1GB).
- Ensure your HPC system has Nextflow/22.04.0 and Singularity/3.6.4 installed.
- Set your profile (see below).
- Edit the following script appropriately and run it from a compute node. This will run deep-imcyto in simple segmentation mode.
#!/bin/bash
## LOAD MODULES
ml purge
ml Nextflow/22.04.0
ml Singularity/3.6.4
# Define a folder on your system for the deep-imcyto software containers to be stored (space required ~10GB):
export NXF_SINGULARITY_CACHEDIR='/path/to/containers/deep-imcyto'
# RUN DEEP-IMCYTO:
nextflow run ./main.nf \
    --input "/path/to/test/dataset/*/*/*.tiff" \
    --outdir '../results/simple' \
    --metadata 'assets/metadata/PHLEX_simple_segmentation_metadata_p1.csv' \
    --email your_email@your_institute.ac.uk \
    --nuclear_weights_directory "/path/to/weights/directory" \
    --segmentation_workflow 'simple' \
    --nuclear_dilation_radius 5 \
    --preprocess_method 'hotpixel' \
    --n_neighbours 5 \
    --singularity_bind_path '/camp' \
    -w '/path/to/work/directory/' \
    -profile <docker/singularity/institute>
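Before submitting the job, it can help to confirm that the placeholder paths in the script have been replaced with real ones and that the required tools are on the `PATH`. The following is a minimal sketch and not part of deep-imcyto itself: the helper names `count_matches`, `require_dir` and `require_cmd` are hypothetical, and the paths in the usage comments are the same placeholders used in the script above.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers; these are illustrative and not part of deep-imcyto.

# Print how many existing files match a glob pattern such as the one passed to --input.
# Assumes the pattern contains no spaces, because it is expanded unquoted.
count_matches() {
    local pattern="$1"
    local n=0
    local f
    # Unquoted on purpose so that the shell expands the glob.
    for f in $pattern; do
        # An unmatched glob stays literal and fails this existence test.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Return 0 if the directory exists; otherwise report it on stderr and return 1.
require_dir() {
    if [ -d "$1" ]; then
        return 0
    fi
    echo "Missing directory: $1" >&2
    return 1
}

# Return 0 if the command is available on PATH; otherwise report it and return 1.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Command not found: $1" >&2
    return 1
}

# Example usage (placeholders, to be edited like the run script):
#   require_cmd nextflow && require_cmd singularity
#   require_dir "/path/to/weights/directory"
#   [ "$(count_matches "/path/to/test/dataset/*/*/*.tiff")" -gt 0 ] || echo "No input images found" >&2
```

Running these checks on the compute node, after the `ml` lines, catches the most common setup mistakes before Nextflow starts pulling containers.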
To run deep-imcyto in MCCS mode, run the following:
#!/bin/bash
## LOAD MODULES
ml purge
ml Nextflow/22.04.0
ml Singularity/3.6.4
# Define a folder on your system for the deep-imcyto software containers to be stored (space required ~10GB):
export NXF_SINGULARITY_CACHEDIR='/path/to/containers/deep-imcyto'
# RUN DEEP-IMCYTO:
nextflow run ./main.nf \
    --input "/path/to/test/dataset/*/*/*.tiff" \
    --outdir '../results/MCCS' \
    --metadata 'assets/metadata/PHLEX_simple_segmentation_metadata_p1.csv' \
    --email your_email@your_institute.ac.uk \
    --nuclear_weights_directory "/path/to/weights/directory" \
    --segmentation_workflow 'MCCS' \
    --full_stack_cppipe './assets/cppipes/full_stack_preprocessing.cppipe' \
    --segmentation_cppipe './assets/cppipes/segmentationP1.cppipe' \
    --mccs_stack_cppipe './assets/cppipes/mccs_stack_preprocessing.cppipe' \
    --compensation_tiff './assets/spillover/P1_imc*.tiff' \
    --plugins "./assets/plugins" \
    --singularity_bind_path '/camp' \
    -w '/path/to/work/directory/' \
    -profile <docker/singularity/institute>
The variable singularity_bind_path tells deep-imcyto how to bind paths inside and outside the deep-imcyto Docker/Singularity container. If it is not explicitly set, deep-imcyto attempts to use the root of the absolute path to the deep-imcyto repository base directory [i.e. /path in /path/to/deep-imcyto].
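To make the default concrete, the sketch below derives the first component of an absolute path in the way the paragraph describes. It is an illustration only: `bind_root` is a hypothetical helper, and how deep-imcyto actually computes the default internally is not shown here.

```shell
#!/bin/bash
# Hypothetical helper illustrating the described default; not deep-imcyto's own code.

# Print the first component of an absolute path, e.g. /path for /path/to/deep-imcyto.
bind_root() {
    local abs="$1"
    # Drop the leading slash, then keep everything before the next slash.
    local rest="${abs#/}"
    echo "/${rest%%/*}"
}

bind_root "/path/to/deep-imcyto"   # prints /path
```

If the input data or the work directory live under a different top-level directory from the repository, this default would not cover them, which is one reason to pass `--singularity_bind_path` explicitly as in the scripts above.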
See usage docs for all of the available options when running the pipeline.
deep-imcyto runs inside a customised Docker container built on top of the rapids-22.02-cuda11.0-base-ubuntu18.04-py3.8 Docker container for reproducible GPU-accelerated data science. Important prerequisites for RAPIDS are as follows:
- NVIDIA Pascal™ GPU architecture or better
- CUDA 11.2/11.4/11.5 with a compatible NVIDIA driver
- nvidia-container-toolkit
See RAPIDS for more information.
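A quick way to see what a compute node offers is to query the NVIDIA driver tool. The sketch below assumes `nvidia-smi` is installed alongside the driver; `check_gpu` is a hypothetical helper name. It only lists each GPU's name and driver version, so comparing those against the prerequisites above is left to the reader.

```shell
#!/bin/bash
# Hypothetical helper; it reports what is present but does not validate the prerequisites.

# Print "name, driver_version" for each visible GPU, or a notice if nvidia-smi is absent.
check_gpu() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: no NVIDIA driver tools on this node"
        return 1
    fi
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

# Example usage on a compute node:
#   check_gpu
```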
The nf-core/imcyto pipeline comes with documentation about the pipeline, found in the docs/ directory:
- Installation
- Pipeline configuration
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
deep-imcyto is primarily developed by Alastair Magness at The Francis Crick Institute. Other core contributors include Emma Colliver, Mihaela Angelova, and Katey Enfield.
nf-core/imcyto was originally written by The Bioinformatics & Biostatistics Group for use at The Francis Crick Institute, London. It was developed by Harshil Patel and Nourdine Bah in collaboration with Karishma Valand and Febe van Maldegem, among others.
It would not have been possible to develop this pipeline without the guidelines, scripts and plugins provided by the Bodenmiller Lab. Thank you too!
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.