Step 1: Download the source images from the "Dataset" link of the following repository and unzip "png.zip" to use as the source directory.
Step 2: Download the label datasets Cx22-*.zip:
- Cx22-Pair.zip
- Cx22-Multi-Train.zip
- Cx22-Multi-Test.zip
Step 3: Unzip the label datasets Cx22-*.zip downloaded in Step 2. Each unzipped folder has the following structure:
Cx22-*:
* cyto
  - cyto_clumps.mat ~ labels of cytoplasm clumps
  - cyto_ins.mat ~ labels of cytoplasm instances
  - cyto_ins_bbox.mat ~ bounding-box labels of cytoplasm instances
* nuc
  - nuc_clumps.mat ~ labels of nucleus clumps
  - nuc_ins.mat ~ labels of nucleus instances
  - nuc_ins_bbox.mat ~ bounding-box labels of nucleus instances
* generator
  - ImageDataGenerator.m ~ image generation code
  - ImageDataNames.mat ~ supporting data for ImageDataGenerator.m
  - ROIs_W_H.mat ~ supporting data for ImageDataGenerator.m
  - ROIs_x_y.mat ~ supporting data for ImageDataGenerator.m
  - CellNum.mat ~ number of cells in each generated image
  - OverlapRatio.mat ~ overlap ratio of each cytoplasm instance in each generated image
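The label files can be inspected directly in MATLAB. Below is a minimal sketch, assuming each .mat file stores a single variable; the stored variable names are not documented here, so list them with `whos` first.

```matlab
% Minimal sketch for inspecting a Cx22 label file. The variable names
% stored inside the .mat files are assumptions -- list them first.
whos('-file', fullfile('cyto', 'cyto_ins.mat'))   % show stored variables

labels = load(fullfile('cyto', 'cyto_ins.mat'));  % load into a struct
fn = fieldnames(labels);
cytoIns = labels.(fn{1});                         % first stored variable
disp(size(cytoIns));                              % e.g. one entry per image
```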
Step 4:
First: run "ImageDataGenerator.m".
Then: select the source directory created in Step 1.
Finally: wait until "ImageDataSet.mat" is generated. "ImageDataSet.mat" is saved in the same directory as "ImageDataGenerator.m" and records the images corresponding to the labels.
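Once "ImageDataSet.mat" has been generated, the images can be paired with the labels from Step 3. Here is a minimal sketch, assuming the .mat files each store a cell array indexed by image; the stored variable names and layout are assumptions, so verify them before adapting this.

```matlab
% Sketch of pairing a generated image with its cytoplasm instance labels.
% The variable names stored in the .mat files are assumptions; inspect
% them with whos('-file', ...) before adapting this.
imgData  = load('ImageDataSet.mat');        % generated images
cytoData = load(fullfile('cyto', 'cyto_ins.mat'));  % cytoplasm instance labels

imgFields  = fieldnames(imgData);
cytoFields = fieldnames(cytoData);
images = imgData.(imgFields{1});
labels = cytoData.(cytoFields{1});

idx = 1;                                    % pick the first image
figure;
subplot(1, 2, 1); imshow(images{idx});      % assumes a cell array of images
title('Generated image');
subplot(1, 2, 2); imagesc(labels{idx});     % assumes a cell array of label maps
axis image; title('Cytoplasm instances');
```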
Usage of the evaluation code below can be found here. More details of the usage are available in the repository.
The evaluation code:
- evaluateCytoSegmentation.m
- SegEvaluateJIDiceTPRFPR.m
- CytoScriptExample.m
A description of the metrics can be found here.
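For orientation, the file name SegEvaluateJIDiceTPRFPR.m suggests the Jaccard index, Dice coefficient, true positive rate, and false positive rate. The following is a generic sketch of these standard metrics on a pair of binary masks; it is an illustration, not the repository's implementation.

```matlab
% Generic sketch of the metrics suggested by SegEvaluateJIDiceTPRFPR.m
% (Jaccard index, Dice, TPR, FPR) on a pair of binary masks.
% This is not the repository's code.
function [ji, dice, tpr, fpr] = segMetricsSketch(pred, gt)
    pred = logical(pred); gt = logical(gt);
    tp = nnz(pred & gt);                    % true positives
    fp = nnz(pred & ~gt);                   % false positives
    fn = nnz(~pred & gt);                   % false negatives
    tn = nnz(~pred & ~gt);                  % true negatives

    ji   = tp / (tp + fp + fn);             % Jaccard index (IoU)
    dice = 2 * tp / (2 * tp + fp + fn);     % Dice coefficient
    tpr  = tp / (tp + fn);                  % true positive rate (recall)
    fpr  = fp / (fp + tn);                  % false positive rate
end
```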
If you find our work useful in your research or use the Cx22 labels in your publications, please cite our paper:
@article{LIU2022106194,
title = {Cx22: A new publicly available dataset for deep learning-based segmentation of cervical cytology images},
journal = {Computers in Biology and Medicine},
volume = {150},
pages = {106194},
year = {2022},
issn = {0010-4825},
doi = {https://doi.org/10.1016/j.compbiomed.2022.106194},
url = {https://www.sciencedirect.com/science/article/pii/S0010482522009027},
author = {Guangqi Liu and Qinghai Ding and Haibo Luo and Min Sha and Xiang Li and Moran Ju},
keywords = {Cervical cell dataset, Image segmentation, Instance segmentation, Semantic segmentation, Deep learning},
abstract = {The segmentation of cervical cytology images plays an important role in the automatic analysis of cervical cytology screening. Although deep learning-based segmentation methods are well-developed in other image segmentation areas, their application in the segmentation of cervical cytology images is still in the early stage. The most important reason for the slow progress is the lack of publicly available and high-quality datasets, and the study on the deep learning-based segmentation methods may be hampered by the present datasets which are either artificial or plagued by the issue of false-negative objects. In this paper, we develop a new dataset of cervical cytology images named Cx22, which consists of the completely annotated labels of the cellular instances based on the open-source images released by our institute previously. Firstly, we meticulously delineate the contours of 14,946 cellular instances in 1320 images that are generated by our proposed ROI-based label cropping algorithm. Then, we propose the baseline methods for the deep learning-based semantic and instance segmentation tasks based on Cx22. Finally, through the experiments, we validate the task suitability of Cx22, and the results reveal the impact of false-negative objects on the performance of the baseline methods. Based on our work, Cx22 can provide a foundation for fellow researchers to develop high-performance deep learning-based methods for the segmentation of cervical cytology images. Other detailed information and step-by-step guidance on accessing the dataset are made available to fellow researchers at https://github.com/LGQ330/Cx22.}
}
Terms of use: by downloading Cx22, you agree to the following term:
- You will use the data only for non-commercial research and educational purposes.
Please contact [email protected] if you have any questions.