This is a model for semantic segmentation based on LinkNet (a U-Net-like architecture).
SEGM-model is part of the ReadingPipeline repo.
In the demo you can find an example of using the SEGM-model (you can run it in Google Colab).
- Nvidia drivers >= 470, CUDA >= 11.4
- Docker, nvidia-docker
The provided Dockerfile builds an image with CUDA and cuDNN support.
- Clone the repo.
- Download and extract the dataset to the `data/` folder.
- Run `sudo make all` to build a docker image and create a container, or `sudo make all GPUS=device=0 CPUS=10` if you want to specify GPU devices and limit CPU resources.
If you don't want to use Docker, you can install the dependencies via `requirements.txt`.
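For example, a non-Docker setup could be as simple as (assuming a working Python 3 environment with pip):

```bash
pip install -r requirements.txt
```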
You can change the `segm_config.json` and set some of the base training and evaluation parameters: the number of epochs, image size, save directory, etc.
Parameters in the `classes` dict are set individually for each class of the model. The order of the sub-dicts in `classes` corresponds to the order of the mask layers in the predicted tensor. Each dictionary contains the pre- and post-processing parameters for one model class, for example:
"classes": {
"class_name": {
"annotation_classes": ["class1", "class2"],
"polygon2mask": {
"ShrinkMaskMaker": {"shrink_ratio": 0.5}
},
"postprocess": {
"threshold": 0.8,
"min_area": 10
}
},
...
}
- `annotation_classes` - a list of class names from `annotation["categories"]`. If multiple names are passed, the classes will be merged.
- `polygon2mask` - a list of functions that are applied one by one to convert polygons to a mask and prepare the target for this class. Several functions are available, to create regular, border, or shrunk masks. To add a new function to the processing, you need to add it to the `PREPROCESS_FUNC` dictionary in prepare_dataset.py and also specify it in the `polygon2mask` dict in the config (see the sketch below).
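A rough sketch of what such a function and its registration could look like (the function name, signature, and registry entry here are hypothetical illustrations, not the repo's actual code in prepare_dataset.py):

```python
import cv2
import numpy as np

def make_plain_mask(polygons, image_shape):
    """Rasterize polygons (each a list of [x, y] points) into a binary mask."""
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    for polygon in polygons:
        cv2.fillPoly(mask, [np.asarray(polygon, dtype=np.int32)], 1)
    return mask

# Registering the function under a name makes it addressable from the
# "polygon2mask" dict in segm_config.json (entry name is hypothetical).
PREPROCESS_FUNC = {
    "PlainMaskMaker": make_plain_mask,
}
```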
Postprocessing settings:
- `threshold` - the confidence threshold of the model. Above this value the mask becomes True, below it - False. It helps to remove some low-confidence false predictions of the model.
- `min_area` - the minimum area of a polygon (polygons with a smaller area will be removed).
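A minimal sketch of how these two parameters could be applied to a predicted probability map (an illustration only, not the repo's post-processing code):

```python
import cv2
import numpy as np

def postprocess_probability_map(prob_map, threshold=0.8, min_area=10):
    """Binarize a predicted probability map and drop tiny polygons."""
    # Pixels above the confidence threshold become True (1), the rest False (0).
    binary = (prob_map > threshold).astype(np.uint8)
    # Extract polygons (contours) from the binary mask.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Remove polygons whose area is below min_area.
    return [c for c in contours if cv2.contourArea(c) >= min_area]
```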
Parameters that are individual for train / val / test:

```
"train": {
    "datasets": [
        {
            "json_path": "path/to/annotation.json",
            "image_root": "path/to/folder/with/images",
            "processed_data_path": "path/to/save/processed_dataset.csv",
            "prob": 0.5
        },
        ...
    ],
    "epoch_size": 2000,
    "batch_size": 8
}
```
In the `datasets` dict, you can specify paths to multiple datasets for the train / val / test processes.
- `json_path` (the path to annotation.json) and `image_root` (the path to the folder with images) point to a dataset with markup in COCO format.
- `processed_data_path` - the path where the final csv file produced by the prepare_dataset.py script is saved. This csv file will be used in the train stage; it stores the paths to the processed target masks.
- `epoch_size` - the size of an epoch. If you set it to `null`, the epoch size will be equal to the total number of samples in all the datasets.
- It is also possible to specify several datasets for train / validation / test, setting the probability for each dataset separately (the sum of `prob` values can be greater than 1, since normalization occurs inside the processing; see the sketch below).
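For illustration, with `prob` values of 0.5 and 1.0 the second dataset ends up sampled twice as often as the first. A minimal sketch of the normalization idea (not the repo's actual sampler):

```python
import random

# Illustrative only: "prob" values are normalized to sum to 1 before sampling.
datasets = [
    {"json_path": "path/to/annotation_a.json", "prob": 0.5},
    {"json_path": "path/to/annotation_b.json", "prob": 1.0},
]
total = sum(d["prob"] for d in datasets)
weights = [d["prob"] / total for d in datasets]  # -> [1/3, 2/3]
sampled = random.choices(datasets, weights=weights, k=1)[0]
```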
The input dataset should be in COCO format. The annotation.json should contain the following dictionaries:
- `annotation["categories"]` - a list of dicts with the category info (category names and indexes).
- `annotation["images"]` - a list of dicts describing the images; each dict must contain the fields `file_name` (the name of the image file) and `id` (the image id).
- `annotation["annotations"]` - a list of dicts with the markup information. Each dict stores the description of one polygon from the dataset and must contain the following fields: `image_id` - the index of the image on which the polygon is located; `category_id` - the polygon's category index; `segmentation` - the coordinates of the polygon, a list of numbers forming x and y coordinate pairs.
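A minimal hand-written example of this structure (all values are placeholders, and the exact `segmentation` layout should match your annotation tool's COCO export):

```json
{
  "categories": [{"id": 1, "name": "class1"}],
  "images": [{"id": 1, "file_name": "image_0001.jpg"}],
  "annotations": [
    {
      "image_id": 1,
      "category_id": 1,
      "segmentation": [10, 10, 100, 10, 100, 50, 10, 50]
    }
  ]
}
```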
To preprocess the dataset and create the target masks for training:

```bash
python scripts/prepare_dataset.py --config_path path/to/the/segm_config.json
```

The script creates the target masks for the train/val/test stages. The path to the input dataset is set in the config file via `json_path` and `image_root`. The output csv file is saved to `processed_data_path` from the config.
To train the model:

```bash
python scripts/train.py --config_path path/to/the/segm_config.json
```
To test the model:

```bash
python scripts/evaluate.py \
    --config_path path/to/the/segm_config.json \
    --model_path path/to/the/model-weights.ckpt
```
You can convert the Torch model to ONNX to speed up inference on CPU:

```bash
python scripts/torch2onnx.py \
    --config_path path/to/the/segm_config.json \
    --model_path path/to/the/model-weights.ckpt
```
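After conversion, you can sanity-check the exported model on CPU with onnxruntime. This is only a sketch: the output filename and input size below are placeholders, the real ones come from the conversion script and from segm_config.json.

```python
import numpy as np
import onnxruntime as ort

# Placeholder filename and input shape; use the values produced by
# torch2onnx.py and configured in segm_config.json.
session = ort.InferenceSession("segm_model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy_batch = np.random.rand(1, 3, 512, 512).astype(np.float32)
masks = session.run(None, {input_name: dummy_batch})[0]
print(masks.shape)  # one mask layer per class, in the order set in "classes"
```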