Commit 20b71d1

feat: add voc tutorial

MT-zlchen committed Jul 6, 2022
1 parent 0c83426 commit 20b71d1
Showing 6 changed files with 437 additions and 0 deletions.
Binary file added assets/image3.jpg
Binary file added assets/voc_loss_curve.jpg
11 changes: 11 additions & 0 deletions data/voc.yaml
@@ -0,0 +1,11 @@
# Please ensure that your custom dataset is placed in the same parent directory as YOLOv6_DIR
train: VOCdevkit/voc_07_12/images/train # train images
val: VOCdevkit/voc_07_12/images/val # val images
test: VOCdevkit/voc_07_12/images/val # test images (optional)

# Whether this is the COCO dataset; only COCO should be set to True.
is_coco: False
# Classes
nc: 20 # number of classes
names: ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',
'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'] # class names
303 changes: 303 additions & 0 deletions docs/tutorial_voc.ipynb
@@ -0,0 +1,303 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training YOLOv6 on VOC dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 1: Prepare VOC dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| dataset | size | images |\n",
"| :----: | :----: | :----: |\n",
"| VOC2007 trainval | 446MB | 5012 \n",
"| VOC2007 test | 438MB | 4953\n",
"| VOC2012 trainval | 1.95GB | 17126"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Download VOC dataset and unzip them, the directory shows like:\n",
"```\n",
"VOCdevkit\n",
"├── VOC2007\n",
"│ ├── Annotations\n",
"│ ├── ImageSets\n",
"│ ├── JPEGImages\n",
"│ ├── SegmentationClass\n",
"│ └── SegmentationObject\n",
"└── VOC2012\n",
" ├── Annotations\n",
" ├── ImageSets\n",
" ├── JPEGImages\n",
" ├── SegmentationClass\n",
" └── SegmentationObject\n",
"```\n",
"we need to use **ImageSets** and **JPEGImages** to generate yolo-format dataset."
]
},
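{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, here is a minimal download-and-extract sketch in Python (the mirror URLs below are the commonly used official ones but are an assumption here; verify they are still reachable before use):\n",
"```python\n",
"# Sketch: fetch and extract the VOC archives (URLs assumed, verify before use).\n",
"import urllib.request\n",
"import tarfile\n",
"\n",
"urls = [\n",
"    'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar',\n",
"    'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar',\n",
"    'http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar',\n",
"]\n",
"for url in urls:\n",
"    filename = url.split('/')[-1]\n",
"    urllib.request.urlretrieve(url, filename)\n",
"    # Each archive already contains the top-level VOCdevkit directory.\n",
"    with tarfile.open(filename) as tar:\n",
"        tar.extractall('.')\n",
"```"
]
},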
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2: Convert VOC dataset to YOLO-format."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The VOC dataset use xml format annotations as below. (refer to [VOC2007 guidelines](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/guidelines.html))\n",
"```\n",
"<annotation>\n",
"\t<folder>VOC2007</folder>\n",
"\t<filename>000007.jpg</filename>\n",
"\t<source>\n",
"\t\t<database>The VOC2007 Database</database>\n",
"\t\t<annotation>PASCAL VOC2007</annotation>\n",
"\t\t<image>flickr</image>\n",
"\t\t<flickrid>194179466</flickrid>\n",
"\t</source>\n",
"\t<owner>\n",
"\t\t<flickrid>monsieurrompu</flickrid>\n",
"\t\t<name>Thom Zemanek</name>\n",
"\t</owner>\n",
"\t<size>\n",
"\t\t<width>500</width>\n",
"\t\t<height>333</height>\n",
"\t\t<depth>3</depth>\n",
"\t</size>\n",
"\t<segmented>0</segmented>\n",
"\t<object>\n",
"\t\t<name>car</name>\n",
"\t\t<pose>Unspecified</pose>\n",
"\t\t<truncated>1</truncated>\n",
"\t\t<difficult>0</difficult>\n",
"\t\t<bndbox>\n",
"\t\t\t<xmin>141</xmin>\n",
"\t\t\t<ymin>50</ymin>\n",
"\t\t\t<xmax>500</xmax>\n",
"\t\t\t<ymax>330</ymax>\n",
"\t\t</bndbox>\n",
"\t</object>\n",
"</annotation>\n",
"```"
]
},
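{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each YOLO label file stores one object per line as `class x_center y_center width height`, with all coordinates normalized by image size. As an illustrative sketch (not the repository converter itself), the `<bndbox>` above maps to YOLO coordinates like this:\n",
"```python\n",
"# Sketch of the VOC -> YOLO box conversion for the XML example above.\n",
"width, height = 500, 333                     # from <size>\n",
"xmin, ymin, xmax, ymax = 141, 50, 500, 330   # from <bndbox>\n",
"\n",
"x_center = (xmin + xmax) / 2 / width    # 0.641\n",
"y_center = (ymin + ymax) / 2 / height   # 0.571\n",
"box_w = (xmax - xmin) / width           # 0.718\n",
"box_h = (ymax - ymin) / height          # 0.841\n",
"\n",
"class_id = 6  # index of 'car' in the 20-class VOC name list\n",
"print(f'{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}')\n",
"```"
]
},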
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the following command to convert voc dataset to yolo format:\n",
"\n",
"&ensp;&ensp;`python yolov6/data/voc2yolo.py --voc_path ./VOCdevkit`\n",
"\n",
"The converted dataset looks like:\n",
"```\n",
"VOCdevkit\n",
"├── images\n",
"│ ├── test2007\n",
"│ ├── train2007\n",
"│ ├── train2012\n",
"│ ├── val2007\n",
"│ └── val2012\n",
"├── labels\n",
"│ ├── test2007\n",
"│ ├── train2007\n",
"│ ├── train2012\n",
"│ ├── val2007\n",
"│ └── val2012\n",
"├── VOC2007\n",
"│ ├── Annotations\n",
"│ ├── ImageSets\n",
"│ ├── JPEGImages\n",
"│ ├── SegmentationClass\n",
"│ └── SegmentationObject\n",
"└── VOC2012\n",
" ├── Annotations\n",
" ├── ImageSets\n",
" ├── JPEGImages\n",
" ├── SegmentationClass\n",
" └── SegmentationObject\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We follow the `07+12` training setting, which means using VOC2007 and VOC2012's train+val(16551) as training set, VOC2007's test(4952) as validation set and testing set.\n",
"\n",
"The final converted voc dataset looks like:\n",
"```\n",
"voc_07_12\n",
"├── images\n",
"│ ├── train\n",
"│ └── val\n",
"└── labels\n",
" ├── train\n",
" └── val\n",
"```"
]
},
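{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you need to build this merged layout yourself, the following is a minimal sketch (it assumes the converted `VOCdevkit/images` and `VOCdevkit/labels` directories shown above; the repository's converter may already produce this layout):\n",
"```python\n",
"# Sketch: merge the converted splits into the 07+12 layout (paths assumed).\n",
"import shutil\n",
"from pathlib import Path\n",
"\n",
"root = Path('VOCdevkit')\n",
"splits = {'train': ['train2007', 'val2007', 'train2012', 'val2012'],\n",
"          'val': ['test2007']}\n",
"for kind in ('images', 'labels'):\n",
"    for dst, srcs in splits.items():\n",
"        dst_dir = root / 'voc_07_12' / kind / dst\n",
"        dst_dir.mkdir(parents=True, exist_ok=True)\n",
"        for src in srcs:\n",
"            for f in (root / kind / src).iterdir():\n",
"                shutil.copy(f, dst_dir / f.name)\n",
"```"
]
},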
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Visualize yolo format dataset (Optional)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check if your dataset is correct, run the following command:\n",
"\n",
"&ensp;&ensp;`python yolov6/data/vis_dataset.py --img_dir VOCdevkit/images/train --label_dir VOCdevkit/labels/train`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3: Create dataset config file."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create `data/voc.yaml` like:\n",
"\n",
"```\n",
"# Please insure that your custom_dataset are put in same parent dir with YOLOv6_DIR\n",
"train: VOCdevkit/voc_07_12/images/train # train images\n",
"val: VOCdevkit/voc_07_12/images/val # val images\n",
"test: VOCdevkit/voc_07_12/images/val # test images (optional)\n",
"\n",
"# whether it is coco dataset, only coco dataset should be set to True.\n",
"is_coco: False\n",
"# Classes\n",
"nc: 20 # number of classes\n",
"names: ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',\n",
" 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'] # class names\n",
"```"
]
},
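{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check that the config parses and is self-consistent (a sketch; assumes PyYAML is installed):\n",
"```python\n",
"# Sketch: validate data/voc.yaml before training.\n",
"import yaml\n",
"\n",
"with open('data/voc.yaml') as f:\n",
"    cfg = yaml.safe_load(f)\n",
"\n",
"assert cfg['nc'] == len(cfg['names']), 'nc must match the number of class names'\n",
"print(cfg['nc'], 'classes, e.g.', cfg['names'][:3])\n",
"```"
]
},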
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 4: Training.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the following command to start training:\n",
"- Multi GPUs (DDP mode recommended)\n",
"\n",
"&ensp;&ensp;`python -m torch.distributed.launch --nproc_per_node 4 --master_port=23456 tools/train.py --batch 256 --conf configs/yolov6n_finetune.py --data data/voc.yaml --device 0,1,2,3`\n",
"\n",
"- Single GPU\n",
"\n",
"&ensp;&ensp;`python tools/train.py --batch 256 --conf configs/yolov6_finetune.py --data data/data.yaml --device 0`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Tensorboard\n",
"We can use tensorboard to visualize the loss/mAP curve, run:\n",
"\n",
"&ensp;&ensp;`tensorboard --logdir=exp`\n",
"\n",
"![Traing loss/mAP curve](../assets/voc_loss_curve.jpg 'Traing loss/mAP curve')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Evaluation\n",
"When training finished, it automatically do evaulation on the testset, the output metrics are:\n",
"```\n",
"DONE (t=4.21s).\n",
" Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.632\n",
" Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.854\n",
" Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.702\n",
" Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.272\n",
" Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.473\n",
" Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.689\n",
" Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.518\n",
" Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.737\n",
" Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.751\n",
" Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.554\n",
" Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.656\n",
" Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.791\n",
"Epoch: 399 | [email protected]: 0.8542516455615079 | [email protected]:0.95: 0.6315693468708705\n",
"\n",
"Training completed in 9.206 hours.\n",
"```\n",
"Or you can manually evaulation model on your dataset by:\n",
"\n",
"&ensp;&ensp;`python tools/eval.py --data data/voc.yaml --weights runs/train/exp/weights/best_ckpt.pt --device 0`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.Inference\n",
"\n",
"&ensp;&ensp;`python tools/infer.py --weights runs/train/exp/weights/best_ckpt.pt --yaml data/voc.yaml --source data/images/image3.jpg --device 0`\n",
"\n",
"The result are saved in runs/inference/exp.\n",
"\n",
"![image3.jpg](../assets/image3.jpg)\n",
"### 6. Deployment\n",
"\n",
"&ensp;&ensp;`python deploy/ONNX/export_onnx.py --weights output_dir/name/weights/best_ckpt.pt --device 0`"
]
},
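{
"cell_type": "markdown",
"metadata": {},
"source": [
"To smoke-test the exported model, one can feed a dummy input through it with ONNX Runtime (a sketch; the `.onnx` output path and the 1x3x640x640 input shape are assumptions, not guaranteed by the export script):\n",
"```python\n",
"# Sketch: run a dummy input through the exported ONNX model.\n",
"import numpy as np\n",
"import onnxruntime as ort\n",
"\n",
"sess = ort.InferenceSession('output_dir/name/weights/best_ckpt.onnx',\n",
"                            providers=['CPUExecutionProvider'])\n",
"inp = sess.get_inputs()[0]\n",
"x = np.zeros((1, 3, 640, 640), dtype=np.float32)\n",
"outputs = sess.run(None, {inp.name: x})\n",
"print([o.shape for o in outputs])\n",
"```"
]
}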
],
"metadata": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
},
"kernelspec": {
"display_name": "Python 3.8.2 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
57 changes: 57 additions & 0 deletions yolov6/data/vis_dataset.py
@@ -0,0 +1,57 @@
# coding=utf-8
# Description: visualize YOLO-format labels on images.

import argparse
import os
import cv2
import numpy as np

IMG_FORMATS = ["bmp", "jpg", "jpeg", "png", "tif", "tiff", "dng", "webp", "mpo"]

def main(args):
img_dir, label_dir, class_names = args.img_dir, args.label_dir, args.class_names

    # Map class id -> class name for drawing labels.
    label_map = dict(enumerate(class_names))
    # One fixed random color per class, generated once so that colors stay
    # consistent across all images.
    colors = [tuple(int(c) for c in np.random.choice(range(256), size=3)) for _ in class_names]
    thickness = 2

    for file in os.listdir(img_dir):
        if file.split('.')[-1].lower() not in IMG_FORMATS:
            print(f'[Warning]: Non-image file {file}')
            continue
        img_path = os.path.join(img_dir, file)
        label_path = os.path.join(label_dir, file[: file.rindex('.')] + '.txt')

        try:
            img_data = cv2.imread(img_path)
            height, width, _ = img_data.shape

            with open(label_path, 'r') as f:
                for bbox in f:
                    # Each label line: "class x_center y_center width height" (normalized).
                    values = bbox.split()
                    if not values:
                        continue
                    cls = int(values[0])
                    x_c, y_c, w, h = map(float, values[1:])

                    # Convert normalized center/size to the pixel top-left corner.
                    x_tl = int((x_c - w / 2) * width)
                    y_tl = int((y_c - h / 2) * height)
                    cv2.rectangle(img_data, (x_tl, y_tl), (x_tl + int(w * width), y_tl + int(h * height)), colors[cls], thickness)
                    cv2.putText(img_data, label_map[cls], (x_tl, y_tl - 10), cv2.FONT_HERSHEY_COMPLEX, 1, colors[cls], thickness)

cv2.imshow('image', img_data)
cv2.waitKey(0)
except Exception as e:
print(f'[Error]: {e} {img_path}')
print('======All Done!======')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--img_dir', default='VOCdevkit/voc_07_12/images')
    parser.add_argument('--label_dir', default='VOCdevkit/voc_07_12/labels')
    # nargs='+' allows overriding the class names from the command line.
    parser.add_argument('--class_names', nargs='+', default=['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',
                                                             'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'])

args = parser.parse_args()
print(args)

main(args)