feat: add voc tutorial

Zhengzhuo0309 · Jul 6, 2022 · 20b71d1 · 20b71d1
1 parent 0c83426
commit 20b71d1
Showing 6 changed files with 437 additions and 0 deletions.
diff --git a/assets/image3.jpg b/assets/image3.jpg
diff --git a/assets/voc_loss_curve.jpg b/assets/voc_loss_curve.jpg
diff --git a/data/voc.yaml b/data/voc.yaml
@@ -0,0 +1,11 @@
+# Please ensure that your custom dataset is placed in the same parent directory as YOLOv6_DIR.
+train: VOCdevkit/voc_07_12/images/train # train images
+val: VOCdevkit/voc_07_12/images/val # val images
+test: VOCdevkit/voc_07_12/images/val # test images (optional)
+
+# Whether this is the COCO dataset; set to True only for COCO.
+is_coco: False
+# Classes
+nc: 20  # number of classes
+names: ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',
+        'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']  # class names
diff --git a/docs/tutorial_voc.ipynb b/docs/tutorial_voc.ipynb
@@ -0,0 +1,303 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Training YOLOv6 on VOC dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 1: Prepare VOC dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "| dataset | size | images |\n",
+    "| :----: | :----: | :----: |\n",
+    "| VOC2007 trainval | 446MB | 5011 |\n",
+    "| VOC2007 test | 438MB | 4952 |\n",
+    "| VOC2012 trainval | 1.95GB | 17125 |"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Download the VOC datasets and extract them. The directory layout looks like this:\n",
+    "```\n",
+    "VOCdevkit\n",
+    "├── VOC2007\n",
+    "│   ├── Annotations\n",
+    "│   ├── ImageSets\n",
+    "│   ├── JPEGImages\n",
+    "│   ├── SegmentationClass\n",
+    "│   └── SegmentationObject\n",
+    "└── VOC2012\n",
+    "    ├── Annotations\n",
+    "    ├── ImageSets\n",
+    "    ├── JPEGImages\n",
+    "    ├── SegmentationClass\n",
+    "    └── SegmentationObject\n",
+    "```\n",
+    "We use **Annotations**, **ImageSets**, and **JPEGImages** to generate the YOLO-format dataset."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 2: Convert VOC dataset to YOLO-format."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The VOC dataset uses XML annotations like the one below (see the [VOC2007 guidelines](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/guidelines.html)):\n",
+    "```\n",
+    "<annotation>\n",
+    "\t<folder>VOC2007</folder>\n",
+    "\t<filename>000007.jpg</filename>\n",
+    "\t<source>\n",
+    "\t\t<database>The VOC2007 Database</database>\n",
+    "\t\t<annotation>PASCAL VOC2007</annotation>\n",
+    "\t\t<image>flickr</image>\n",
+    "\t\t<flickrid>194179466</flickrid>\n",
+    "\t</source>\n",
+    "\t<owner>\n",
+    "\t\t<flickrid>monsieurrompu</flickrid>\n",
+    "\t\t<name>Thom Zemanek</name>\n",
+    "\t</owner>\n",
+    "\t<size>\n",
+    "\t\t<width>500</width>\n",
+    "\t\t<height>333</height>\n",
+    "\t\t<depth>3</depth>\n",
+    "\t</size>\n",
+    "\t<segmented>0</segmented>\n",
+    "\t<object>\n",
+    "\t\t<name>car</name>\n",
+    "\t\t<pose>Unspecified</pose>\n",
+    "\t\t<truncated>1</truncated>\n",
+    "\t\t<difficult>0</difficult>\n",
+    "\t\t<bndbox>\n",
+    "\t\t\t<xmin>141</xmin>\n",
+    "\t\t\t<ymin>50</ymin>\n",
+    "\t\t\t<xmax>500</xmax>\n",
+    "\t\t\t<ymax>330</ymax>\n",
+    "\t\t</bndbox>\n",
+    "\t</object>\n",
+    "</annotation>\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the following command to convert the VOC dataset to YOLO format:\n",
+    "\n",
+    "&ensp;&ensp;`python yolov6/data/voc2yolo.py --voc_path ./VOCdevkit`\n",
+    "\n",
+    "The converted dataset looks like:\n",
+    "```\n",
+    "VOCdevkit\n",
+    "├── images\n",
+    "│   ├── test2007\n",
+    "│   ├── train2007\n",
+    "│   ├── train2012\n",
+    "│   ├── val2007\n",
+    "│   └── val2012\n",
+    "├── labels\n",
+    "│   ├── test2007\n",
+    "│   ├── train2007\n",
+    "│   ├── train2012\n",
+    "│   ├── val2007\n",
+    "│   └── val2012\n",
+    "├── VOC2007\n",
+    "│   ├── Annotations\n",
+    "│   ├── ImageSets\n",
+    "│   ├── JPEGImages\n",
+    "│   ├── SegmentationClass\n",
+    "│   └── SegmentationObject\n",
+    "└── VOC2012\n",
+    "    ├── Annotations\n",
+    "    ├── ImageSets\n",
+    "    ├── JPEGImages\n",
+    "    ├── SegmentationClass\n",
+    "    └── SegmentationObject\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We follow the `07+12` training setting: the train+val splits of VOC2007 and VOC2012 (16551 images) form the training set, and the VOC2007 test split (4952 images) serves as both the validation set and the test set.\n",
+    "\n",
+    "The final converted VOC dataset looks like:\n",
+    "```\n",
+    "voc_07_12\n",
+    "├── images\n",
+    "│   ├── train\n",
+    "│   └── val\n",
+    "└── labels\n",
+    "    ├── train\n",
+    "    └── val\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Visualize yolo format dataset (Optional)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To check that the converted dataset is correct, run the following command:\n",
+    "\n",
+    "&ensp;&ensp;`python yolov6/data/vis_dataset.py --img_dir VOCdevkit/voc_07_12/images/train --label_dir VOCdevkit/voc_07_12/labels/train`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 3: Create dataset config file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Create `data/voc.yaml` with the following content:\n",
+    "\n",
+    "```\n",
+    "# Please ensure that your custom dataset is placed in the same parent directory as YOLOv6_DIR.\n",
+    "train: VOCdevkit/voc_07_12/images/train # train images\n",
+    "val: VOCdevkit/voc_07_12/images/val # val images\n",
+    "test: VOCdevkit/voc_07_12/images/val # test images (optional)\n",
+    "\n",
+    "# Whether this is the COCO dataset; set to True only for COCO.\n",
+    "is_coco: False\n",
+    "# Classes\n",
+    "nc: 20  # number of classes\n",
+    "names: ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',\n",
+    "        'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']  # class names\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 4: Training.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Use one of the following commands to start training:\n",
+    "- Multiple GPUs (DDP mode recommended)\n",
+    "\n",
+    "&ensp;&ensp;`python -m torch.distributed.launch --nproc_per_node 4 --master_port=23456 tools/train.py --batch 256 --conf configs/yolov6n_finetune.py --data data/voc.yaml --device 0,1,2,3`\n",
+    "\n",
+    "- Single GPU\n",
+    "\n",
+    "&ensp;&ensp;`python tools/train.py --batch 256 --conf configs/yolov6n_finetune.py --data data/voc.yaml --device 0`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Tensorboard\n",
+    "Use TensorBoard to visualize the loss and mAP curves:\n",
+    "\n",
+    "&ensp;&ensp;`tensorboard --logdir=exp`\n",
+    "\n",
+    "![Training loss/mAP curve](../assets/voc_loss_curve.jpg 'Training loss/mAP curve')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Evaluation\n",
+    "When training finishes, evaluation runs automatically on the test set. The output metrics are:\n",
+    "```\n",
+    "DONE (t=4.21s).\n",
+    " Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.632\n",
+    " Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.854\n",
+    " Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.702\n",
+    " Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.272\n",
+    " Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.473\n",
+    " Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.689\n",
+    " Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.518\n",
+    " Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.737\n",
+    " Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.751\n",
+    " Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.554\n",
+    " Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.656\n",
+    " Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.791\n",
+    "Epoch: 399 | mAP@0.5: 0.8542516455615079 | mAP@0.50:0.95: 0.6315693468708705\n",
+    "\n",
+    "Training completed in 9.206 hours.\n",
+    "```\n",
+    "You can also evaluate the model manually on your dataset with:\n",
+    "\n",
+    "&ensp;&ensp;`python tools/eval.py --data data/voc.yaml  --weights runs/train/exp/weights/best_ckpt.pt --device 0`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 5: Inference\n",
+    "\n",
+    "&ensp;&ensp;`python tools/infer.py --weights runs/train/exp/weights/best_ckpt.pt --yaml data/voc.yaml --source data/images/image3.jpg --device 0`\n",
+    "\n",
+    "The results are saved in `runs/inference/exp`.\n",
+    "\n",
+    "![image3.jpg](../assets/image3.jpg)\n",
+    "\n",
+    "### Step 6: Deployment\n",
+    "\n",
+    "&ensp;&ensp;`python deploy/ONNX/export_onnx.py --weights output_dir/name/weights/best_ckpt.pt --device 0`"
+   ]
+  }
+ ],
+ "metadata": {
+  "interpreter": {
+   "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
+  },
+  "kernelspec": {
+   "display_name": "Python 3.8.2 64-bit",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.0"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
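
The conversion in Step 2 of the notebook above maps each VOC `<bndbox>` (pixel corner coordinates) to a YOLO label line (`class_id x_center y_center width height`, normalized by the image size). A minimal sketch of that mapping, applied to a trimmed copy of the sample annotation, might look as follows. The helper name `voc_xml_to_yolo_lines` and the `skip_difficult` flag are hypothetical and are not taken from `yolov6/data/voc2yolo.py`, whose details (for example pixel offsets or the handling of `difficult` objects) may differ.

```python
import xml.etree.ElementTree as ET

# The 20 VOC class names, in the same order as `names` in data/voc.yaml.
VOC_NAMES = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
             'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
             'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']


def voc_xml_to_yolo_lines(xml_text, names=VOC_NAMES, skip_difficult=True):
    """Convert one VOC annotation (XML string) into YOLO label lines.

    Hypothetical helper for illustration; not the repository's converter.
    """
    root = ET.fromstring(xml_text)
    size = root.find('size')
    img_w = float(size.find('width').text)
    img_h = float(size.find('height').text)

    lines = []
    for obj in root.iter('object'):
        name = obj.find('name').text.strip()
        if name not in names:
            continue  # ignore classes outside the configured list
        difficult = obj.find('difficult')
        # Assumption: objects flagged as difficult are skipped by default.
        if skip_difficult and difficult is not None and int(difficult.text) == 1:
            continue
        box = obj.find('bndbox')
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        # Corner coordinates -> normalized center, width and height.
        x_c = (xmin + xmax) / 2 / img_w
        y_c = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f'{names.index(name)} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}')
    return lines


# Trimmed copy of the sample annotation (000007.jpg) from the notebook.
SAMPLE = """<annotation>
  <size><width>500</width><height>333</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <difficult>0</difficult>
    <bndbox><xmin>141</xmin><ymin>50</ymin><xmax>500</xmax><ymax>330</ymax></bndbox>
  </object>
</annotation>"""

print(voc_xml_to_yolo_lines(SAMPLE))
```

Each returned string is one line of the image's `.txt` label file; the class id is the index of the object name in `names` from `data/voc.yaml`.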
diff --git a/yolov6/data/vis_dataset.py b/yolov6/data/vis_dataset.py
@@ -0,0 +1,57 @@
+# coding=utf-8
+# Description: visualize YOLO-format labels on images.
+
+import argparse
+import os
+import cv2
+import numpy as np
+
+IMG_FORMATS = ["bmp", "jpg", "jpeg", "png", "tif", "tiff", "dng", "webp", "mpo"]
+
+def main(args):
+    img_dir, label_dir, class_names = args.img_dir, args.label_dir, args.class_names
+
+    # Map class ids to class names.
+    label_map = dict(enumerate(class_names))
+    # Pick one random BGR color per class, once, so colors stay the same across images.
+    colors = [tuple(int(c) for c in np.random.randint(0, 256, size=3)) for _ in class_names]
+    thickness = 2
+
+    for file in os.listdir(img_dir):
+        # Compare extensions case-insensitively so that e.g. ".JPG" is accepted.
+        if file.split('.')[-1].lower() not in IMG_FORMATS:
+            print(f'[Warning]: Non-image file {file}')
+            continue
+        img_path = os.path.join(img_dir, file)
+        label_path = os.path.join(label_dir, os.path.splitext(file)[0] + '.txt')
+
+        try:
+            img_data = cv2.imread(img_path)
+            if img_data is None:
+                raise ValueError('failed to read image')
+            height, width, _ = img_data.shape
+
+            with open(label_path, 'r') as f:
+                for line in f:
+                    fields = line.split()
+                    if not fields:
+                        continue  # skip blank lines
+                    # Each label line: class_id x_center y_center width height (normalized to [0, 1]).
+                    cls = int(fields[0])
+                    x_c, y_c, w, h = map(float, fields[1:5])
+
+                    # Convert the normalized center format to a top-left pixel corner.
+                    x_tl = int((x_c - w / 2) * width)
+                    y_tl = int((y_c - h / 2) * height)
+                    cv2.rectangle(img_data, (x_tl, y_tl), (x_tl + int(w * width), y_tl + int(h * height)), colors[cls], thickness)
+                    cv2.putText(img_data, label_map[cls], (x_tl, y_tl - 10), cv2.FONT_HERSHEY_COMPLEX, 1, colors[cls], thickness)
+
+            cv2.imshow('image', img_data)
+            cv2.waitKey(0)  # wait for a key press before showing the next image
+        except Exception as e:
+            print(f'[Error]: {e} {img_path}')
+    print('======All Done!======')
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
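+    parser.add_argument('--img_dir', default='VOCdevkit/voc_07_12/images/train', help='directory containing the images')
+    parser.add_argument('--label_dir', default='VOCdevkit/voc_07_12/labels/train', help='directory containing the YOLO label files')
+    # nargs='+' lets class names be passed on the command line as a space-separated list.
+    parser.add_argument('--class_names', nargs='+', default=['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',
+        'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'])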
+
+    args = parser.parse_args()
+    print(args)
+
+    main(args)
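
The box arithmetic in `vis_dataset.py` above can be checked without OpenCV. The sketch below parses one YOLO label line and converts it back to pixel corners with the same truncating arithmetic as the script. `parse_label_line` and `yolo_to_pixel_box` are hypothetical helper names, and the sample label line is an assumed value chosen for illustration (a `car` box in a 500x333 image).

```python
def parse_label_line(line):
    """Parse 'class_id x_center y_center width height' into (int, 4 floats)."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f'expected 5 fields, got {len(fields)}: {line!r}')
    cls = int(fields[0])
    coords = tuple(float(v) for v in fields[1:])
    # YOLO coordinates are normalized, so every value must lie in [0, 1].
    if not all(0.0 <= v <= 1.0 for v in coords):
        raise ValueError(f'coordinates must be normalized to [0, 1]: {line!r}')
    return cls, coords


def yolo_to_pixel_box(x_c, y_c, w, h, img_w, img_h):
    """Return (x_min, y_min, x_max, y_max) in pixels for a normalized box."""
    x_tl = int((x_c - w / 2) * img_w)
    y_tl = int((y_c - h / 2) * img_h)
    return x_tl, y_tl, x_tl + int(w * img_w), y_tl + int(h * img_h)


# Assumed sample label line; class id 6 is 'car' in data/voc.yaml.
cls, coords = parse_label_line('6 0.641 0.570571 0.718 0.840841')
print(cls, yolo_to_pixel_box(*coords, img_w=500, img_h=333))
```

Because the conversion truncates with `int`, the recovered corners can differ from the original pixel annotation by about one pixel, which is harmless for a visual check.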