English | 简体中文
Implementation of paper - YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications
YOLOv6 has a series of models for various industrial scenarios, including N/T/S/M/L, which the architectures vary considering the model size for better accuracy-speed trade-off. And some Bag-of-freebies methods are introduced to further improve the performance, such as self-distillation and more training epochs. For industrial deployment, we adopt QAT with channel-wise distillation and graph optimization to pursue extreme performance.
YOLOv6-N hits 35.9% AP on COCO dataset with 1234 FPS on T4. YOLOv6-S strikes 43.5% AP with 495 FPS, and the quantized YOLOv6-S model achieves 43.3% AP at a accelerated speed of 869 FPS on T4. YOLOv6-T/M/L also have excellent performance, which show higher accuracy than other detectors with the similar inference speed.
- Release M/L models and update N/T/S models with enhanced performance.⭐️ Benchmark
- 2x faster training time.
- Fix the degration of performance when evaluating on 640x640 inputs.
- Customized quantization methods. 🚀 Quantization Tutorial
Model | Size | mAPval 0.5:0.95 |
SpeedT4 trt fp16 b1 (fps) |
SpeedT4 trt fp16 b32 (fps) |
Params (M) |
FLOPs (G) |
---|---|---|---|---|---|---|
YOLOv6-N | 640 | 35.9300e 36.3400e |
802 | 1234 | 4.3 | 11.1 |
YOLOv6-T | 640 | 40.3300e 41.1400e |
449 | 659 | 15.0 | 36.7 |
YOLOv6-S | 640 | 43.5300e 43.8400e |
358 | 495 | 17.2 | 44.2 |
YOLOv6-M | 640 | 49.5 | 179 | 233 | 34.3 | 82.2 |
YOLOv6-L-ReLU | 640 | 51.7 | 113 | 149 | 58.5 | 144.0 |
YOLOv6-L | 640 | 52.5 | 98 | 121 | 58.5 | 144.0 |
Model | Size | Precision | mAPval 0.5:0.95 |
SpeedT4 trt b1 (fps) |
SpeedT4 trt b32 (fps) |
---|---|---|---|---|---|
YOLOv6-N RepOpt | 640 | INT8 | 34.8 | 1114 | 1828 |
YOLOv6-N | 640 | FP16 | 35.9 | 802 | 1234 |
YOLOv6-T RepOpt | 640 | INT8 | 39.8 | 741 | 1167 |
YOLOv6-T | 640 | FP16 | 40.3 | 449 | 659 |
YOLOv6-S RepOpt | 640 | INT8 | 43.3 | 619 | 924 |
YOLOv6-S | 640 | FP16 | 43.5 | 377 | 541 |
- Speed is tested with TensorRT 8.4 on T4.
- Precision is figured on models for 300 epochs.
- Results of the mAP and speed are evaluated on COCO val2017 dataset with the input resolution of 640×640.
- Refer to Test speed tutorial to reproduce the speed results of YOLOv6.
- Params and FLOPs of YOLOv6 are estimated on deployed models.
Legacy models
Model | Size | mAPval 0.5:0.95 |
SpeedV100 fp16 b32 (ms) |
SpeedV100 fp32 b32 (ms) |
SpeedT4 trt fp16 b1 (fps) |
SpeedT4 trt fp16 b32 (fps) |
Params (M) |
FLOPs (G) |
---|---|---|---|---|---|---|---|---|
YOLOv6-N | 416 640 |
30.8 35.0 |
0.3 0.5 |
0.4 0.7 |
1100 788 |
2716 1242 |
4.3 4.3 |
4.7 11.1 |
YOLOv6-T | 640 | 41.3 | 0.9 | 1.5 | 425 | 602 | 15.0 | 36.7 |
YOLOv6-S | 640 | 43.1 | 1.0 | 1.7 | 373 | 520 | 17.2 | 44.2 |
Install
git clone https://github.com/meituan/YOLOv6
cd YOLOv6
pip install -r requirements.txt
Training
Single GPU
python tools/train.py --batch 32 --conf configs/yolov6s_finetune.py --data data/dataset.yaml --device 0
Multi GPUs (DDP mode recommended)
python -m torch.distributed.launch --nproc_per_node 8 tools/train.py --batch 256 --conf configs/yolov6s_finetune.py --data data/dataset.yaml --device 0,1,2,3,4,5,6,7
- conf: select config file to specify network/optimizer/hyperparameters. Pretrained model path is recommended to be specified in the config file with the
pretrained
parameter if training on your custom dataset. - data: prepare COCO dataset, YOLO format coco labels and specify dataset paths in data.yaml
- make sure your dataset structure as follows:
├── coco
│ ├── annotations
│ │ ├── instances_train2017.json
│ │ └── instances_val2017.json
│ ├── images
│ │ ├── train2017
│ │ └── val2017
│ ├── labels
│ │ ├── train2017
│ │ ├── val2017
│ ├── LICENSE
│ ├── README.txt
Reproduce our results on COCO ⭐️
For nano model
python -m torch.distributed.launch --nproc_per_node 4 tools/train.py \
--batch 128 \
--conf configs/yolov6n.py \
--data data/coco.yaml \
--epoch 400 \
--device 0,1,2,3 \
--name yolov6n_coco
For s/tiny model
python -m torch.distributed.launch --nproc_per_node 8 tools/train.py \
--batch 256 \
--conf configs/yolov6s.py \ # configs/yolov6t.py
--data data/coco.yaml \
--epoch 400 \
--device 0,1,2,3,4,5,6,7 \
--name yolov6s_coco # yolov6t_coco
For m/l model
# Step 1: Training a base model
python -m torch.distributed.launch --nproc_per_node 8 tools/train.py \
--batch 256 \
--conf configs/yolov6m.py \ # configs/yolov6l.py
--data data/coco.yaml \
--epoch 300 \
--device 0,1,2,3,4,5,6,7 \
--name yolov6m_coco # yolov6l_coco
# Step 2: Self-distillation training
python -m torch.distributed.launch --nproc_per_node 8 tools/train.py \
--batch 256 \ # 128 for distillation of yolov6l
--conf configs/yolov6m.py \ # configs/yolov6l.py
--data data/coco.yaml \
--epoch 300 \
--device 0,1,2,3,4,5,6,7 \
--distill \
--teacher_model_path runs/train/yolov6m_coco/weights/best_ckpt.pt \ # # yolov6l_coco
--name yolov6m_coco # yolov6l_coco
Resume training
If your training process is corrupted, you can resume training by
# multi GPU training.
python -m torch.distributed.launch --nproc_per_node 8 tools/train.py --resume
Your can also specify a checkpoint path to --resume
parameter by
# remember to replace /path/to/your/checkpoint/path to the checkpoint path which you want to resume training.
--resume /path/to/your/checkpoint/path
Evaluation
Reproduce mAP on COCO val2017 dataset with 640×640 resolution ⭐️
python tools/eval.py --data data/coco.yaml --batch 32 --weights yolov6s.pt --task val --reproduce_640_eval
- verbose: set True to print mAP of each classes.
- do_coco_metric: set True / False to enable / disable pycocotools evaluation method.
- do_pr_metric: set True / False to print or not to print the precision and recall metrics.
- config-file: specify a config file to define all the eval params, for example: yolov6n_with_eval_params.py
Inference
First, download a pretrained model from the YOLOv6 release or use your trained model to do inference.
Second, run inference with tools/infer.py
python tools/infer.py --weights yolov6s.pt --source img.jpg / imgdir / video.mp4