Update code (meituan#798)

* update network for lightweight models on mobile or CPU
* simplify eval input parameters
* update
* update yolov6lite README.md
* fix bug
* fix resume loss&mAP bug
* update README

---------

Co-authored-by: gengyifei <[email protected]>
Co-authored-by: YIF <[email protected]>
SangbumChoi · Apr 28, 2023 · 867c60a
1 parent d9c0491
commit 867c60a

Showing 39 changed files with 3,247 additions and 280 deletions.
diff --git a/README.md b/README.md
@@ -25,14 +25,10 @@ Implementation of paper:
 
 
 ## What's New
+- [2023.04.28] Release [YOLOv6Lite](configs/yolov6_lite/README.md) models for mobile or CPU. ⭐️ [Mobile Benchmark](#Mobile-Benchmark)
 - [2023.03.10] Release [YOLOv6-Face](https://github.com/meituan/YOLOv6/tree/yolov6-face). 🔥 [Performance](https://github.com/meituan/YOLOv6/tree/yolov6-face#performance-on-widerface)
 - [2023.03.02] Update [base models](configs/base/README.md) to version 3.0.
 - [2023.01.06] Release P6 models and enhance the performance of P5 models. ⭐️ [Benchmark](#Benchmark)
-    - Renew the neck of the detector with a BiC module and SimCSPSPPF Block.
-    - Propose an anchor-aided training (AAT) strategy.
-    - Involve a new self-distillation strategy for small models of YOLOv6.
-    - Expand YOLOv6 and hit a new
-SOTA performance on the COCO dataset.
 - [2022.11.04] Release [base models](configs/base/README.md) to simplify the training and deployment process.
 - [2022.09.06] Customized quantization methods. 🚀 [Quantization Tutorial](./tools/qat/README.md)
 - [2022.09.05] Release M/L models and update N/T/S models with enhanced performance. 
@@ -88,6 +84,26 @@ SOTA performance on the COCO dataset.
 
 </details>
 
+## Mobile Benchmark
+| Model | Size | mAP<sup>val<br/>0.5:0.95 | sm8350<br/><sup>(ms) | mt6853<br/><sup>(ms) | sdm660<br/><sup>(ms) |Params<br/><sup> (M) |   FLOPs<br/><sup> (G) |
+| :----------------------------------------------------------- | ---- | -------------------- | -------------------- | -------------------- | -------------------- | -------------------- | -------------------- |
+| [**YOLOv6Lite-S**](https://github.com/meituan/YOLOv6/releases/download/0.4.0/yolov6lite_s.pt) | 320*320 | 22.4                     | 7.99                     | 11.99                     | 41.86                     | 0.55                     | 0.56                     |
+| [**YOLOv6Lite-M**](https://github.com/meituan/YOLOv6/releases/download/0.4.0/yolov6lite_m.pt) | 320*320 | 25.1                     | 9.08                     | 13.27                     | 47.95                     | 0.79                     | 0.67                     |
+| [**YOLOv6Lite-L**](https://github.com/meituan/YOLOv6/releases/download/0.4.0/yolov6lite_l.pt) | 320*320 | 28.0                     | 11.37                     | 16.20                     | 61.40                     | 1.09                     | 0.87                     |
+| [**YOLOv6Lite-L**](https://github.com/meituan/YOLOv6/releases/download/0.4.0/yolov6lite_l.pt) | 320*192 | 25.0                     | 7.02                     | 9.66                     | 36.13                     | 1.09                     | 0.52                     |
+| [**YOLOv6Lite-L**](https://github.com/meituan/YOLOv6/releases/download/0.4.0/yolov6lite_l.pt) | 224*128 | 18.9                     | 3.63                     | 4.99                     | 17.76                     | 1.09                     | 0.24                     |
+
+<details>
+<summary>Table Notes</summary>
+
+- We provide a series of mobile models that vary in model size and input aspect ratio, allowing flexible deployment in different scenarios.
+- All checkpoints are trained for 400 epochs without distillation.
+- mAP and speed are evaluated on the [COCO val2017](https://cocodataset.org/#download) dataset at the input resolution given in the Size column.
+- Speed is measured with MNN 2.3.0 (AArch64) with arm82 acceleration enabled, using 10 warm-up inferences followed by 100 timed runs.
+- Qualcomm Snapdragon 888 (sm8350), MediaTek Dimensity 720 (mt6853), and Qualcomm Snapdragon 660 (sdm660) are high-, mid-, and low-end chips respectively, and serve as a reference for model performance across device tiers.
+- Refer to the [Test NCNN Speed](./docs/Test_NCNN_speed.md) tutorial to reproduce the NCNN speed results of YOLOv6Lite.
+
+</details>
 
 ## Quick Start
 <details>

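The Mobile Benchmark notes specify a timing protocol (10 warm-up inferences, then 100 timed runs with MNN 2.3.0) but not how the per-run times are aggregated. The following is a minimal Python sketch of that protocol for a generic inference callable. `benchmark_latency` and `infer` are hypothetical names, not MNN or NCNN API, and reporting the mean and median is an assumption.

```python
import statistics
import time
from typing import Callable


def benchmark_latency(infer: Callable[[], object],
                      warmup: int = 10,
                      runs: int = 100) -> dict:
    """Run `warmup` untimed calls, then time `runs` calls of `infer`.

    `infer` is a hypothetical zero-argument callable wrapping one forward
    pass (for example, an MNN or NCNN session run); it is not a real MNN API.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up calls are executed but not timed (lazy init, caches, CPU clocks).
    for _ in range(warmup):
        infer()

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    # Aggregating with mean/median is an assumption; the table notes do not say.
    return {
        "runs": runs,
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "min_ms": min(timings_ms),
    }


if __name__ == "__main__":
    calls = []
    stats = benchmark_latency(lambda: calls.append(None))
    print(len(calls), stats["runs"])  # 110 100
```

Keeping warm-up out of the timed loop matters most on mobile CPUs, where the first few inferences include one-off allocation and frequency ramp-up.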
diff --git a/README_cn.md b/README_cn.md
@@ -16,13 +16,10 @@
 
 
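 ## What's New
+- [2023.04.28] Release lightweight mobile models [YOLOv6Lite](configs/yolov6_lite/README.md). ⭐️ [Mobile Benchmark](#移动端模型指标)
 - [2023.03.10] Release [YOLOv6-Face](https://github.com/meituan/YOLOv6/tree/yolov6-face). 🔥 [Face detection benchmark](https://github.com/meituan/YOLOv6/blob/yolov6-face/README_cn.md#widerface-%E6%A8%A1%E5%9E%8B%E6%8C%87%E6%A0%87)
-- [2023.03.02] Update [base models](configs/base/READM_cn.md) to version 3.0
+- [2023.03.02] Update [base models](configs/base/README_cn.md) to version 3.0
 - [2023.01.06] Release high-resolution P6 models and comprehensively upgrade the P5 models ⭐️ [Benchmark](#模型指标)
-    - Add the BiC module and the SimCSPSPPF module to enhance the representational power of the detector neck.
-    - Propose an anchor-aided training (AAT) strategy.
-    - Introduce a new self-distillation training strategy for small YOLOv6 models.
-    - Expand YOLOv6 and achieve SOTA accuracy and speed for real-time object detection on COCO.
 - [2022.11.04] Release [base models](configs/base/README_cn.md) to simplify training and deployment
 - [2022.09.06] Customized model quantization acceleration methods 🚀 [Quantization Tutorial](./tools/qat/README.md)
 - [2022.09.05] Release M/L models and further improve the performance of N/T/S models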
@@ -80,6 +77,27 @@
 
 </details>
 
+## Mobile Benchmark
+
+| Model | Size | mAP<sup>val<br/>0.5:0.95 | sm8350<br/><sup>(ms) | mt6853<br/><sup>(ms) | sdm660<br/><sup>(ms) |Params<br/><sup> (M) |   FLOPs<br/><sup> (G) |
+| :----------------------------------------------------------- | ---- | -------------------- | -------------------- | -------------------- | -------------------- | -------------------- | -------------------- |
+| [**YOLOv6Lite-S**](https://github.com/meituan/YOLOv6/releases/download/0.4.0/yolov6lite_s.pt) | 320*320 | 22.4                     | 7.99                     | 11.99                     | 41.86                     | 0.55                     | 0.56                     |
+| [**YOLOv6Lite-M**](https://github.com/meituan/YOLOv6/releases/download/0.4.0/yolov6lite_m.pt) | 320*320 | 25.1                     | 9.08                     | 13.27                     | 47.95                     | 0.79                     | 0.67                     |
+| [**YOLOv6Lite-L**](https://github.com/meituan/YOLOv6/releases/download/0.4.0/yolov6lite_l.pt) | 320*320 | 28.0                     | 11.37                     | 16.20                     | 61.40                     | 1.09                     | 0.87                     |
+| [**YOLOv6Lite-L**](https://github.com/meituan/YOLOv6/releases/download/0.4.0/yolov6lite_l.pt) | 320*192 | 25.0                     | 7.02                     | 9.66                     | 36.13                     | 1.09                     | 0.52                     |
+| [**YOLOv6Lite-L**](https://github.com/meituan/YOLOv6/releases/download/0.4.0/yolov6lite_l.pt) | 224*128 | 18.9                     | 3.63                     | 4.99                     | 17.76                     | 1.09                     | 0.24                     |
+
+<details>
+<summary>Table Notes</summary>
+
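+- We built a series of mobile models that vary in both model size and input aspect ratio, enabling flexible use in different scenarios.
+- All checkpoints are trained for 400 epochs without distillation.
+- mAP and speed are evaluated on the COCO val2017 dataset at the input resolution shown in the table.
+- Speed is measured with MNN 2.3.0 (AArch64) with arm82 acceleration enabled, using 10 warm-up inferences followed by 100 timed runs.
+- Qualcomm 888 (sm8350), Dimensity 720 (mt6853), and Qualcomm 660 (sdm660) are high-, mid-, and low-end chips respectively, and can serve as a reference for device capability on different chips.
+- The [Test NCNN Speed](./docs/Test_NCNN_speed.md) tutorial shows how to reproduce the NCNN speed results of YOLOv6Lite.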
+
+</details>
 
 ## Quick Start
 

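In the Mobile Benchmark table, YOLOv6Lite-L's FLOPs shrink in proportion to the input pixel count (0.87 G at 320*320, 0.52 G at 320*192, 0.24 G at 224*128). A small Python check of that relationship follows, assuming the network is fully convolutional so that FLOPs scale linearly with the number of input pixels. `estimate_gflops` is a hypothetical helper, not part of the repository.

```python
# YOLOv6Lite-L at 320x320, taken from the Mobile Benchmark table (GFLOPs).
BASE_SIZE = (320, 320)
BASE_GFLOPS = 0.87


def estimate_gflops(width: int, height: int,
                    base_size=BASE_SIZE, base_gflops=BASE_GFLOPS) -> float:
    """Scale FLOPs by input pixel count (assumes a fully convolutional network)."""
    base_pixels = base_size[0] * base_size[1]
    return base_gflops * (width * height) / base_pixels


for w, h in [(320, 320), (320, 192), (224, 128)]:
    print(f"{w}x{h}: {estimate_gflops(w, h):.2f} GFLOPs")
```

The estimates reproduce the table's FLOPs column for all three YOLOv6Lite-L rows. Measured latency shrinks somewhat less than proportionally (for example 7.02 ms on sm8350 at 320*192, versus about 6.8 ms predicted by scaling the 320*320 result), so pixel count is only a rough guide to speed.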
diff --git a/assets/yolov6lite_l_ncnn.jpg b/assets/yolov6lite_l_ncnn.jpg
diff --git a/configs/experiment/eval_640_repro.py b/configs/experiment/eval_640_repro.py
@@ -55,5 +55,25 @@
         img_size=1280,
         shrink_size=41,
         infer_on_rect=False,
+    ),
+    yolov6s_mbla = dict(
+        img_size=640,
+        shrink_size=7,
+        infer_on_rect=False,
+    ),
+    yolov6m_mbla = dict(
+        img_size=640,
+        shrink_size=7,
+        infer_on_rect=False,
+    ),
+    yolov6l_mbla = dict(
+        img_size=640,
+        shrink_size=7,
+        infer_on_rect=False,
+    ),
+    yolov6x_mbla = dict(
+        img_size=640,
+        shrink_size=3,
+        infer_on_rect=False,
     )
 )
diff --git a/configs/mbla/README.md b/configs/mbla/README.md
@@ -0,0 +1,28 @@
+## YOLOv6 MBLA Models
+
+English | [简体中文](./README_cn.md)
+
+### Features
+
+- Use MBLABlock (Multi Branch Layer Aggregation Block) throughout the network structure.
+
+Advantages:
+- Adopt a unified network structure and configuration.
+
+- Better performance for the small (S) model compared to the YOLOv6 3.0 release.
+
+- Better performance at all model scales compared to the YOLOv6 3.0 base models.
+
+
+
+### Performance
+
+| Model                                                         | Size | mAP<sup>val<br/>0.5:0.95 | Speed<sup>T4<br/>trt fp16 b1 <br/>(fps) | Speed<sup>T4<br/>trt fp16 b32 <br/>(fps) | Params<br/><sup> (M) | FLOPs<br/><sup> (G) |
+| :----------------------------------------------------------- | -------- | :----------------------- | -------------------------------------- | --------------------------------------- | -------------------- | ------------------- |
+| [**YOLOv6-S-mbla**](https://github.com/meituan/YOLOv6/releases/download/0.3.0/yolov6s_mbla.pt) | 640      | 47.0<sup>distill            | 300                                    | 424                                    | 11.6                  | 29.8                |
+| [**YOLOv6-M-mbla**](https://github.com/meituan/YOLOv6/releases/download/0.3.0/yolov6m_mbla.pt) | 640      | 50.3<sup>distill            | 168                                    | 216                                     | 26.1                 | 66.7                |
+| [**YOLOv6-L-mbla**](https://github.com/meituan/YOLOv6/releases/download/0.3.0/yolov6l_base.pt) | 640      | 52.0<sup>distill         | 129                                    | 154                                     | 46.3                 | 118.2                |
+| [**YOLOv6-X-base**](https://github.com/meituan/YOLOv6/releases/download/0.3.0/yolov6x_base.pt) | 640      | 53.5<sup>distill         | 78                                    | 94                                     | 78.8                 | 199.0               |
+
+- Speed is tested with TensorRT 8.4.2.4 on T4.
+- The processes of model training, evaluation, and inference are the same as the original ones. For details, please refer to [this README](https://github.com/meituan/YOLOv6#quick-start).
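The MBLA table reports T4 throughput in fps at batch sizes 1 and 32. Converting throughput to average time per image is a single division. The sketch below assumes the b32 figure counts images (not batches) per second, which the table does not state explicitly; `ms_per_image` is a hypothetical helper.

```python
# T4 TensorRT FP16 throughput (fps) at batch 1 and batch 32, copied from the table.
MBLA_FPS = {
    "YOLOv6-S-mbla": (300, 424),
    "YOLOv6-M-mbla": (168, 216),
    "YOLOv6-L-mbla": (129, 154),
    "YOLOv6-X-base": (78, 94),
}


def ms_per_image(fps: float) -> float:
    """Convert throughput in images per second to average milliseconds per image."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


for name, (fps_b1, fps_b32) in MBLA_FPS.items():
    print(f"{name}: b1 {ms_per_image(fps_b1):.2f} ms/img, "
          f"b32 {ms_per_image(fps_b32):.2f} ms/img")
```

At batch 1 the per-image time approximates end-to-end latency; at batch 32 it is an amortized cost, since each image waits for the whole batch.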
diff --git a/configs/mbla/README_cn.md b/configs/mbla/README_cn.md
@@ -0,0 +1,26 @@
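+## YOLOv6 MBLA Models
+
+Simplified Chinese | [English](./README.md)
+
+### Features
+
+- The main network structure uses MBLABlock (Multi Branch Layer Aggregation Block) throughout.
+
+Advantages:
+- Adopt a unified network structure and configuration.
+
+- Improved performance at the S scale compared to the 3.0 release, and at all scales compared to the 3.0 base models.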
+
+
+
+### Performance
+
+| Model                                                         | Size | mAP<sup>val<br/>0.5:0.95 | Speed<sup>T4<br/>trt fp16 b1 <br/>(fps) | Speed<sup>T4<br/>trt fp16 b32 <br/>(fps) | Params<br/><sup> (M) | FLOPs<br/><sup> (G) |
+| :----------------------------------------------------------- | -------- | :----------------------- | -------------------------------------- | --------------------------------------- | -------------------- | ------------------- |
+| [**YOLOv6-S-mbla**](https://github.com/meituan/YOLOv6/releases/download/0.3.0/yolov6s_mbla.pt) | 640      | 47.0<sup>distill            | 300                                    | 424                                    | 11.6                  | 29.8                |
+| [**YOLOv6-M-mbla**](https://github.com/meituan/YOLOv6/releases/download/0.3.0/yolov6m_mbla.pt) | 640      | 50.3<sup>distill            | 168                                    | 216                                     | 26.1                 | 66.7                |
+| [**YOLOv6-L-mbla**](https://github.com/meituan/YOLOv6/releases/download/0.3.0/yolov6l_base.pt) | 640      | 52.0<sup>distill         | 129                                    | 154                                     | 46.3                 | 118.2                |
+| [**YOLOv6-X-base**](https://github.com/meituan/YOLOv6/releases/download/0.3.0/yolov6x_base.pt) | 640      | 53.5<sup>distill         | 78                                    | 94                                     | 78.8                 | 199.0               |
+
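+- Speed is tested on T4 with TensorRT 8.4.2.4.
+- The processes of model training, evaluation, and inference are the same as the original ones. For details, please refer to the [main README](https://github.com/meituan/YOLOv6/blob/main/README_cn.md#%E5%BF%AB%E9%80%9F%E5%BC%80%E5%A7%8B).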
diff --git a/configs/mbla/yolov6l_mbla.py b/configs/mbla/yolov6l_mbla.py
@@ -0,0 +1,70 @@
+# YOLOv6l model
+model = dict(
+    type='YOLOv6l_mbla',
+    pretrained=None,
+    depth_multiple=0.5,  
+    width_multiple=1.0,
+    backbone=dict(
+        type='CSPBepBackbone',
+        num_repeats=[1, 4, 8, 8, 4],
+        out_channels=[64, 128, 256, 512, 1024],
+        csp_e=float(1)/2,
+        fuse_P2=True,
+        stage_block_type="MBLABlock",
+        ),
+    neck=dict(
+        type='CSPRepBiFPANNeck',
+        num_repeats=[8, 8, 8, 8],
+        out_channels=[256, 128, 128, 256, 256, 512],
+        csp_e=float(1)/2,
+        stage_block_type="MBLABlock",
+        ),
+    head=dict(
+        type='EffiDeHead',
+        in_channels=[128, 256, 512],
+        num_layers=3,
+        begin_indices=24,
+        anchors=3,
+        anchors_init=[[10,13, 19,19, 33,23], 
+                      [30,61, 59,59, 59,119], 
+                      [116,90, 185,185, 373,326]],
+        out_indices=[17, 20, 23],
+        strides=[8, 16, 32],
+        atss_warmup_epoch=0,
+        iou_type='giou',
+        use_dfl=True,
+        reg_max=16, #if use_dfl is False, please set reg_max to 0
+        distill_weight={
+            'class': 2.0,
+            'dfl': 1.0,
+        },
+    )
+)
+
+solver=dict(
+    optim='SGD',
+    lr_scheduler='Cosine',
+    lr0=0.01,
+    lrf=0.01,
+    momentum=0.937,
+    weight_decay=0.0005,
+    warmup_epochs=3.0,
+    warmup_momentum=0.8,
+    warmup_bias_lr=0.1       
+)
+
+data_aug = dict(
+    hsv_h=0.015,  
+    hsv_s=0.7, 
+    hsv_v=0.4,
+    degrees=0.0,
+    translate=0.1,
+    scale=0.9,
+    shear=0.0,
+    flipud=0.0,
+    fliplr=0.5,
+    mosaic=1.0,
+    mixup=0.1,
+)
+
+training_mode = "conv_silu"
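The `EffiDeHead` entry sets `strides=[8, 16, 32]`, and the new MBLA entries in `eval_640_repro.py` evaluate at `img_size=640` with `infer_on_rect=False`. A small Python sketch of the resulting feature-map sizes and total number of prediction locations follows, assuming a square input whose side is divisible by the largest stride; `grid_cells` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def grid_cells(img_size: int,
               strides: Sequence[int] = (8, 16, 32)) -> List[Tuple[int, int]]:
    """Return (height, width) of each detection feature map for a square input."""
    sizes = []
    for s in strides:
        # Each output level downsamples the input by its stride.
        if img_size % s:
            raise ValueError(f"img_size {img_size} is not divisible by stride {s}")
        sizes.append((img_size // s, img_size // s))
    return sizes


maps = grid_cells(640)
print(maps)                         # [(80, 80), (40, 40), (20, 20)]
print(sum(h * w for h, w in maps))  # 8400
```

Most locations come from the stride-8 level (6400 of 8400), which is why that level dominates the head's compute at 640.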
diff --git a/configs/mbla/yolov6l_mbla_finetune.py b/configs/mbla/yolov6l_mbla_finetune.py
@@ -0,0 +1,70 @@
+# YOLOv6l model
+model = dict(
+    type='YOLOv6l_mbla',
+    pretrained=None,
+    depth_multiple=0.5,  
+    width_multiple=1.0,
+    backbone=dict(
+        type='CSPBepBackbone',
+        num_repeats=[1, 4, 8, 8, 4],
+        out_channels=[64, 128, 256, 512, 1024],
+        csp_e=float(1)/2,
+        fuse_P2=True,
+        stage_block_type="MBLABlock",
+        ),
+    neck=dict(
+        type='CSPRepBiFPANNeck',
+        num_repeats=[8, 8, 8, 8],
+        out_channels=[256, 128, 128, 256, 256, 512],
+        csp_e=float(1)/2,
+        stage_block_type="MBLABlock",
+        ),
+    head=dict(
+        type='EffiDeHead',
+        in_channels=[128, 256, 512],
+        num_layers=3,
+        begin_indices=24,
+        anchors=3,
+        anchors_init=[[10,13, 19,19, 33,23], 
+                      [30,61, 59,59, 59,119], 
+                      [116,90, 185,185, 373,326]],
+        out_indices=[17, 20, 23],
+        strides=[8, 16, 32],
+        atss_warmup_epoch=0,
+        iou_type='giou',
+        use_dfl=True,
+        reg_max=16, #if use_dfl is False, please set reg_max to 0
+        distill_weight={
+            'class': 2.0,
+            'dfl': 1.0,
+        },
+    )
+)
+
+solver=dict(
+    optim='SGD',
+    lr_scheduler='Cosine',
+    lr0=0.0032,
+    lrf=0.12,
+    momentum=0.843,
+    weight_decay=0.00036,
+    warmup_epochs=2.0,
+    warmup_momentum=0.5,
+    warmup_bias_lr=0.05
+)
+
+data_aug = dict(
+    hsv_h=0.0138,
+    hsv_s=0.664,
+    hsv_v=0.464,
+    degrees=0.373,
+    translate=0.245,
+    scale=0.898,
+    shear=0.602,
+    flipud=0.00856,
+    fliplr=0.5,
+    mosaic=1.0,
+    mixup=0.243,
+)
+
+training_mode = "conv_silu"
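`yolov6l_mbla_finetune.py` repeats the model definition of `yolov6l_mbla.py` unchanged and only retunes `solver` and `data_aug`. The Python sketch below lists the solver keys that differ and evaluates a cosine schedule at its endpoints. The schedule formula, in which `lrf` is the final fraction of `lr0`, is an assumption based on YOLOv5-style schedulers and is not shown in this diff; `config_diff`, `cosine_lr`, and `EPOCHS` are hypothetical.

```python
import math

# Solver values copied from yolov6l_mbla.py (scratch) and yolov6l_mbla_finetune.py.
scratch_solver = dict(optim='SGD', lr_scheduler='Cosine', lr0=0.01, lrf=0.01,
                      momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0,
                      warmup_momentum=0.8, warmup_bias_lr=0.1)
finetune_solver = dict(optim='SGD', lr_scheduler='Cosine', lr0=0.0032, lrf=0.12,
                       momentum=0.843, weight_decay=0.00036, warmup_epochs=2.0,
                       warmup_momentum=0.5, warmup_bias_lr=0.05)

EPOCHS = 100  # arbitrary example; the schedule's endpoints do not depend on it


def config_diff(a: dict, b: dict) -> dict:
    """Map each key whose value differs between a and b to (a_value, b_value).

    A key missing from one dict is reported with None on that side.
    """
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


def cosine_lr(epoch: int, epochs: int, lr0: float, lrf: float) -> float:
    """Assumed cosine decay from lr0 at epoch 0 to lr0 * lrf at the last epoch."""
    factor = ((1 - math.cos(epoch * math.pi / epochs)) / 2) * (lrf - 1) + 1
    return lr0 * factor


for key, (old, new) in config_diff(scratch_solver, finetune_solver).items():
    print(f"{key}: {old} -> {new}")

for name, cfg in [("scratch", scratch_solver), ("finetune", finetune_solver)]:
    final = cosine_lr(EPOCHS, EPOCHS, cfg["lr0"], cfg["lrf"])
    print(f"{name}: lr0={cfg['lr0']}, final lr ~ {final:.6f}")
```

Under this assumed schedule, the finetune config starts lower (0.0032 vs 0.01) but ends higher (about 0.000384 vs 0.0001), giving a flatter decay than training from scratch.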
diff --git a/configs/mbla/yolov6m_mbla.py b/configs/mbla/yolov6m_mbla.py
@@ -0,0 +1,70 @@
+# YOLOv6m model
+model = dict(
+    type='YOLOv6m_mbla',
+    pretrained=None,
+    depth_multiple=0.5,  
+    width_multiple=0.75,
+    backbone=dict(
+        type='CSPBepBackbone',
+        num_repeats=[1, 4, 8, 8, 4],
+        out_channels=[64, 128, 256, 512, 1024],
+        csp_e=float(1)/2,
+        fuse_P2=True,
+        stage_block_type="MBLABlock",
+        ),
+    neck=dict(
+        type='CSPRepBiFPANNeck',
+        num_repeats=[8, 8, 8, 8],
+        out_channels=[256, 128, 128, 256, 256, 512],
+        csp_e=float(1)/2,
+        stage_block_type="MBLABlock",
+        ),
+    head=dict(
+        type='EffiDeHead',
+        in_channels=[128, 256, 512],
+        num_layers=3,
+        begin_indices=24,
+        anchors=3,
+        anchors_init=[[10,13, 19,19, 33,23], 
+                      [30,61, 59,59, 59,119], 
+                      [116,90, 185,185, 373,326]],
+        out_indices=[17, 20, 23],
+        strides=[8, 16, 32],
+        atss_warmup_epoch=0,
+        iou_type='giou',
+        use_dfl=True,
+        reg_max=16, #if use_dfl is False, please set reg_max to 0
+        distill_weight={
+            'class': 2.0,
+            'dfl': 1.0,
+        },
+    )
+)
+
+solver=dict(
+    optim='SGD',
+    lr_scheduler='Cosine',
+    lr0=0.01,
+    lrf=0.01,
+    momentum=0.937,
+    weight_decay=0.0005,
+    warmup_epochs=3.0,
+    warmup_momentum=0.8,
+    warmup_bias_lr=0.1       
+)
+
+data_aug = dict(
+    hsv_h=0.015,  
+    hsv_s=0.7, 
+    hsv_v=0.4,
+    degrees=0.0,
+    translate=0.1,
+    scale=0.9,
+    shear=0.0,
+    flipud=0.0,
+    fliplr=0.5,
+    mosaic=1.0,
+    mixup=0.1,
+)
+
+training_mode = "conv_silu"
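`yolov6l_mbla.py` and `yolov6m_mbla.py` share the same `num_repeats` and `out_channels` lists, both use `depth_multiple=0.5`, and differ only in `width_multiple` (1.0 vs 0.75). The Python sketch below applies those multipliers, assuming a YOLOv6-style scaling rule: repeat counts greater than 1 are scaled and rounded with a minimum of 1, and channels are rounded up to a multiple of 8. Treat that rule as an assumption to verify against the repository's network builder; how `MBLABlock` consumes the scaled repeat counts is not shown in this diff. `make_divisible` and `scale_model` are written here for illustration.

```python
import math


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_model(num_repeats, out_channels, depth_multiple, width_multiple):
    """Apply depth/width multipliers under the assumed YOLOv6-style rule."""
    # Single-repeat stages stay at 1; larger counts are scaled, rounded, and kept >= 1.
    repeats = [max(round(n * depth_multiple), 1) if n > 1 else n for n in num_repeats]
    # Channel counts are scaled, then rounded up to a multiple of 8.
    channels = [make_divisible(c * width_multiple, 8) for c in out_channels]
    return repeats, channels


# Backbone + neck lists shared by yolov6l_mbla.py and yolov6m_mbla.py.
REPEATS = [1, 4, 8, 8, 4] + [8, 8, 8, 8]
CHANNELS = [64, 128, 256, 512, 1024] + [256, 128, 128, 256, 256, 512]

for name, d, w in [("yolov6l_mbla", 0.5, 1.0), ("yolov6m_mbla", 0.5, 0.75)]:
    repeats, channels = scale_model(REPEATS, CHANNELS, d, w)
    print(name, repeats, channels)
```

With `width_multiple=0.75`, every channel count in these lists already lands on a multiple of 8, so under this rule the M model's widths are exactly three quarters of the L model's.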