This repo is an implementation of PyTorch version YOLOV Series

hassan-teymoury/YOLOV

YOLOV and YOLOV++ for video object detection.

Update

  • July 30th, 2024: The pre-print version of the YOLOV++ paper is now available on arXiv.

  • April 21st, 2024: Our enhanced model now achieves 92.9 AP50 (without post-processing) on the ImageNet VID dataset, thanks to a more robust backbone and algorithm improvements. It maintains a processing time of 26.5 ms per image during batch inference on a 3090 GPU. Code release is forthcoming.

  • May 8th, 2024: We released the code, logs, and weights for YOLOV++.

Introduction

The YOLOV series are high-performance video object detectors. Please refer to the YOLOV and YOLOV++ papers on arXiv for more details.

This repo is an implementation of PyTorch version YOLOV and YOLOV++ based on YOLOX.

YOLOX Pretrained Models on ImageNet VID

| Model | size | mAP@50 (val) | Speed 2080Ti, batch size=1 (ms) | Speed 3090, batch size=32 (ms) | weights |
|---|---|---|---|---|---|
| YOLOX-s | 576 | 69.5 | 9.4 | 1.4 | google |
| YOLOX-l | 576 | 76.1 | 14.8 | 4.2 | google |
| YOLOX-x | 576 | 77.8 | 20.4 | - | google |
| YOLOX-SwinTiny | 576 | 79.2 | 19.0 | 5.5 | google |
| YOLOX-SwinBase | 576 | 86.5 | 24.9 | 11.8 | google |
| YOLOX-FocalLarge | 576 | 89.7 | 42.2 | 25.7 | - |

Main result in YOLOV++

| Model | size | mAP@50 (val) | Speed 3090, batch size=32 (ms) | weights | logs |
|---|---|---|---|---|---|
| YOLOV++ s | 576 | 78.7 | 5.3 | google | link |
| YOLOV++ l | 576 | 84.2 | 7.6 | google | - |
| YOLOV++ SwinTiny | 576 | 85.6 | 8.4 | google | link |
| YOLOV++ SwinBase | 576 | 90.7 | 15.9 | google | link |
| YOLOV++ FocalLarge | 576 | 92.9 | 27.6 | google | link |
| YOLOV++ FocalLarge + Post | 576 | 93.2 | - | - | |

Main result in YOLOV

| Model | size | mAP@50 (val) | Speed 2080Ti, batch size=1 (ms) | weights |
|---|---|---|---|---|
| YOLOV-s | 576 | 77.3 | 11.3 | google |
| YOLOV-l | 576 | 83.6 | 16.4 | google |
| YOLOV-x | 576 | 85.5 | 22.7 | google |
| YOLOV-x + post | 576 | 87.5 | - | - |

TODO

  • Finish Swin-Transformer based experiments.
  • Release updated code, model and log.

Quick Start

Installation

Install YOLOV from source.

git clone git@github.com:YuHengsss/YOLOV.git
cd YOLOV

Create conda env.

conda create -n yolov python=3.7

conda activate yolov

pip install -r requirements.txt

pip3 install -v -e .

Demo

Step 1. Download the pretrained weights.

Step 2. Run the YOLOV demos. For example:

python tools/vid_demo.py -f [path to your yolov exp files] -c [path to your yolov weights] --path /path/to/your/video --conf 0.25 --nms 0.5 --tsize 576 --save_result 

For online mode, taking yolov_l as an example, run:

python tools/yolov_demo_online.py -f ./exps/yolov/yolov_l_online.py -c [path to your weights] --path /path/to/your/video --conf 0.25 --nms 0.5 --tsize 576 --save_result

For YOLOX models, use python tools/demo.py for inference.
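
To run the demo over several videos, the command can be assembled programmatically. The following is a minimal sketch, not part of this repo: the helper name build_demo_command and all paths are hypothetical, and only the flags shown in the demo command above are used.

```python
import shlex


def build_demo_command(exp_file, weights, video, conf=0.25, nms=0.5, tsize=576,
                       script="tools/vid_demo.py"):
    """Return the argument list for one demo run (hypothetical helper)."""
    return [
        "python", script,
        "-f", exp_file,
        "-c", weights,
        "--path", video,
        "--conf", str(conf),
        "--nms", str(nms),
        "--tsize", str(tsize),
        "--save_result",
    ]


# Hypothetical exp file, weights, and video paths; replace with your own.
videos = ["videos/clip_a.mp4", "videos/clip_b.mp4"]
commands = [
    build_demo_command("exps/yolov/yolov_s.py", "weights/yolov_s.pth", v)
    for v in videos
]

for cmd in commands:
    # Print a shell-safe version of each command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) from the repo root to actually launch the run.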

Reproduce our results on VID

Step 1. Download datasets and weights:

Download the ILSVRC2015 DET and ILSVRC2015 VID datasets from ImageNet and organise them as follows:

path to your datasets/ILSVRC2015/
path to your datasets/ILSVRC/

Download our COCO-style annotations for training, the FGFA version of the training annotations, and the video sequences. Then place them at the following paths:

YOLOV/annotations/vid_train_coco.json
YOLOV/annotations/ILSVRC_FGFA_COCO.json
YOLOV/yolox/data/datasets/train_seq.npy

Change the data_dir in the exp files to [path to your datasets] and download our weights.
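
Before running evaluation, it can help to confirm that the downloaded files are where the code expects them. This is a minimal sketch, assuming it is run from the YOLOV repo root; the helper missing_files is hypothetical, and the train_seq.npy location follows the yolox/data/datasets path used by the loading command in the annotation format section.

```python
import os

# Paths relative to the repo root, taken from the file list above.
REQUIRED_FILES = (
    "annotations/vid_train_coco.json",
    "annotations/ILSVRC_FGFA_COCO.json",
    "yolox/data/datasets/train_seq.npy",
)


def missing_files(repo_root, required=REQUIRED_FILES):
    """Return the required files that do not exist under repo_root."""
    return [
        rel for rel in required
        if not os.path.isfile(os.path.join(repo_root, rel))
    ]


# Prints an empty list when everything is in place.
print(missing_files("."))
```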

Step 2. Generate predictions and convert them to IMDB style for evaluation.

python tools/val_to_imdb.py -f exps/yolov/yolov_x.py -c [path to your weights]/yolov_x.pth --fp16 --output_dir ./yolov_x.pkl

Evaluation process:

python tools/REPPM.py --repp_cfg ./tools/yolo_repp_cfg.json --predictions_file ./yolov_x.pkl --evaluate --annotations_filename ./annotations/annotations_val_ILSVRC.txt --path_dataset [path to your dataset] --store_imdb --store_coco  (--post)

(--post) is optional and enables the post-processing method. You should then get results like:

{'mAP_total': 0.8758871720817065, 'mAP_slow': 0.9059275666099181, 'mAP_medium': 0.8691557352372217, 'mAP_fast': 0.7459511040452989}

Training example

python tools/vid_train.py -f exps/yolov/yolov_s.py -c weights/yoloxs_vid.pth --fp16

Rough testing

python tools/vid_eval.py -f exps/yolov/yolov_s.py -c weights/yolov_s.pth --tnum 500 --fp16

--tnum specifies the number of sequences to test.

Annotation format

Training base detector

The train_coco.json is a COCO-format annotation file. When training the base detector on your own dataset, convert your annotations to COCO format.
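
For reference, a COCO-format detection file has three top-level lists: images, annotations, and categories, with boxes stored as [x, y, width, height]. The sketch below builds such a structure from corner-style boxes. The helper to_coco, the file name, and the class names are hypothetical, and whether this repo's loader requires additional fields is not covered here.

```python
import json


def to_coco(images, boxes, class_names):
    """Build a minimal COCO-style dict.

    images: list of (file_name, width, height); ids are assigned from 1.
    boxes: list of (image_id, category_id, x1, y1, x2, y2) in pixels.
    class_names: list of names; category ids are assigned from 1.
    """
    coco = {
        "images": [
            {"id": i, "file_name": name, "width": w, "height": h}
            for i, (name, w, h) in enumerate(images, start=1)
        ],
        "categories": [
            {"id": c, "name": n} for c, n in enumerate(class_names, start=1)
        ],
        "annotations": [],
    }
    for ann_id, (image_id, category_id, x1, y1, x2, y2) in enumerate(boxes, start=1):
        w, h = x2 - x1, y2 - y1
        coco["annotations"].append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": category_id,
            "bbox": [x1, y1, w, h],  # COCO stores top-left corner plus size
            "area": w * h,
            "iscrowd": 0,
        })
    return coco


# Hypothetical single-frame example.
example = to_coco(
    images=[("video0/000000.JPEG", 1280, 720)],
    boxes=[(1, 2, 100, 50, 300, 250)],
    class_names=["airplane", "antelope"],
)
print(json.dumps(example["annotations"][0]))
```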

Training YOLOV Series

The train_seq.npy and val_seq.npy files are numpy arrays of lists. They can be loaded using the following command:

numpy.load('./yolox/data/datasets/train_seq.npy', allow_pickle=True)

Each list contains the paths to all images in one video. The specific annotations (XML annotations in the VID dataset) are loaded via these image paths; refer to https://github.com/YuHengsss/YOLOV/blob/f5a57ddea2f3660875d6d75fc5fa2ddbb95028a7/yolox/data/datasets/vid.py#L125 for more details.
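
To build a sequence file in the same shape for your own videos, one option is an object array with one list of frame paths per video. This is a minimal sketch under that assumption: the helper build_seq_array and the frame paths are hypothetical, and the exact path convention expected by the dataset class should be checked against vid.py.

```python
import os
import tempfile

import numpy as np


def build_seq_array(video_frames):
    """Pack per-video lists of frame paths into a 1-D object array."""
    arr = np.empty(len(video_frames), dtype=object)
    for i, frames in enumerate(video_frames):
        # Element-wise assignment keeps each video as one Python list,
        # even when all videos have the same number of frames.
        arr[i] = list(frames)
    return arr


# Hypothetical frame paths for two short videos.
videos = [
    ["video0/000000.JPEG", "video0/000001.JPEG"],
    ["video1/000000.JPEG", "video1/000001.JPEG", "video1/000002.JPEG"],
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_seq.npy")
    np.save(path, build_seq_array(videos), allow_pickle=True)
    # allow_pickle=True is required because the array holds Python lists.
    loaded = np.load(path, allow_pickle=True)

for frames in loaded:
    print(len(frames), frames[0])
```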

Acknowledgements

Cite YOLOV and YOLOV++

If the YOLOV series is helpful for your research, please cite the following papers:

@article{shi2024yolovpp,
      title={Practical Video Object Detection via Feature Selection and Aggregation}, 
      author={Yuheng Shi and Tong Zhang and Xiaojie Guo},
      journal={arXiv preprint arXiv:2407.19650},
      year={2024},
}

@article{shi2022yolov,
  title={YOLOV: Making Still Image Object Detectors Great at Video Object Detection},
  author={Shi, Yuheng and Wang, Naiyan and Guo, Xiaojie},
  journal={arXiv preprint arXiv:2208.09686},
  year={2022}
}
