Motion Prior Knowledge Learning with Homogeneous Language Descriptions for Moving Infrared Small Target Detection
The 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025)
- Datasets are available at ITSDT-15K and IRDST (code: cctd). Alternatively, you can download IRDST directly from the website.
- You need to reorganize these datasets in a format similar to the coco_train_ITSDT.txt and coco_val_ITSDT.txt files we provide (.txt files are used in training). We provide the .txt files for ITSDT-15K and IRDST. For example:
train_annotation_path = '/home/ITSDT-15K/coco_train_ITSDT.txt'
val_annotation_path = '/home/ITSDT-15K/coco_val_ITSDT.txt'
- Or you can generate a new .txt file based on the path of your datasets. The .txt files (e.g., coco_train_ITSDT.txt) can be generated from .json files (e.g., instances_train2017.json). We also provide all .json files for ITSDT-15K and IRDST (code: cctd).
python utils_coco/coco_to_txt.py
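For orientation, the conversion from a COCO-style .json file to a .txt annotation file might look like the sketch below. The output line layout (image path followed by x1,y1,x2,y2,class_id groups) and the function name coco_to_txt_lines are assumptions made for illustration; check coco_train_ITSDT.txt and utils_coco/coco_to_txt.py for the format the repository actually uses.

```python
import os
from collections import defaultdict


def coco_to_txt_lines(coco, image_root):
    """Turn a COCO-style dict into one text line per image.

    Assumed line layout (hypothetical, not confirmed by the repo):
        <image_path> x1,y1,x2,y2,class_id x1,y1,x2,y2,class_id ...
    """
    # Group annotations by the image they belong to.
    boxes_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        box = (int(x), int(y), int(x + w), int(y + h), ann["category_id"])
        boxes_by_image[ann["image_id"]].append(box)

    lines = []
    for img in coco["images"]:
        path = os.path.join(image_root, img["file_name"])
        boxes = " ".join(",".join(map(str, b)) for b in boxes_by_image[img["id"]])
        # Images without annotations keep only their path.
        lines.append(f"{path} {boxes}".rstrip())
    return lines
```

The returned lines can be written to a file with "\n".join(lines), after loading the .json file with json.load.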
- The folder structure should look like this:
ITSDT-15K
├─instances_train2017.json
├─instances_test2017.json
├─coco_train_ITSDT.txt
├─coco_val_ITSDT.txt
├─images
│ ├─1
│ │ ├─0.bmp
│ │ ├─1.bmp
│ │ ├─2.bmp
│ │ ├─ ...
│ ├─2
│ │ ├─0.bmp
│ │ ├─1.bmp
│ │ ├─2.bmp
│ │ ├─ ...
│ ├─3
│ │ ├─ ...
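Before training, it can help to confirm that a dataset folder matches the layout above. The following is a minimal sketch; the helper name missing_entries is hypothetical and is not part of the repository.

```python
from pathlib import Path

# Top-level entries taken from the folder structure listed above.
EXPECTED = [
    "instances_train2017.json",
    "instances_test2017.json",
    "coco_train_ITSDT.txt",
    "coco_val_ITSDT.txt",
    "images",
]


def missing_entries(dataset_root):
    """Return the expected files/folders that are absent under dataset_root."""
    root = Path(dataset_root)
    return [name for name in EXPECTED if not (root / name).exists()]
```

For example, missing_entries('/home/ITSDT-15K') returns an empty list when every listed entry is present.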
- python==3.11.8
- pytorch==2.1.1
- torchvision==0.16.1
- numpy==1.26.4
- opencv-python==4.9.0.80
- scipy==1.13
- Tested on Ubuntu 20.04, with CUDA 11.8, and 1x NVIDIA 3090.
- We provide encoded language description embedding representations (code: xbet) of the ITSDT-15K and IRDST datasets. There are two embedded representations in this file: emb_train_ITSDT.pkl and emb_train_IRDST.pkl.
- We also provide the initial language description text files (code: bn38), which you can explore further with vision-language models.
- Taking the ITSDT-15K dataset as an example, modify the path to the language description embedding representations in dataloader_for_ITSDT:
# Path to your emb_train_ITSDT.pkl
description = pickle.load(open('/home/MoPKL/emb_train_ITSDT.pkl', 'rb'))
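To check the downloaded file before editing the dataloader, the embeddings can be loaded and inspected on their own. This is a minimal sketch: the internal structure of the pickled object is not documented here, so the code only reports the top-level type and length. The helper names load_descriptions and summarize are hypothetical.

```python
import pickle


def load_descriptions(path):
    """Load the pickled embedding file, closing the file handle afterwards."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize(obj):
    """Describe the top-level container without assuming its inner layout."""
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size
```

For example, summarize(load_descriptions('/home/MoPKL/emb_train_ITSDT.pkl')) shows which container type the dataloader will receive.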
- Note: Please use a different dataloader for each dataset. For example, to train the model on the ITSDT-15K dataset, enter the following command:
CUDA_VISIBLE_DEVICES=0 python train_ITSDT.py
- Note that model_best.pth is not necessarily the best model. The best model may have a lower val_loss or a higher AP50 during validation.
"model_path": '/home/MoPKL/logs/model.pth'
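If the validation metrics of each saved checkpoint are recorded, the choice can be made programmatically. The sketch below assumes a hypothetical list of records with path, val_loss, and ap50 fields; the repository's actual log format may differ.

```python
def pick_best(records, metric="ap50"):
    """Pick the checkpoint with the highest AP50 or the lowest val_loss.

    `records` is a hypothetical list of dicts such as
    {"path": "ep010.pth", "val_loss": 0.12, "ap50": 0.78}.
    """
    if metric == "ap50":
        return max(records, key=lambda r: r["ap50"])
    if metric == "val_loss":
        return min(records, key=lambda r: r["val_loss"])
    raise ValueError(f"unsupported metric: {metric}")
```

The path of the selected record can then be used as the model_path value.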
- You need to change the path to the .json file of the test set. For example:
# Use ITSDT-15K dataset for test
cocoGt_path = '/home/public/ITSDT-15K/instances_test2017.json'
dataset_img_path = '/home/public/ITSDT-15K/'
python test.py
- We support video and single-frame image prediction.
# mode = "video" (predict a sequence)
mode = "predict" # Predict a single-frame image
python predict.py
- For bounding box detection, we use COCO's evaluation metrics:
Method | Dataset | mAP50 (%) | Precision (%) | Recall (%) | F1 (%) | Download |
---|---|---|---|---|---|---|
MoPKL | ITSDT-15K | 79.78 | 93.29 | 86.80 | 89.92 | Baidu (code: pchd) |
MoPKL | IRDST | 74.54 | 89.04 | 84.74 | 86.84 | |
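The F1 column is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall values in the table; because those inputs are already rounded to two decimals, the recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit in, same unit out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall values (%) copied from the table above.
for name, p, r in [("ITSDT-15K", 93.29, 86.80), ("IRDST", 89.04, 84.74)]:
    print(name, round(f1_score(p, r), 2))
```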
- PR curves on the ITSDT-15K and IRDST datasets, as reported in the paper.
- We provide the results (code: 4ves) on ITSDT-15K and IRDST, and you can plot them using Python and matplotlib.
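A plotting script could look like the sketch below. The layout of the provided result files is not described here, so the parser assumes plain text with one "recall precision" pair per line; the function names and the output file name are hypothetical and should be adapted to the actual files.

```python
def parse_pr_points(text):
    """Parse 'recall precision' pairs, one pair per line (assumed layout)."""
    points = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that do not have exactly two fields
        points.append((float(parts[0]), float(parts[1])))
    return sorted(points)  # order by recall for a left-to-right curve


def plot_pr(curves, out_path="pr_curves.png"):
    """Plot {label: [(recall, precision), ...]} and save the figure."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, points in curves.items():
        recall, precision = zip(*points)
        ax.plot(recall, precision, label=label)
    ax.set_xlabel("Recall")
    ax.set_ylabel("Precision")
    ax.legend()
    fig.savefig(out_path, dpi=200)
    plt.close(fig)
```

Calling plot_pr({"MoPKL": parse_pr_points(text)}) with the contents of one result file draws a single curve; adding more entries to the dict overlays further curves for comparison.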
If you have any questions, please contact Shengjia Chen via e-mail: [email protected].
- S. Chen, L. Ji, J. Zhu, M. Ye and X. Yao, "SSTNet: Sliced Spatio-Temporal Network With Cross-Slice ConvLSTM for Moving Infrared Dim-Small Target Detection," in IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-12, 2024, Art no. 5000912, doi: 10.1109/TGRS.2024.3350024.
- Ruigang Fu, Hongqi Fan, Yongfeng Zhu, et al. A dataset for infrared time-sensitive target detection and tracking for air-ground application[DS/OL]. V2. Science Data Bank, 2022[2024-12-10]. https://cstr.cn/31253.11.sciencedb.j00001.00331. CSTR:31253.11.sciencedb.j00001.00331.
If you find this repo useful, please cite our paper.