The experiments use the Drone Vehicle and AVIID datasets. The Drone Vehicle dataset comprises RGB-infrared image pairs captured by drones for vehicle detection, covering scenarios such as urban roads, residential areas, and parking lots in both daytime and nighttime scenes. Each pair consists of a visible and an infrared image at a resolution of 640×512. From the original images, we selected 3721 pairs for the training set; these contain clearly visible content such as roads, vegetation, and vehicles, and cover diverse environments and object structures. The test set consists of 4000 pairs. All images were center-cropped to 512×512.

The AVIID dataset consists of paired aerial visible and infrared images captured by dual-camera drones. Specifically, AVIID-3 contains 1280 visible-infrared pairs, each of size 480×480, and is characterized by a higher density and diversity of vehicles, varied target scales, and multiple scene elements. From AVIID-3, we selected 647 pairs for the training set and 207 pairs for the test set, with all images resized directly to 256×256 without cropping.
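The central cropping applied to the Drone Vehicle images can be illustrated with a minimal sketch. The helper names `center_crop_box` and `center_crop` are hypothetical and are not taken from the authors' code; the sketch assumes the crop window is centered with integer division, which for a 640×512 image and a 512×512 target removes 64 columns from each side and leaves all rows untouched.

```python
def center_crop_box(width, height, crop_w, crop_h):
    """Return (left, upper, right, lower) of a centered crop window.

    Hypothetical helper for illustration; the tuple follows the common
    (left, upper, right, lower) box convention used by image libraries.
    """
    if crop_w > width or crop_h > height:
        raise ValueError("crop size exceeds image size")
    left = (width - crop_w) // 2
    upper = (height - crop_h) // 2
    return (left, upper, left + crop_w, upper + crop_h)


def center_crop(rows, crop_w, crop_h):
    """Center-crop a row-major image given as a list of pixel rows."""
    height = len(rows)
    width = len(rows[0])
    left, upper, right, lower = center_crop_box(width, height, crop_w, crop_h)
    # Keep only the rows and columns that fall inside the window.
    return [row[left:right] for row in rows[upper:lower]]


# Drone Vehicle pairs: 640x512 (width x height) cropped to 512x512.
box = center_crop_box(640, 512, 512, 512)
print(box)  # (64, 0, 576, 512)

# Tiny synthetic image (each pixel stores its column index) to show the effect.
image = [list(range(640)) for _ in range(512)]
cropped = center_crop(image, 512, 512)
print(len(cropped), len(cropped[0]))  # 512 512
```

Applying the same box to both the visible and the infrared image of a pair keeps the two modalities pixel-aligned. The AVIID-3 images, by contrast, are resized to 256×256 rather than cropped, so the full field of view is retained at a lower resolution.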