Impact of Mixed Multimodalities and Size Dependence on Performance of Object Detection on Multimodal Satellite Imagery

Nikita Gordienko, Yuri G. Gordienko, Oleksandr Rokovyi, Oleg Alienin, Andrii Polukhin, Sergii G. Stirenko

Published: 2023, Last Modified: 06 Mar 2025 | IEEE Big Data 2023 | CC BY-SA 4.0

Abstract: A deep learning (DL) workflow is presented for object detection (OD) on multimodal satellite imagery from the modified Vehicle Detection in Aerial Imagery (VEDAI) dataset. The study investigates how input modality (RGB, IR, and RGB+IR fusion) and input image size affect OD performance, measured by mean average precision (mAP) at different intersection-over-union (IoU) thresholds. A variety of DL models are trained and evaluated through cross-validation, covering different combinations of input modalities and sizes. Results show that performance varies by class, modality, and image size. To maximize overall performance, a hybrid approach is proposed that combines predictions from different modalities, image sizes, and models. Combining predictions from multiple models improves overall mAP by 4.5% to 19.8%. This data-driven approach provides insights for optimizing OD in other applications of multimodal imagery.
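The abstract does not say how predictions are combined, so the following is only a minimal sketch of one plausible reading: for each object class, pick the (modality, image size) configuration with the highest average precision (AP), then average the selected per-class values into a hybrid mAP. The function names `hybrid_selection` and `mean_ap`, the class labels, and every number in `ap_table` are hypothetical placeholders for illustration. They are not results from the paper.

```python
from typing import Dict, Tuple

Config = Tuple[str, int]  # (modality, input image size); hypothetical representation


def hybrid_selection(ap: Dict[Config, Dict[str, float]]) -> Dict[str, Tuple[Config, float]]:
    """For each class, return the configuration with the highest AP and that AP."""
    classes = sorted(next(iter(ap.values())).keys())
    best = {}
    for cls in classes:
        cfg = max(ap, key=lambda c: ap[c][cls])
        best[cls] = (cfg, ap[cfg][cls])
    return best


def mean_ap(per_class: Dict[str, float]) -> float:
    """Mean of per-class AP values (mAP)."""
    return sum(per_class.values()) / len(per_class)


# Placeholder AP values, invented for illustration only (NOT from the paper).
ap_table = {
    ("RGB", 512): {"class_a": 0.60, "class_b": 0.40},
    ("IR", 512): {"class_a": 0.50, "class_b": 0.55},
    ("RGB+IR", 1024): {"class_a": 0.65, "class_b": 0.45},
}

best = hybrid_selection(ap_table)
hybrid_map = mean_ap({cls: ap for cls, (_, ap) in best.items()})
single_best = max(mean_ap(per_class) for per_class in ap_table.values())

for cls, (cfg, ap) in best.items():
    print(f"{cls}: best config {cfg}, AP {ap:.2f}")
print(f"best single-config mAP {single_best:.2f}, hybrid mAP {hybrid_map:.2f}")
```

In this sketch the hybrid mAP can never fall below the best single configuration, because each class is free to keep that configuration. In practice the per-class choice would need to be made on validation folds rather than on the data used to report the final score, otherwise the gain is overstated.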