Multi-modal Aerial View Image Challenge: Sensor Domain Translation

Spencer Low, Oliver Nina, Dylan Bowald, Angel Domingo Sappa, Nathan Inkawhich, Peter Bruns

Published: 01 Jan 2024, Last Modified: 12 Nov 2024. Venue: CVPR Workshops 2024. License: CC BY-SA 4.0

Abstract: This paper describes the design, outcomes, and top methods of the 2nd annual Multi-modal Aerial View Image Challenge (MAVIC), aimed at cross-modality aerial image translation. The primary objective of this competition is to stimulate research into models capable of translating co-aligned images between multiple modalities. Specifically, the challenge centers on translation between synthetic aperture radar (SAR), electro-optical (EO), camera (RGB), and infrared (IR) sensor modalities, a budding area of research that has begun to garner attention. While last year’s inaugural challenge demonstrated the feasibility of SAR→EO translation, this year’s challenge made significant improvements in dataset coverage, sensor variation, experimental design, and methods, covering the tasks of SAR→EO, SAR→RGB, SAR→IR, and RGB→IR translation. By introducing a new dataset called Multi-modal Aerial Gathered Image Composites (MAGIC), multimodal image translation is available for different comparisons. With a more rigorous set of translation performance metrics, winners were determined by aggregating L1-norm, LPIPS (Learned Perceptual Image Patch Similarity), and FID (Fréchet Inception Distance) scores. The winning methods used pix2pixHD and the LPIPS metric as loss functions, with an aggregated score 5% better, separated by the SAR→EO and RGB→IR translation scores.