Altitude-informed fusion pyramid network for multi-scale waste detection in unmanned aerial vehicle images

Chan Yue Liew, Joanne Mun-Yee Lim, Chee Pin Tan, Raja Mazhar Mohar Bin Tun Mohar

Published: 01 Aug 2025, Last Modified: 09 Nov 2025, Engineering Applications of Artificial Intelligence, CC BY-SA 4.0
Abstract: Accurate waste detection using unmanned aerial vehicles (UAVs) remains a significant challenge due to the non-canonical perspective, varying altitudes, and diverse nature of waste materials such as plastic, with its translucency and irregular appearance. Traditional object detection architectures struggle to adapt to these factors, leading to reduced detection accuracy. To address these challenges, we propose incorporating the altitude information of UAV waste images as additional context to enhance the multi-scale detection capabilities of feature pyramid networks, dynamically assigning differential importance to feature fusion modules. We propose a novel altitude-informed tiny-object detection architecture (named AltiDet) in three backbone sizes, consisting of a backbone built from Asymmetric Deep Aggregation (ADA) and High-Resolution Feature Extraction (HRFE) modules, an Altitude-Informed Fusion Pyramid Network (A-IFPN), and an altitude-scaled loss function. Our ADA+HRFE backbone aggregates extracted features iteratively using different nodes while capturing additional spatial and contextual features from the high-resolution layers, to account for the fine-grained and irregular features of waste. The A-IFPN fuses and aggregates Altitude-Weighted Frequency Attention (AWFA) modules to further extract meaningful feature maps from the backbone, improving multi-scale detection. Our altitude-scaled loss function ensures that the more challenging higher-altitude images receive greater emphasis during training. Extensive experiments conducted on our collected multi-class aerial waste dataset (named AltiWaste), the Solid Waste Aerial Detection (SWAD) dataset, the Trash Annotations in Context (TACO) dataset, and other datasets demonstrate the effectiveness and advantages of our proposed architecture, yielding a 1.41% improvement over state-of-the-art detectors on the AltiWaste dataset.
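The abstract does not give the form of the altitude-scaled loss, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a hypothetical linear weighting: each image's detection loss is multiplied by a weight that grows from 1 at the lowest altitude to 1 + gamma at the highest, and the batch loss is the weighted average. The function names, the `gamma` parameter, and the linear schedule are all illustrative assumptions.

```python
def altitude_weight(altitude_m, min_alt_m, max_alt_m, gamma=1.0):
    """Hypothetical weight in [1, 1 + gamma] that grows linearly with altitude.

    This linear schedule is an assumption for illustration; the paper's
    actual scaling is not stated in the abstract.
    """
    if max_alt_m <= min_alt_m:
        raise ValueError("max_alt_m must exceed min_alt_m")
    # Clamp the altitude into the known range, then normalise to [0, 1].
    clamped = min(max(altitude_m, min_alt_m), max_alt_m)
    t = (clamped - min_alt_m) / (max_alt_m - min_alt_m)
    return 1.0 + gamma * t


def altitude_scaled_loss(per_image_losses, altitudes_m, min_alt_m, max_alt_m, gamma=1.0):
    """Weighted average of per-image losses, emphasising higher-altitude images."""
    if len(per_image_losses) != len(altitudes_m):
        raise ValueError("need one altitude per image loss")
    weights = [altitude_weight(a, min_alt_m, max_alt_m, gamma) for a in altitudes_m]
    weighted_sum = sum(w * l for w, l in zip(weights, per_image_losses))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Two images: one at 10 m with zero loss, one at 50 m with loss 3.0.
    # The higher-altitude image gets twice the weight, so the batch loss
    # is pulled above the plain mean of 1.5.
    print(altitude_scaled_loss([0.0, 3.0], [10.0, 50.0], min_alt_m=10.0, max_alt_m=50.0))
```

With `gamma=0` every weight is 1 and the function reduces to the ordinary mean loss, which makes the effect of the altitude term easy to isolate in an ablation.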