mAPm: multi-scale Attention Pyramid module for Enhanced scale-variation in RLD detection

Published: 01 Jan 2024, Last Modified: 06 Mar 2025CoRR 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Detecting objects across varying scales is still a challenge in computer vision, particularly in agricultural applications like Rice Leaf Disease (RLD) detection, where objects exhibit significant scale variations (SV). Conventional object detection (OD) like Faster R-CNN, SSD, and YOLO methods often fail to effectively address SV, leading to reduced accuracy and missed detections. To tackle this, we propose SaRPFF (Self-Attention with Register-based Pyramid Feature Fusion), a novel module designed to enhance multi-scale object detection. SaRPFF integrates 2D-Multi-Head Self-Attention (MHSA) with Register tokens, improving feature interpretability by mitigating artifacts within MHSA. Additionally, it integrates efficient attention atrous convolutions into the pyramid feature fusion and introduce a deconvolutional layer for refined up-sampling. We evaluate SaRPFF on YOLOv7 using the MRLD and COCO datasets. Our approach demonstrates a +2.61% improvement in Average Precision (AP) on the MRLD dataset compared to the baseline FPN method in YOLOv7. Furthermore, SaRPFF outperforms other FPN variants, including BiFPN, NAS-FPN, and PANET, showcasing its versatility and potential to advance OD techniques. This study highlights SaRPFF effectiveness in addressing SV challenges and its adaptability across FPN-based OD models.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview