Adaptive Efficient Cross-Scale Object Detection

Zhiqi Shao

Published: 08 Feb 2026, Last Modified: 05 May 2026, OpenReview Archive Direct Upload, Everyone, CC BY-NC 4.0

Abstract: Modern YOLO detectors rely on efficient convolutional designs and increasingly sophisticated attention mechanisms, but most existing architectures still aggregate information locally or model relationships only between pairs of elements. Such designs are often insufficient for complex detection scenarios involving occlusion, dense layouts, and large scale variation. We propose an efficient real-time detection framework that introduces adaptive hypergraph reasoning into visual feature learning. Instead of limiting interactions among spatial positions or feature levels to isolated pairs, the proposed module constructs latent high-order correlations and performs global cross-location and cross-scale feature enhancement through lightweight hypergraph computation. To make full use of these enhanced features, we develop a network-wide aggregation-and-distribution pathway that injects high-order contextual cues into multiple stages of the detector, improving representation synergy across the full pipeline. The architecture is further optimized with depthwise separable convolution blocks, which replace conventional large-kernel convolutions to lower computational cost. Extensive evaluation on MS COCO shows that the proposed method achieves accuracy competitive with the state of the art while using fewer parameters and fewer FLOPs than YOLOv11-N, YOLO12-N, and YOLOv13-N. These findings suggest that hypergraph-based correlation modeling is a practical direction for building accurate, compact, real-time object detectors.
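The idea of high-order correlation modeling can be illustrated with a minimal sketch of one round of hypergraph message passing (vertex to hyperedge, then hyperedge back to vertex). This is not the paper's module: the function name `hypergraph_pass`, the `prototypes` argument, and the softmax-based soft incidence are assumptions chosen for illustration, and a real detector would learn these quantities and operate on tensors rather than Python lists.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hypergraph_pass(features, prototypes):
    """One vertex -> hyperedge -> vertex round (illustrative only).

    features:   N vectors of length C, e.g. flattened spatial positions,
                possibly gathered from several feature scales.
    prototypes: M vectors of length C. These are hypothetical hyperedge
                prototypes; a trained model would learn or predict them.
    Returns (enhanced_features, incidence).
    """
    n, m, c = len(features), len(prototypes), len(features[0])

    # Soft incidence matrix (N x M): how strongly each vertex
    # participates in each hyperedge. Each row sums to 1.
    incidence = [softmax([dot(f, p) for p in prototypes]) for f in features]

    # Vertex -> hyperedge: each hyperedge takes the participation-weighted
    # mean of ALL vertex features, so it can join many distant positions.
    edge_feats = []
    for j in range(m):
        weight = sum(incidence[i][j] for i in range(n))  # > 0 by softmax
        edge_feats.append([
            sum(incidence[i][j] * features[i][k] for i in range(n)) / weight
            for k in range(c)
        ])

    # Hyperedge -> vertex: each vertex gathers from its hyperedges,
    # and the message is added residually to the original feature.
    enhanced = []
    for i in range(n):
        msg = [
            sum(incidence[i][j] * edge_feats[j][k] for j in range(m))
            for k in range(c)
        ]
        enhanced.append([features[i][k] + msg[k] for k in range(c)])
    return enhanced, incidence
```

Because a single hyperedge aggregates over every vertex that participates in it, one pass lets a group of positions exchange information jointly, rather than through a set of separate pairwise links.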