LGSNet: Local-Global Semantics Learning Object Detection

Yang Li, Licheng Jiao, Xu Liu, Fang Liu, Lingling Li, Puhua Chen

Published: 2025, Last Modified: 25 Mar 2026. IEEE Trans. Multim. 2025. License: CC BY-SA 4.0
Abstract: Self-attention learns to capture long-range dependencies between embeddings (e.g., image pixels). However, its memory overhead and computation cost are prohibitive because both are quadratic in the spatial resolution. A structural analysis reveals two crucial roles in attention: the correlation-based dependency structure and feature normalization. In this work, an effective Local-Global Semantics (LGS) module is proposed to alleviate these issues by modeling local semantic aggregation and global semantic interaction. The LGS module contains a group convolution and an Efficient Global Semantic Attention (EGSA). First, the group convolution aggregates local semantics. Second, treating a feature map as a sequence of 2-D channel representations, EGSA formulates a general model for global semantic interaction, in which a linear correlation is computed between global semantics. As a result, the memory overhead and computation cost of LGS are linear in the spatial resolution. The LGS module can be smoothly incorporated into object detection frameworks. Experimental results verify its effectiveness on two popular detection datasets: MS COCO and PASCAL VOC.
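The complexity claim can be illustrated with a minimal sketch. It is not the paper's EGSA: the function names, the softmax normalization, and the 1/sqrt(N) scaling are assumptions chosen for illustration. It only shows why correlating C channel representations (each a flattened H x W map) gives a C x C map whose cost grows linearly with the number of pixels N = H * W, whereas a pixel-to-pixel map has N x N entries.

```python
import math


def attention_map_entries(channels, height, width):
    """Return (pixel_map_entries, channel_map_entries) for one feature map."""
    n = height * width
    pixel_map = n * n                  # pixel-to-pixel: quadratic in H*W
    channel_map = channels * channels  # channel-to-channel: independent of H*W
    return pixel_map, channel_map


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(x):
    """Hypothetical channel-wise attention (illustrative, not EGSA).

    x: list of C channels, each a flattened list of N = H*W values.
    Both steps below cost O(C^2 * N), i.e. linear in N.
    """
    n = len(x[0])
    scale = 1.0 / math.sqrt(n)  # assumed scaling, not taken from the paper
    # C x C correlation between channel representations.
    corr = [[scale * sum(a * b for a, b in zip(ci, cj)) for cj in x] for ci in x]
    # Row-wise normalization of the correlation map.
    weights = [softmax(row) for row in corr]
    # Each output channel is a weighted mix of all input channels.
    return [
        [sum(w * x[j][k] for j, w in enumerate(row)) for k in range(n)]
        for row in weights
    ]


if __name__ == "__main__":
    print(attention_map_entries(256, 64, 64))
    print(channel_attention([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]))
```

Under these assumptions, doubling the height and width multiplies the pixel-to-pixel map size by 16 but leaves the channel-to-channel map unchanged; only the two O(C^2 * N) products grow, and they grow by a factor of 4.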