RaCNN: Region-aware Convolutional Neural Network with Global Receptive Field

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Convolutional Neural Network, Global Receptive Field, Backbone
Abstract: Recent Convolutional Neural Networks (CNNs) use large-kernel convolutions (e.g., kernel sizes up to 101) to mimic the large receptive field of Vision Transformers (ViTs). However, these models rely on specialized techniques such as re-parameterization, sparsity, and weight decomposition, which complicate both training and inference. To address this challenge, we propose the Region-aware CNN (RaCNN), which achieves a global receptive field without such extra complexity, yet surpasses state-of-the-art models. Specifically, we design two novel modules to capture global visual dependencies. The first is the Region-aware Feed-Forward Network (RaFFN), which uses a novel Region Point-Wise Convolution (RPWConv) to capture global visual cues in a region-aware manner; in contrast, a traditional PWConv shares the same weights across all spatial positions and thus cannot capture spatial information. The second is the Region-aware Gated Linear Unit (RaGLU), a channel mixer that captures long-range visual dependencies in a sparse, global manner and serves as a stronger substitute for the original FFN. With only 84% of its computational cost, RaCNN significantly outperforms the state-of-the-art CNN MogaNet (83.9% vs. 83.4%). RaCNN also scales well and surpasses existing state-of-the-art lightweight models. Furthermore, it is competitive with state-of-the-art ViTs, MLPs, and Mambas on object detection, instance segmentation, and semantic segmentation. All code and logs are released in the supplementary materials.
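The abstract contrasts RPWConv with a standard point-wise convolution, which applies one shared weight matrix at every spatial position. The sketch below illustrates that distinction only: a hypothetical region-wise 1x1 projection that partitions the feature map into a grid of regions and applies a separate weight matrix per region. The function name, the grid partitioning scheme, and the weight layout are all assumptions for illustration, not the paper's actual RPWConv implementation.

```python
import numpy as np

def region_pointwise_conv(x, weights, grid=2):
    """Hypothetical sketch of a region-aware point-wise convolution.

    A standard PWConv applies one (C_out, C_in) matrix at every pixel.
    Here, the H x W plane is split into a grid x grid set of regions,
    and each region gets its own 1x1 projection, making the channel
    mixing spatially (region-) aware. Details are assumed, not taken
    from the paper.

    x:       (C_in, H, W) feature map
    weights: (grid*grid, C_out, C_in), one projection per region
    """
    c_in, h, w = x.shape
    c_out = weights.shape[1]
    rh, rw = h // grid, w // grid          # region height / width
    out = np.zeros((c_out, h, w))
    for gy in range(grid):
        for gx in range(grid):
            region = x[:, gy*rh:(gy+1)*rh, gx*rw:(gx+1)*rw]
            wmat = weights[gy*grid + gx]   # (C_out, C_in) for this region
            # A 1x1 conv is a per-pixel linear map over channels.
            out[:, gy*rh:(gy+1)*rh, gx*rw:(gx+1)*rw] = np.einsum(
                "oc,chw->ohw", wmat, region)
    return out
```

Setting all region weight matrices equal recovers an ordinary PWConv, which makes the added degree of freedom (per-region weights) easy to see.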
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6438
