Keywords: Convolutional Neural Network, Global Receptive Field, Backbone
Abstract: Recent Convolutional Neural Networks (CNNs) adopt large-kernel convolutions (e.g., 101×101 kernels) to emulate the large receptive field of Vision Transformers (ViTs).
However, these models rely on specialized techniques such as re-parameterization, sparsity, and weight decomposition, which complicate both training and inference.
To address this challenge, we propose Region-aware CNN (RaCNN), which achieves a global receptive field without requiring extra complexity, yet surpasses state-of-the-art models.
Specifically, we design two novel modules to capture global visual dependencies.
The first is the Region-aware Feed Forward Network (RaFFN).
It uses a novel Region Point-Wise Convolution (RPWConv) to capture global visual cues in a region-aware manner; in contrast, a traditional point-wise convolution (PWConv) shares the same weights across all spatial positions and therefore cannot capture spatial information.
The second is the Region-aware Gated Linear Unit (RaGLU).
This channel mixer captures long-range visual dependencies in a sparse, global manner and serves as a stronger substitute for the original FFN.
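To make the contrast between a standard point-wise convolution and a region-aware one concrete, here is a minimal NumPy sketch. The function names, the R×R region grid, and the per-region weight tensor layout are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def pwconv(x, w):
    # Standard point-wise (1x1) convolution: a single weight matrix
    # shared by every spatial position. x: (H, W, C_in), w: (C_in, C_out).
    return x @ w

def region_pwconv(x, w_regions):
    # Hypothetical sketch of a region-aware point-wise convolution:
    # the feature map is split into an R x R grid, and each region
    # applies its own 1x1-conv weights, making the channel mixing
    # region-dependent. w_regions: (R, R, C_in, C_out).
    H, W, _ = x.shape
    R = w_regions.shape[0]
    rh, rw = H // R, W // R  # region height and width (assumes divisibility)
    out = np.empty((H, W, w_regions.shape[-1]))
    for i in range(R):
        for j in range(R):
            patch = x[i * rh:(i + 1) * rh, j * rw:(j + 1) * rw]
            out[i * rh:(i + 1) * rh, j * rw:(j + 1) * rw] = patch @ w_regions[i, j]
    return out
```

Unlike `pwconv`, whose output at every position uses the same weights, `region_pwconv` lets distinct spatial regions mix channels differently, which is one way position-dependent (spatial) information can enter a 1×1 convolution.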
With only 84\% of the computational complexity, RaCNN significantly outperforms the state-of-the-art CNN MogaNet (83.9\% vs. 83.4\%).
It also demonstrates good scalability and surpasses existing state-of-the-art lightweight models.
Furthermore, RaCNN is competitive with state-of-the-art ViTs, MLPs, and Mambas on object detection, instance segmentation, and semantic segmentation.
All code and logs are provided in the supplementary materials.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6438