Segmentation using efficient residual networks with attention-fusion modules

ICLR 2025 Conference Submission 2102 Authors

20 Sept 2024 (modified: 26 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Segmentation, Attention mechanisms, Efficient residual networks
TL;DR: A unique efficient residual network with attention mechanisms and fusion networks is developed to overcome the rising computational cost of fusing semantic information from the global and local contexts of segmentation networks.
Abstract: Fusing global and local semantic information in segmentation networks remains challenging due to computational cost and the need for effective long-range recognition. Building on the recent success of transformers and attention mechanisms, this research applies attention-boosting modules and attention-fusion networks to enhance state-of-the-art segmentation networks such as InternImage and SERNet-Former, addressing these challenges. Integrating attention-boosting modules into residual networks yields baseline architectures such as Efficient-ResNet, enabling the encoder to extract global-context feature maps while minimizing computational cost. The attention-based algorithms can also be applied to networks that combine vision transformers and convolutional layers, such as InternImage, to improve their existing state-of-the-art results. In this research, SERNet-Former v2, a new implementation of the network built on these attention-based methods, is evaluated on the challenging benchmark datasets ADE20K, BDD100K, CamVid, and Cityscapes. Our methods have also been applied to InternImage-XL, improving its test performance on the Cityscapes dataset (85.1% mean IoU). The networks developed with our methods achieve noteworthy results on these benchmarks: 85.1% mean IoU on the Cityscapes test set, 59.35% mean IoU on the ADE20K validation set, 67.42% mean IoU on the BDD100K validation set, and 84.62% mean IoU on the CamVid dataset.
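The abstract describes attention-boosting modules injected into residual blocks to add global context at low cost. As a rough illustration of that general idea only, and not the authors' actual module (whose design is not specified on this page), the PyTorch sketch below gates a residual block's features with a sigmoid-activated attention map; the class name and the 1x1-convolution gate are hypothetical choices for this example.

```python
import torch
import torch.nn as nn

class AttentionBoostedResidualBlock(nn.Module):
    """Illustrative residual block with a hypothetical attention-boosting gate.

    A sigmoid-activated map is multiplied back onto the residual features,
    a common low-cost way to inject context. The paper's module may differ.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        # Hypothetical attention-boosting gate: a 1x1 conv + sigmoid produces
        # a per-pixel, per-channel gate from the block's own features.
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out * self.gate(out)  # boost features with the attention gate
        return self.relu(out + x)   # standard residual connection

if __name__ == "__main__":
    block = AttentionBoostedResidualBlock(64)
    y = block(torch.randn(1, 64, 32, 32))
    print(y.shape)  # torch.Size([1, 64, 32, 32])
```

The gate adds only a 1x1 convolution per block, which is why this style of module is often preferred when the goal is richer context without a large increase in FLOPs.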
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2102