Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: CNN, attention, low-data regime, classification
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: In the rapidly evolving landscape of deep learning for computer vision, various architectures have been proposed to achieve state-of-the-art performance in tasks such as object recognition, image segmentation, and classification. While models pretrained on large datasets like ImageNet have been the cornerstone of transfer learning in many applications, this paper introduces CAReNet (Convolutional Attention Residual Network), a novel architecture trained from scratch, without pretrained weights. CAReNet combines convolutional layers, attention mechanisms, and residual connections into a holistic approach to feature extraction and representation learning. Notably, CAReNet matches or exceeds the performance of ResNet50 on the same training data while using fewer parameters. Training CAReNet from scratch proved necessary, since its architectural differences render feature representations incompatible with those of pretrained models. Furthermore, training new models on large, general-purpose datasets to obtain pretrained weights requires time, accurate labels, and powerful machines, which poses significant barriers in many domains. The absence of pretrained weights for CAReNet is therefore not only a constraint but also an opportunity for architecture-specific optimization. In certain domains, such as space and medical imaging, the features learned from ImageNet differ substantially from those required for the target task and can introduce bias during training, given the gap between the pretraining domain and the transfer task. This work underscores the importance of architecture-specific training strategies for optimizing performance and demonstrates that CAReNet achieves competitive results with a more compact architecture. Experiments were carried out on several benchmark datasets, including Tiny ImageNet, for image classification. CAReNet outperforms ResNet50 by 2.61% on Tiny ImageNet and by 1.9% on STL10 while being nearly half its size, a balance of compactness and accuracy that makes it an efficient alternative among deep learning architectures.
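The abstract names the three ingredients of a CAReNet block (convolution, attention, residual connection) but gives no implementation details. Below is a minimal, hypothetical PyTorch sketch of a generic block combining those three ingredients; the layer sizes and the squeeze-and-excitation-style channel attention are illustrative assumptions, not the authors' actual design.

```python
# Illustrative sketch only: the abstract does not specify CAReNet's internals.
import torch
import torch.nn as nn


class ConvAttnResidualBlock(nn.Module):
    """Generic pattern: conv stack -> channel attention -> residual add."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Squeeze-and-excitation-style channel attention (an assumption here,
        # standing in for whatever attention mechanism CAReNet actually uses).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv(x)
        out = out * self.attn(out)  # reweight channels by attention scores
        return self.act(out + x)    # residual connection


if __name__ == "__main__":
    block = ConvAttnResidualBlock(channels=64)
    y = block(torch.randn(2, 64, 32, 32))
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```

Such a block keeps the input and output shapes identical, so it can be stacked depth-wise like a ResNet stage, which is consistent with the abstract's framing of CAReNet as a compact ResNet50 competitor.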
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7399