The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes

Peter Kocsis; Peter Súkeník; Guillem Braso; Matthias Nießner; Laura Leal-Taixé; Ismail Elezi

Published: 31 Oct 2022, Last Modified: 06 Apr 2025, NeurIPS 2022 Accept, Readers: Everyone

Keywords: Low-data Regime, Convolutional Networks

Abstract: Convolutional neural networks were the standard for solving many computer vision tasks until recently, when Transformers or MLP-based architectures started to show competitive performance. These architectures typically have a vast number of weights and need to be trained on massive datasets; hence, they are not suitable for use in low-data regimes. In this work, we propose a simple yet effective framework to improve generalization from small amounts of data. We augment modern CNNs with fully-connected (FC) layers and show the massive impact this architectural change has in low-data regimes. We further present an online joint knowledge-distillation method to utilize the extra FC layers at train time but avoid them at test time. This allows us to improve the generalization of a CNN-based model without any increase in the number of weights at test time. We perform classification experiments for a large range of network backbones and several standard datasets on supervised learning and active learning. Our method significantly outperforms the networks without fully-connected layers, reaching a relative improvement of up to $16\%$ in validation accuracy in the supervised setting without adding any extra parameters during inference.
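As a rough illustration of the online joint knowledge-distillation idea, the sketch below combines a supervised loss on a head with extra FC layers (used only at train time), a supervised loss on the plain head kept at test time, and a distillation term between the two. The abstract does not specify the loss, so the KL-divergence form, the `temperature` and `alpha` parameters, and all function names here are assumptions made for illustration, not the authors' formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample against an integer class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def joint_distillation_loss(fc_logits, conv_logits, label,
                            temperature=2.0, alpha=0.5):
    """Hypothetical joint loss for a single sample.

    fc_logits:   output of the head with extra FC layers (train time only).
    conv_logits: output of the plain head that is kept at test time.
    temperature, alpha: assumed hyperparameters, not taken from the paper.
    """
    teacher_loss = cross_entropy(fc_logits, label)
    student_loss = cross_entropy(conv_logits, label)
    # Distillation term: pull the plain head toward the FC head's softened output.
    distill = kl_divergence(softmax(fc_logits, temperature),
                            softmax(conv_logits, temperature))
    return teacher_loss + (1 - alpha) * student_loss + alpha * distill


loss = joint_distillation_loss([2.0, 0.5, -1.0], [1.0, 0.8, -0.5], label=0)
```

Both heads are trained in the same step, which is what makes the distillation online; at test time only the plain head would be evaluated, so the extra FC weights add no inference cost.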

TL;DR: Using final fully-connected layers helps the generalization of convolutional networks in low-data regimes.

Supplementary Material: zip

Community Implementations: [2 code implementations (CatalyzeX)](https://www.catalyzex.com/paper/the-unreasonable-effectiveness-of-fully/code)
