ResMLP: Feedforward networks for image classification with data-efficient training

Hugo Touvron; Piotr Bojanowski; Mathilde Caron; Matthieu Cord; Alaaeldin El-Nouby; Edouard Grave; Gautier Izacard; Armand Joulin; Gabriel Synnaeve; Jakob Verbeek; Herve Jegou

21 May 2021 (modified: 04 May 2025), NeurIPS 2021 Submitted, Readers: Everyone

Keywords: deep learning, neural networks, image classification, machine translation, transformer

Abstract: We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models in a self-supervised setup, to further remove priors from employing a labelled dataset. Finally, by adapting our model to machine translation, we achieve surprisingly good results. We will share our code, based on the Timm library, and pre-trained models.
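The alternation described in the abstract can be illustrated with a minimal NumPy sketch of one residual block. This is not the authors' implementation: the function and parameter names, the GELU activation, the initialization scale, and the absence of any normalization or scaling layers are all assumptions made for illustration, since the abstract specifies only the two sublayers and the residual structure.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of GELU. The choice of activation is an assumption;
    # the abstract only says "two-layer feed-forward network".
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def init_block(num_patches, dim, hidden_dim, rng):
    # Hypothetical parameter layout and init scale, chosen for illustration.
    scale = 0.02
    return {
        "w_patch": rng.normal(0.0, scale, (num_patches, num_patches)),
        "b_patch": np.zeros((num_patches, 1)),
        "w1": rng.normal(0.0, scale, (dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, scale, (hidden_dim, dim)),
        "b2": np.zeros(dim),
    }


def cross_patch_sublayer(x, p):
    # (i) Linear layer over the patch axis. The same (num_patches x num_patches)
    # matrix multiplies every channel column, so patches interact
    # "independently and identically across channels".
    return x + p["w_patch"] @ x + p["b_patch"]


def cross_channel_sublayer(x, p):
    # (ii) Two-layer feed-forward network applied to each patch (row) on its
    # own, so channels interact "independently per patch".
    hidden = gelu(x @ p["w1"] + p["b1"])
    return x + hidden @ p["w2"] + p["b2"]


def resmlp_block(x, p):
    # x has shape (num_patches, dim); both sublayers are residual.
    return cross_channel_sublayer(cross_patch_sublayer(x, p), p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_block(num_patches=16, dim=8, hidden_dim=32, rng=rng)
    patches = rng.normal(size=(16, 8))  # stand-in for embedded image patches
    print(resmlp_block(patches, params).shape)
```

In this sketch, changing one channel of the input affects only that channel after the first sublayer, and changing one patch affects only that patch after the second, which mirrors the two kinds of interaction the abstract distinguishes.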

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

TL;DR: A multilayer perceptron with a simple design achieves surprisingly good results

Supplementary Material: pdf

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/resmlp-feedforward-networks-for-image/code)
