Perceiver IO: A General Architecture for Structured Inputs & Outputs

Andrew Jaegle; Sebastian Borgeaud; Jean-Baptiste Alayrac; Carl Doersch; Catalin Ionescu; David Ding; Skanda Koppula; Daniel Zoran; Andrew Brock; Evan Shelhamer; Olivier J Henaff; Matthew Botvinick; Andrew Zisserman; Oriol Vinyals; Joao Carreira

Perceiver IO: A General Architecture for Structured Inputs & Outputs

Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier J Henaff, Matthew Botvinick, Andrew Zisserman, Oriol Vinyals, Joao Carreira

Published: 28 Jan 2022, Last Modified: 22 Jun 2025ICLR 2022 SpotlightReaders: Everyone

Keywords: Perceiver, BERT, natural language processing, optical flow, computer vision, multimodal, GLUE, ImageNet, StarCraft

Abstract: A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs. Our model augments the Perceiver with a flexible querying mechanism that enables outputs of various sizes and semantics, doing away with the need for task-specific architecture engineering. The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. As highlights, Perceiver IO outperforms a Transformer-based BERT baseline on the GLUE language benchmark despite removing input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation with no explicit mechanisms for multiscale correspondence.

One-sentence Summary: We propose Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs.

Supplementary Material: zip

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/perceiver-io-a-general-architecture-for/code)

13 Replies

Loading