The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization

Róbert Csordás; Kazuki Irie; Jürgen Schmidhuber

Published: 28 Jan 2022, Last Modified: 22 Jun 2025. ICLR 2022 Poster. Readers: Everyone

Keywords: transformer, compositionality, systematic generalization, algorithmic reasoning, arithmetic

Abstract: Despite progress across a broad range of applications, Transformers have limited success in systematic generalization. The situation is especially frustrating in the case of algorithmic tasks, where they often fail to find intuitive solutions that route relevant information to the right node/operation at the right time in the grid represented by Transformer columns. To facilitate the learning of useful control flow, we propose two modifications to the Transformer architecture: a copy gate and geometric attention. Our novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the classic compositional table lookup task, as well as near-perfect accuracy on the simple arithmetic task and a new variant of ListOps testing for generalization across computational depths. NDR’s attention and gating patterns tend to be interpretable as an intuitive form of neural routing.

One-sentence Summary: We improve systematic generalization of Transformers on algorithmic tasks by introducing a novel attention mechanism and gating.

Supplementary Material: zip

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/the-neural-data-router-adaptive-control-flow/code)
