Setting up for failure: automatic discovery of the neural mechanisms of cognitive errors

ICLR 2026 Conference Submission 14123 Authors

18 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: neuroscience, working memory, recurrent neural networks, diffusion models, behavioral modeling
TL;DR: Training RNNs to reproduce realistic error patterns (rather than optimal performance) produces networks that better mimic biological neural computation, demonstrated through a working memory task where networks were taught to make swap errors.
Abstract: Discovering the neural mechanisms underpinning cognition is one of the grand challenges of neuroscience. Addressing this challenge greatly benefits from specific hypotheses about the underlying neural network dynamics. However, previous approaches bridging neural network dynamics and cognitive behaviour required iterative refinement of network architectures and/or objectives for normative task optimization, resulting in a long, and mostly heuristic, human-in-the-loop design process. Here, we offer an alternative approach that automates this process by explicitly training recurrent neural networks (RNNs) to reproduce behaviour, including the same characteristic errors, that humans and animals produce in a cognitive task. Achieving this required two main innovations. First, as the amount of behavioural data that can be collected in experiments is typically insufficient for training RNNs, we use a non-parametric generative model of behavioural responses to produce surrogate data for training RNNs. Second, to capture all relevant statistical moments in the data, rather than a limited number of hand-picked low-order moments as in previous moment-matching approaches, we developed a novel diffusion model-based approach for training RNNs. We chose a visual working memory (VWM) task as our test-bed, as behaviour in this task is well known to produce response distributions that are patently multimodal (due to so-called swap errors). The resulting network dynamics correctly predicted previously reported qualitative features of neural data recorded in macaques. Importantly, this was only the case when RNNs were trained using our approach, fitting the full richness of behavioural data, and not when only a limited set of behavioural signatures was fitted, nor when RNNs were trained for task optimality instead of reproducing behaviour (as has been typical for RNNs used to generate dynamical hypotheses).
Our model also makes novel predictions about the mechanism of swap errors, which can be readily tested in experiments. These results suggest that fitting rich patterns of behaviour provides a powerful way for the automatic discovery of neural network dynamics supporting important cognitive functions.
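The first innovation, generating surrogate behavioural data from a non-parametric model fit to limited experimental responses, can be illustrated with a minimal sketch. Everything here is illustrative, not the authors' actual pipeline: a Gaussian KDE stands in for their non-parametric generative model, and the bimodal response distribution (target reports plus a minority of swap errors toward a distractor) is synthetic.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical behavioural data: report values in a VWM-like task, where
# most responses cluster on the target and a minority are swap errors
# clustered on the distractor (values and cluster widths are made up).
target, distractor = 0.5, 2.5
responses = np.concatenate([
    rng.normal(target, 0.3, 80),      # reports centred on the target
    rng.normal(distractor, 0.3, 20),  # swap errors centred on the distractor
])

# Non-parametric generative model of the response distribution
# (a simple KDE here, standing in for the paper's model).
kde = gaussian_kde(responses)

# Draw a large surrogate data set to train an RNN on, preserving the
# multimodal structure (including the swap-error mode) of the real data.
surrogate = kde.resample(10_000, seed=1).ravel()
```

The point of the sketch is that the surrogate sample inherits the full shape of the empirical distribution, including its minority swap-error mode, rather than only a few hand-picked moments, which is what makes it usable as a training set far larger than the original experiment.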
Primary Area: applications to neuroscience & cognitive science
Submission Number: 14123