Parallelising Differentiable Algorithms Removes the Scalar Bottleneck: A Case Study

Published: 27 Jun 2024 (Last Modified: 20 Aug 2024)
Venue: Differentiable Almost Everything
License: CC BY 4.0
Keywords: Machine Learning, ICML, Differentiable Algorithms, Neural Algorithmic Reasoning, Parallelisation, Scalar Differentiable Algorithmic Black-Box, sDAB, Parallel Differentiable Algorithmic Black-Box, pDAB, NAR
TL;DR: We perform a case study of the 'scalar bottleneck' in algorithmic modules and find that parallelising these modules can alleviate the bottleneck while preserving their efficiency and correctness guarantees.
Abstract: While differentiable algorithms are a popular way to imbue neural networks with an algorithmic inductive bias, it has been hypothesised that their performance is limited by the 'scalar bottleneck': the requirement that rich latent states be projected to scalars in order to be used as algorithmic inputs. This motivated the development of neural algorithmic processors (NAPs), neural networks that imitate algorithms in high-dimensional space, as a way to avoid this bottleneck while still retaining an algorithmic inductive bias. So far, however, there has been little work exploring whether this bottleneck exists in practice and, if so, the extent to which NAPs successfully mitigate it. Here, we present a case study of the scalar bottleneck on a new synthetic dataset inspired by recent work in neural algorithmics. While we found that the differentiable algorithm we tested did indeed suffer from a 'scalar bottleneck', we also found that this bottleneck was alleviated not by frozen NAPs, but rather by simply using an unfrozen, algorithmically-aligned neural network. Based on these results, we hypothesise that the problem might be better thought of as an 'ensembling bottleneck', caused by the inability to execute multiple instances of the same algorithm in parallel. We thus develop the parallel differentiable algorithmic black-box (pDAB), which preserves the efficiency and correctness guarantees of its scalar counterpart while avoiding the scalar bottleneck.
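To make the contrast concrete, the sketch below illustrates the abstract's distinction between a scalar differentiable algorithmic black-box (sDAB), which squeezes each latent state to a single scalar before running the algorithm, and a parallel one (pDAB), which runs several copies of the same algorithm on independent scalar projections. This is not the paper's implementation: the stand-in differentiable algorithm (a soft cumulative maximum), the module names, and the number of parallel channels k are assumptions introduced purely for illustration.

```python
# Illustrative sketch only -- the actual sDAB/pDAB modules and the algorithm
# used in the paper are not specified in the abstract.
import torch
import torch.nn as nn


def soft_cummax(x: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Differentiable stand-in algorithm: smoothed running maximum over the last dim."""
    outs, running = [], x[..., :1]
    for t in range(x.shape[-1]):
        pair = torch.stack([running.squeeze(-1), x[..., t]], dim=-1)
        running = (torch.softmax(pair / tau, dim=-1) * pair).sum(-1, keepdim=True)
        outs.append(running)
    return torch.cat(outs, dim=-1)


class ScalarDAB(nn.Module):
    """sDAB: each d-dimensional latent is projected to ONE scalar before the
    algorithm runs -- the hypothesised 'scalar bottleneck'."""
    def __init__(self, d: int):
        super().__init__()
        self.to_scalar = nn.Linear(d, 1)
        self.from_scalar = nn.Linear(1, d)

    def forward(self, h: torch.Tensor) -> torch.Tensor:  # h: (batch, seq, d)
        s = self.to_scalar(h).squeeze(-1)                 # (batch, seq)
        out = soft_cummax(s)                              # algorithm on scalars
        return self.from_scalar(out.unsqueeze(-1))        # back to (batch, seq, d)


class ParallelDAB(nn.Module):
    """pDAB sketch: project to k scalar channels and run k independent copies
    of the same algorithm in parallel, then decode jointly."""
    def __init__(self, d: int, k: int = 8):
        super().__init__()
        self.to_scalars = nn.Linear(d, k)
        self.from_scalars = nn.Linear(k, d)

    def forward(self, h: torch.Tensor) -> torch.Tensor:  # h: (batch, seq, d)
        s = self.to_scalars(h)                            # (batch, seq, k)
        out = soft_cummax(s.transpose(1, 2))              # per channel: (batch, k, seq)
        return self.from_scalars(out.transpose(1, 2))     # (batch, seq, d)


if __name__ == "__main__":
    h = torch.randn(4, 16, 32)
    print(ScalarDAB(32)(h).shape, ParallelDAB(32)(h).shape)
```

Because every channel of the parallel version executes the same (here, hypothetical) differentiable algorithm, the sketch keeps the algorithm's efficiency and correctness guarantees per channel while letting the network route richer information through multiple scalar projections instead of a single one.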
Submission Number: 41