Abstract: We introduce a parallelizable simplification of the Neural Turing Machine (NTM), referred to as P-NTM, which reformulates the core operations of the original architecture as associative operators, enabling the use of efficient parallel scan algorithms. We additionally develop a log-space parallel algorithm for the numerically stable computation of these operations over long sequences. We evaluate the proposed architecture on a synthetic benchmark of algorithmic problems involving state tracking, memorization, and basic arithmetic, solved via autoregressive decoding. We compare P-NTM against a revisited, numerically stable implementation of the standard NTM, as well as conventional recurrent and attention-based architectures. Results show that, despite its simplifications, the proposed architecture matches the original in generalization on all evaluated tasks, solving every problem with perfect accuracy, including at unseen sequence lengths. We argue that this is achieved by replacing the recurrent controller with autoregressive control through output tokens. P-NTM also exhibits superior training efficiency, with parallel execution up to an order of magnitude faster than the standard NTM. Ultimately, this work contributes toward the development of efficient neural architectures capable of expressing a broader class of algorithms.
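The abstract's central idea, reformulating recurrent updates as an associative operator so that a parallel scan with logarithmic depth can replace a sequential loop, can be illustrated with a minimal sketch. The code below is a hypothetical example, not the paper's actual P-NTM operations: it shows a generic pairwise (Brent-Kung-style) inclusive scan over any associative operator, and a `logaddexp` combine function as a stand-in for the log-space numerically stable accumulation the abstract describes. All names here are illustrative assumptions.

```python
import math

def logaddexp(a, b):
    # Numerically stable log(exp(a) + exp(b)): a stand-in for
    # a log-space associative combine over long sequences.
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def parallel_scan(op, xs):
    # Inclusive scan via recursive pairwise combination. Each level's
    # pair combinations are independent, so on parallel hardware the
    # depth is O(log n) instead of the O(n) of a sequential loop.
    n = len(xs)
    if n == 1:
        return xs[:]
    # Combine adjacent pairs (parallelizable step).
    pairs = [op(xs[i], xs[i + 1]) for i in range(0, n - 1, 2)]
    sub = parallel_scan(op, pairs)
    # Interleave sub-scan results back into the full prefix sequence.
    out = []
    for i in range(n):
        if i == 0:
            out.append(xs[0])
        elif i % 2 == 1:
            out.append(sub[i // 2])
        else:
            out.append(op(sub[i // 2 - 1], xs[i]))
    return out
```

Correctness requires only that `op` be associative (not necessarily commutative), which is the property the abstract says the NTM's core operations are rewritten to satisfy.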
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Alessandro_Sperduti1
Submission Number: 8177