DNCs require more planning steps

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Deep learning, machine learning, Neural Turing Machines, DNC, adaptive computation time
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We show that training a DNC with a number of planning steps that scales with input size substantially improves generalization compared with the standard approach, which uses only a constant number of planning steps.
Abstract: A Differentiable Neural Computer (DNC) is a memory-augmented neural computation model capable of learning to solve complex algorithmic tasks, from simple sorting algorithms, through graph problems, to text question answering. In prior work, the DNC was always given a constant number of planning steps to complete its task. In this work, we argue that the number of planning steps the model is allowed to take, which we call the "planning budget", is a constraint that can cause the model to generalize poorly and hurt its ability to fully utilize its external memory. By introducing an adaptive planning budget that scales with input size during training, the model better utilizes its memory space and achieves substantially higher accuracy on input sizes not seen during training. We experiment with Graph Shortest Path search, which has previously been used as a benchmark for these models, and with the Graph MinCut problem. On both problems, our approach outperforms the standard constant planning budget and generalizes better to larger inputs.
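To make the core idea concrete: the sketch below contrasts a constant planning budget with one that scales with input size. This is a minimal illustration under assumptions; the `ToyDNC` stub, the linear budget rule, and all parameter names are hypothetical and are not the authors' implementation, which is not included on this page.

```python
# Minimal sketch of a constant vs. adaptive planning budget for a DNC-style
# model. All interfaces here (ToyDNC, planning_budget) are illustrative
# assumptions, not the paper's actual code.

def planning_budget(input_len: int, adaptive: bool, const: int = 10,
                    alpha: float = 1.0, beta: int = 0) -> int:
    """Number of planning steps granted before the answer phase."""
    if adaptive:
        # Assumed linear rule: budget grows with the input size.
        return int(alpha * input_len) + beta
    # Standard approach: a fixed budget regardless of input size.
    return const


def run_episode(dnc, inputs, adaptive=True):
    # Input phase: feed the task description into the model step by step.
    state = dnc.reset()
    for x in inputs:
        state = dnc.step(state, x)
    # Planning phase: blank steps during which the model can read/write
    # its external memory before producing an answer.
    for _ in range(planning_budget(len(inputs), adaptive)):
        state = dnc.step(state, None)
    # An answer phase would follow here.
    return state


class ToyDNC:
    """Stand-in for a real DNC; only counts how many steps were taken."""
    def reset(self):
        return {"steps": 0}

    def step(self, state, x):
        return {"steps": state["steps"] + 1}


if __name__ == "__main__":
    dnc = ToyDNC()
    for n in (5, 10, 20):
        s = run_episode(dnc, inputs=list(range(n)), adaptive=True)
        print(f"{n} inputs -> {s['steps']} total steps")
```

With the adaptive rule, larger inputs automatically receive proportionally more planning steps at training time, which is the mechanism the abstract credits for better memory utilization and generalization.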
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7632