Continual Learning and Out of Distribution Generalization in a Systematic Reasoning Task

Published: 28 Oct 2023, Last Modified: 16 Nov 2023, Venue: MATH-AI 23 Poster
Keywords: continual learning, deep neural networks, transformers, systematic reasoning, out of distribution generalization, abstract reasoning, games
TL;DR: We investigate how well small-scale transformer models learn abstract strategies in Sudoku games and how to improve this generalization via a variety of techniques.
Abstract: Humans often learn new problem-solving strategies from a narrow range of examples and generalize to examples outside the distribution (OOD) used in learning, but such generalization remains a challenge for neural networks. This limitation affects the learning of mathematical techniques, which can apply to unbounded problem spaces (e.g., all real numbers). We explore it using neural networks trained on strategies for solving specified cells in $6\times6$ Sudoku puzzles under a novel curriculum: models first learn two preliminary tasks, and we then assess OOD generalization while they train on a subset of the possible training examples of a more complex solution strategy. Baseline models master the training distribution but fail to generalize OOD. However, we introduce a combination of extensions that is sufficient to support highly accurate and reliable OOD generalization. These results suggest directions for improving the robustness of models trained on the highly imbalanced data distributions found in natural data sets.
Submission Number: 54