Abstract: Answering questions about a text frequently requires aggregating information from multiple places in that text. End-to-end neural network models, the dominant approach in the current literature, can theoretically learn how to distill and manipulate representations of the text without explicit supervision about how to do so. We investigate a canonical architecture for this task, the memory network, and analyze how effective it really is in the context of three multi-hop reasoning settings. In a simple synthetic setting, the path-finding task of the bAbI dataset, the model fails to learn the correct reasoning without additional supervision of its attention mechanism. However, with this supervision, it can perform well. On a real text dataset, WikiHop, the memory network gives nearly state-of-the-art performance, but does so without using its multi-hop capabilities. A tougher anonymized version of the WikiHop dataset is qualitatively similar to bAbI: the model fails to perform well unless it has additional supervision. We hypothesize that many "multi-hop" architectures do not truly learn this reasoning as advertised, though they could learn this reasoning if appropriately supervised.
Keywords: NLP, Reading Comprehension, Memory Networks, Multi-hop Reasoning
TL;DR: Memory Networks do not learn multi-hop reasoning unless we supervise them.