Reasoning in Transformers - Mitigating Spurious Correlations and Reasoning Shortcuts

Daniel Enström; Viktor Kjellberg; Moa Johansson

Published: 01 Jan 2024, Last Modified: 20 May 2025 | NeSy (2) 2024 | CC BY-SA 4.0

Abstract: Transformer language models are used for a wide variety of tasks, including some that require logical reasoning. However, a transformer model may easily learn spurious patterns in the data, short-circuiting actual reasoning. We investigate to what extent transformers can be trained to a) approximate reasoning in propositional logic while b) avoiding known reasoning shortcuts via spurious correlations in the training data. To do so, we use a dataset with a known spurious correlation between truth and, e.g., the number of rules in the problem. We augment the data with proofs and train two models based on generative transformers: WP-BART, trained to generate whole proofs at once, and a neuro-symbolic model, SIP-BART, trained to generate individual proof steps in combination with a symbolic proof checker. We find that SIP-BART succeeds in avoiding reasoning shortcuts, while WP-BART does not. For SIP-BART, we then identify a few remaining errors arising from the use of a pre-trained language model. These are qualitatively analysed to create a taxonomy of four different types of additional pitfalls.
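The step-wise setup described in the abstract, where a generator proposes one proof step at a time and a symbolic checker accepts or rejects it, can be illustrated with a minimal sketch. This is not the authors' implementation: the rule format (a set of premises implying one conclusion), the names `check_step`, `prove`, and `naive_propose`, and the step budget are all assumptions made for illustration, and the proposer here is a hand-written stand-in for a trained model.

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

# Assumed representation: a rule is (premises, conclusion), read as
# "if all premises hold, the conclusion holds". A proof step applies one rule.
Rule = Tuple[FrozenSet[str], str]
Step = Rule
Proposer = Callable[[str, Set[str], List[Rule]], Optional[Step]]


def check_step(step: Step, rules: List[Rule], known: Set[str]) -> bool:
    """Symbolic check: the step must be a given rule whose premises are known."""
    premises, _conclusion = step
    return step in rules and premises <= known


def prove(query: str, facts: Set[str], rules: List[Rule],
          propose: Proposer, max_steps: int = 20) -> Optional[List[Step]]:
    """Build a proof one step at a time; only checked steps are kept."""
    known = set(facts)
    proof: List[Step] = []
    for _ in range(max_steps):
        if query in known:
            return proof
        step = propose(query, known, rules)
        if step is None:          # proposer gives up: no proof found
            return None
        if not check_step(step, rules, known):
            continue              # rejected step never enters the proof
        known.add(step[1])
        proof.append(step)
    return proof if query in known else None


def naive_propose(query: str, known: Set[str],
                  rules: List[Rule]) -> Optional[Step]:
    """Hypothetical stand-in for a learned model: first applicable new rule."""
    for premises, conclusion in rules:
        if premises <= known and conclusion not in known:
            return (premises, conclusion)
    return None


example_rules: List[Rule] = [
    (frozenset({"a", "b"}), "c"),
    (frozenset({"c"}), "d"),
]
example_proof = prove("d", {"a", "b"}, example_rules, naive_propose)
```

In this sketch the checker, not the proposer, decides what counts as a valid step, so a proposer that guesses from surface features of the problem cannot produce an accepted proof for an unprovable query.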
