Towards Overcoming Reasoning Shortcuts in Neurosymbolic Learning via Efficient Generative Proxies

TMLR Paper7606 Authors

20 Feb 2026 (modified: 15 May 2026) | Under review for TMLR | CC BY 4.0
Abstract: Symbol grounding, the task of linking high-dimensional sensory inputs to symbolic representations in neurosymbolic AI (NeSy), often suffers from reasoning shortcuts, where inputs are mapped to unintended concepts due to limited supervision. Reconstruction-based training can help mitigate these ambiguities, but its effectiveness depends strongly on the quality and capacity of the reconstruction model. In this work, we propose a new grounding framework, Efficient Generative Proxies (EGP), that cleanly integrates reconstruction-based training into a generative modeling perspective. EGP subsumes several existing grounding approaches as special cases. We further argue that the role of reconstruction should be to capture the underlying structure of the data rather than to faithfully reconstruct inputs. Accordingly, we design a reconstruction term that leverages the principle that similar inputs should correspond to similar concept labels, thereby substantially reducing grounding ambiguity. We also develop extensions that incorporate additional inductive biases through this reconstruction term, improving robustness in more complex tasks. We evaluate our approach on tasks susceptible to reasoning shortcuts from the RSbench benchmark, as well as on the multi-concept ObjectMath dataset, integrating EGP into state-of-the-art neurosymbolic learning frameworks. Experimental results demonstrate that EGP significantly improves grounding accuracy and effectively mitigates reasoning shortcuts across diverse settings.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Francesco_Locatello1
Submission Number: 7606