Keywords: Language model, Logic, Compositionality
TL;DR: We investigate the limitations of the compositional reasoning ability of reasoning models.
Abstract: Recent language models appear to solve complex tasks that require logical reasoning, and to produce rationales for their answers, thanks to large parameter counts and instruction tuning. While step-by-step explanations have been introduced to improve the accuracy of the models' final predictions, there is still little research on the reliability of the rationales themselves. This paper therefore studies the compositional reasoning ability of language models and analyzes the logical proofs they generate. Using the clear and simple syntax and semantics of Boolean expressions, we observe and analyze how language models generalize to and solve Boolean formulas. We classify Boolean expressions by depth and observe empirically that language models not only struggle with more complex Boolean expressions, but also produce rationales that are too unreliable to confirm that the models truly understand and solve the problem. From the perspective of understanding the structure of Boolean algebra expressions, we find that language models inherently fail to generalize the compositional structure and often fail not only in evaluating formulas but also in grasping the input structure itself.
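To make the notion of expression depth concrete, the following is a minimal Python sketch of how a Boolean expression could be assigned a depth and a ground-truth value. It is an illustration only, not the authors' code: the helper names `depth_of` and `evaluate_expr` are hypothetical, and the depth definition used here (number of nested `and`/`or`/`not` operators, with a bare constant at depth 0) is an assumption that may differ from the paper's.

```python
import ast


def _depth(node):
    # Assumed definition: a bare constant has depth 0 and each nested
    # operator (and / or / not) adds one level.
    if isinstance(node, ast.BoolOp):
        return 1 + max(_depth(v) for v in node.values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return 1 + _depth(node.operand)
    if isinstance(node, ast.Constant) and isinstance(node.value, bool):
        return 0
    raise ValueError("unsupported syntax in Boolean expression")


def _evaluate(node):
    # Recursively compute the truth value of the parsed expression.
    if isinstance(node, ast.BoolOp):
        values = [_evaluate(v) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _evaluate(node.operand)
    if isinstance(node, ast.Constant) and isinstance(node.value, bool):
        return node.value
    raise ValueError("unsupported syntax in Boolean expression")


def depth_of(expr: str) -> int:
    """Nesting depth of an expression written with True, False, and, or, not."""
    return _depth(ast.parse(expr, mode="eval").body)


def evaluate_expr(expr: str) -> bool:
    """Ground-truth value of the expression, computed without eval()."""
    return _evaluate(ast.parse(expr, mode="eval").body)


if __name__ == "__main__":
    example = "True and (False or not True)"
    print(depth_of(example), evaluate_expr(example))
```

Under this assumed definition, a model's final answer can be checked against `evaluate_expr`, and accuracy can be grouped by `depth_of` to see how performance changes as expressions become more deeply nested.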
Supplementary Material:  zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2521