Are Language Models Better at Generating Answers or Validating Solutions?

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Large language models (LLMs) have recently demonstrated remarkable reasoning abilities, augmented by advances in prompting techniques and reasoning frameworks. Many popular frameworks \cite{du_improving_2023, yao_tree_2023, shinn_reflexion_2023} rely on the assumption that models can give effective feedback on their own generations. This feedback is partly predicated on being able to correctly validate, or classify, a generated prediction as either correctly or incorrectly solving the given problem. While in traditional computer science settings validation has been shown to be as difficult as correct generation, the empirical picture for language models is less clear. Our work studies whether leading language models are better at solving problems or validating solutions, and we attempt to gain a better understanding of why the observed behavior arises. We quantify this by measuring the understanding gap --- the difference between generative and discriminative accuracy. First, we further corroborate recent work \cite{west_generative_2024} showing, surprisingly, that models are better generators than discriminators on some datasets. Second, we discover that understanding gaps can be closed or significantly narrowed through prompting, and we provide an estimate of the upper bound $\epsilon$ on the understanding gap across datasets. Third, we apply our findings to predict the settings where self-correction is most effective. This continues the conversation started by \cite{huang_large_2023}: in contrast to their findings, we show that LLMs can self-correct reasoning, and we establish a link between a feature of the dataset and the language model's ability to self-correct.
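The understanding-gap metric in the abstract can be sketched concretely. The sketch below uses hypothetical helper names and illustrative outcome data (the paper's actual evaluation protocol is not specified here): generative accuracy is the fraction of problems the model solves, discriminative accuracy is the fraction of candidate solutions it classifies correctly as valid or invalid, and the gap is their difference.

```python
# Illustrative sketch of the "understanding gap": the difference between
# a model's generative accuracy (solving problems) and its discriminative
# accuracy (validating candidate solutions). All data below is made up.

def generative_accuracy(gen_results):
    """Fraction of problems the model answered correctly (1 = correct)."""
    return sum(gen_results) / len(gen_results)

def discriminative_accuracy(disc_results):
    """Fraction of candidate solutions the model classified correctly
    as valid or invalid (1 = correct classification)."""
    return sum(disc_results) / len(disc_results)

def understanding_gap(gen_results, disc_results):
    """Generative minus discriminative accuracy, per the abstract's
    definition; negative when the model validates better than it solves."""
    return generative_accuracy(gen_results) - discriminative_accuracy(disc_results)

# Hypothetical per-item outcomes (1 = correct, 0 = incorrect):
gen = [1, 0, 1, 1, 0]             # solved 3/5 problems  -> 0.6
disc = [1, 1, 0, 1, 1, 1, 0, 1]   # validated 6/8 candidates -> 0.75
print(round(understanding_gap(gen, disc), 2))  # -0.15
```

A positive gap would indicate a dataset where the model generates better than it discriminates, matching the finding attributed to \cite{west_generative_2024}; a gap near zero corresponds to the narrowed gaps the paper reports after prompting.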
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability
Languages Studied: English