Optimizing Language Model's Reasoning Abilities with Weak Supervision

ACL ARR 2024 June Submission 1040 Authors

14 Jun 2024 (modified: 08 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: While Large Language Models (LLMs) have demonstrated proficiency in complex reasoning, much prior work depends on datasets extensively annotated by human experts. This reliance on fully supervised annotations poses scalability challenges as models and data requirements grow. In this work, we first analyze the limitations of existing data-efficient reinforcement learning (RL) methods for enhancing LLMs' reasoning. To address these limitations, we introduce self-reinforcement, an efficient weak-to-strong approach that optimizes language models' reasoning abilities using both annotated and unlabeled samples. Our method improves the quality of synthetic feedback by fully harnessing the annotated seed data and introducing a novel self-filtering mechanism to remove invalid pairs. We also present \textsc{PuzzleBen}, a weakly supervised benchmark for reasoning comprising 25,147 complex questions, answers, and human-generated rationales across domains such as brainteasers, puzzles, riddles, parajumbles, and critical reasoning tasks. Our experiments underscore the value of \textsc{PuzzleBen} and the effectiveness of our methodology as a promising direction for future work. Our dataset and code will be released at \texttt{Anonymity Link}.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Reasoning, Weak Supervision, Weak-to-Strong, Self-Improve, Low-Resource
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: English
Submission Number: 1040