Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Large Language Models, Reasoning, Alignment
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We identify an assessment misalignment problem in vanilla fine-tuned large language models on reasoning tasks, and we propose an alignment fine-tuning paradigm with a novel constrained alignment loss to alleviate this problem.
Abstract: Reasoning is a cognitive process of using evidence to reach a sound conclusion.
Reasoning capability is essential for large language models (LLMs) to serve as the brain of an artificial general intelligence (AGI) agent.
Recent studies reveal that fine-tuning LLMs on data with chain-of-thought (COT) reasoning processes can significantly enhance their reasoning capabilities.
However, we find that the fine-tuned LLMs suffer from an \textit{Assessment Misalignment} problem, i.e., they frequently assign higher scores to subpar COTs, leading to potential limitations in their reasoning abilities.
In this paper, we introduce an \textit{Alignment Fine-Tuning (AFT)} paradigm with a novel \textit{Constrained Alignment Loss} to alleviate the assessment misalignment problem.
Specifically, the proposed loss has two objectives:
a) Alignment, which guarantees that the scores of high-quality COTs surpass those of subpar ones;
b) Constraint, which keeps the subpar scores confined to a reasonable range to prevent model degradation.
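As a rough, hypothetical sketch of these two objectives (the symbols $s^{+}$, $s^{-}$, and the lower bound $B$ are illustrative and not necessarily the paper's notation): for the score $s^{+}$ of a high-quality COT and the score $s^{-}$ of a subpar one,
$$\mathcal{L}_{\text{align}} = \max\bigl(0,\; s^{-} - s^{+}\bigr), \qquad \mathcal{L}_{\text{constraint}} = \max\bigl(0,\; B - s^{-}\bigr),$$
where the first term pushes high-quality COTs to score above subpar ones, and the second keeps subpar scores from collapsing below the bound $B$.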
Extensive experiments on four reasoning benchmarks with both binary and ranking feedback demonstrate the effectiveness of AFT.
AFT also performs well in multi-task and out-of-distribution situations.
Furthermore, we delve into recent ranking-based alignment methods, such as DPO, RRHF, and PRO, and find that the constraint, which has been overlooked by these approaches, is also crucial for their performance.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7016