Keywords: Self-Refinement, Large Language Models (LLMs), Iterative Preference Training, Self-Improvement
TL;DR: We present EVOLVE, a framework that evolves the self-refinement capability of large language models (LLMs), which in turn enables iterative preference optimization to further improve their alignment performance.
Abstract: Self-Refinement refers to a model's ability to revise its own responses to produce improved outputs. This capability can also serve as a fundamental mechanism for Self-Improvement, for example by reconstructing datasets with refined results to enhance intrinsic model performance.
However, our comprehensive experiments reveal that large language models (LLMs) show no clear evidence of inherent Self-Refinement; on average, response quality degrades over successive iterations. To address this gap, we propose EVOLVE, a simple yet effective framework for eliciting and tracking the evolution of Self-Refinement through iterative training. Moreover, we demonstrate the potential of leveraging Self-Refinement to achieve broader Self-Improvement of intrinsic model abilities.
Experiments show that the evolved Self-Refinement ability enables the Llama-3.1-8B base model to surpass GPT-4o, achieving 62.3% length-controlled and 63.3% raw win rates on AlpacaEval 2, and 50.3% on Arena-Hard. It also generalizes effectively to out-of-domain reasoning tasks, improving performance on mathematical reasoning benchmarks such as GSM8K and MATH.
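For illustration only, the following minimal Python sketch shows the kind of loop the abstract describes: the model refines its own drafts, the draft/refined pairs are turned into preference data, and the model is retrained over several rounds. All names here (generate, judge_preference, train_dpo) are hypothetical placeholders under assumed interfaces, not the authors' implementation.

```python
def build_preference_data(model, prompts, judge_preference):
    """Collect (chosen, rejected) pairs from the model's own refinements."""
    pairs = []
    for prompt in prompts:
        draft = model.generate(prompt)  # initial response
        refined = model.generate(       # self-refinement pass over the draft
            f"{prompt}\n\nDraft answer:\n{draft}\n\nRevise and improve this answer:"
        )
        # A preference judge (hypothetical) decides which response is better.
        chosen, rejected = judge_preference(prompt, draft, refined)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


def evolve(model, prompts, judge_preference, train_dpo, num_iterations=3):
    """Iterative preference training: each round trains on pairs produced by
    the current model's own refinements, then repeats with the updated model."""
    for _ in range(num_iterations):
        pairs = build_preference_data(model, prompts, judge_preference)
        model = train_dpo(model, pairs)  # any preference-optimization trainer
    return model
```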
Primary Area: foundation or frontier models, including LLMs
Submission Number: 5487