Keywords: Large Language Model, Reinforcement Learning, Self-Improvement
TL;DR: This paper presents RISE, a fine-tuning approach that enables language models to iteratively improve their own responses over multiple turns.
Abstract: A central step toward enabling intelligent agentic behavior in foundation models is making them capable of introspecting on their own behavior, reasoning about their mistakes, and correcting them. In this paper, we introduce $\textbf{RISE}$: $\textbf{R}$ecursive $\textbf{I}$ntro$\textbf{S}$p$\textbf{E}$ction, an approach for fine-tuning large language models (LLMs) to enable introspection and self-correction. $\textbf{RISE}$ prescribes an iterative fine-tuning procedure that teaches the model to alter its response after seeing its previously unsuccessful attempts at a problem, together with additional environment feedback. Inspired by online imitation learning, we derive strategies for multi-turn data collection and training that imbue an LLM with the capability to recursively detect and correct its mistakes in subsequent iterations. Experiments show that $\textbf{RISE}$ enables 7B Llama2 and Mistral models to improve themselves over more turns on math reasoning tasks, outperforming single-turn strategies given equal inference-time computation, without degrading their one-turn abilities.
Submission Number: 11
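To make the multi-turn data-collection idea in the abstract concrete, here is a minimal sketch in the spirit of online imitation learning: the learner's own rollouts decide which contexts (earlier failed attempts plus feedback) get labeled, and an improver supplies the next-turn target. This is an illustrative assumption, not the paper's implementation. The names `collect_multiturn`, `policy`, `improver`, `is_correct`, and `FEEDBACK` are hypothetical. The abstract does not say how improved targets are produced or what the feedback looks like, so both are placeholders here, and the filter that keeps only verified targets is also an assumption. The fine-tuning step itself, which would train on (problem, history) → target, is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (attempt, feedback) pairs accumulated over turns.
History = List[Tuple[str, str]]
# Hypothetical interface: maps (problem, history) -> response text.
Policy = Callable[[str, History], str]
# Hypothetical verifier standing in for "environment feedback".
Checker = Callable[[str, str], bool]

# Assumed feedback message; the abstract does not specify its form.
FEEDBACK = "Your previous answer is incorrect. Please try again."


@dataclass
class Example:
    problem: str
    history: History  # prior failed attempts, each paired with feedback
    target: str       # improved response the model would be fine-tuned on


def collect_multiturn(problem: str, policy: Policy, improver: Policy,
                      is_correct: Checker, max_turns: int = 3) -> List[Example]:
    """Roll out the current model and label each failed turn with an improvement."""
    history: History = []
    examples: List[Example] = []
    for _ in range(max_turns):
        attempt = policy(problem, history)  # on-policy attempt from the learner
        if is_correct(problem, attempt):
            break  # solved; no further correction needed
        history.append((attempt, FEEDBACK))
        target = improver(problem, history)
        # Assumption: keep only targets that the checker verifies as correct.
        if is_correct(problem, target):
            examples.append(Example(problem, list(history), target))
    return examples


if __name__ == "__main__":
    answers = {"2+3": "5"}
    check = lambda p, a: answers[p] == a
    weak = lambda p, h: str(len(h))   # toy learner: answers "0", "1", "2", ...
    oracle = lambda p, h: answers[p]  # toy improver: always correct
    data = collect_multiturn("2+3", weak, oracle, check)
    print([len(e.history) for e in data])  # [1, 2, 3]
```

Each stored example carries a growing history, so later turns teach the model to correct itself conditioned on several earlier failures. Copying `history` with `list(...)` keeps each example's context fixed even as the rollout continues.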