Abstract:
Recent advances in large language models (LLMs) have sparked great interest in building autonomous agents. However, current prompt-based agents typically rely heavily on large-scale LLMs. Meanwhile, although fine-tuning significantly enhances the capabilities of smaller LLMs, the resulting agents often lack the capacity for self-reflection and self-improvement. To address these challenges, we introduce \ourmodel, a novel agent framework that jointly optimizes task-planning and self-reflective evolution capabilities in language agents. Specifically, we develop a two-stage joint optimization process that integrates imitation learning and reinforcement learning, and design an off-policy joint policy gradient optimization algorithm with imitation learning regularization to improve data efficiency and training stability in agent tasks. \ourmodel significantly improves the performance of open-source models, reduces dependence on closed-source LLMs, and enables fine-tuned agents to learn and evolve continuously. We conduct extensive experiments across diverse testing environments, demonstrating that \ourmodel yields substantial improvements in task performance and decision-making. To benefit the research community, we release our data and code at \url{https://anonymous.4open.science/r/RetroAct-04E8}.
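Below is a minimal sketch of how an off-policy policy-gradient objective might be combined with an imitation-learning regularizer, as the abstract describes at a high level. This is an illustrative assumption, not the released implementation: the function name `joint_loss`, the clipping scheme, and the hyperparameters `lambda_il` and `clip_eps` are hypothetical stand-ins.

```python
# Illustrative sketch (not the authors' released code): an off-policy
# policy-gradient loss combined with an imitation-learning (behavior-cloning)
# regularizer. All names and hyperparameters below are assumptions.
import torch


def joint_loss(new_logprobs, old_logprobs, advantages,
               expert_logprobs, lambda_il=0.1, clip_eps=0.2):
    """Combine a clipped off-policy PG term with a BC regularizer.

    new_logprobs:    log pi_theta(a_t | s_t) for sampled trajectory actions
    old_logprobs:    log pi_old(a_t | s_t) from the behavior policy
    advantages:      advantage estimates for the sampled actions
    expert_logprobs: log pi_theta(a*_t | s_t) on expert (imitation) actions
    """
    # Importance ratio corrects for the off-policy data distribution;
    # clipping keeps the update close to the behavior policy for stability.
    ratio = torch.exp(new_logprobs - old_logprobs.detach())
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    pg_loss = -torch.min(ratio * advantages, clipped * advantages).mean()

    # Imitation-learning regularization: negative log-likelihood of expert actions.
    il_loss = -expert_logprobs.mean()

    return pg_loss + lambda_il * il_loss


# Toy usage with random tensors standing in for model outputs.
if __name__ == "__main__":
    n = 8
    loss = joint_loss(
        new_logprobs=torch.randn(n, requires_grad=True),
        old_logprobs=torch.randn(n),
        advantages=torch.randn(n),
        expert_logprobs=torch.randn(n, requires_grad=True),
    )
    loss.backward()
    print(float(loss))
```

The weight `lambda_il` (an assumed knob) trades off exploration-driven policy-gradient updates against staying anchored to expert demonstrations, which is one plausible way such a regularizer could stabilize off-policy training.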
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: applications; fine-tuning
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 173