TL;DR: We propose a novel LLM-based Actor-Critic framework that enhances LLMs' decision-making through long-term action evaluation and efficient policy improvement.
Abstract: Large Language Models (LLMs) have achieved remarkable advances in natural language processing tasks, yet they struggle in complex decision-making scenarios that require long-term reasoning and alignment with high-level objectives. Existing methods either rely on short-term auto-regressive action generation or face limitations in accurately simulating rollouts and assessing outcomes, leading to suboptimal decisions. This paper introduces a novel LLM-based Actor-Critic framework, termed LAC, which effectively improves LLM policies with long-term action evaluations in a principled and scalable way. Our approach addresses two key challenges: (1) extracting robust action evaluations by computing Q-values from the token logits associated with positive/negative outcomes, enhanced by future trajectory rollouts and reasoning; and (2) enabling efficient policy improvement through a gradient-free mechanism. Experiments across diverse environments, including high-level decision-making (ALFWorld), low-level action spaces (BabyAI-Text), and large action spaces (WebShop), demonstrate the framework's generality and its superiority over state-of-the-art methods. Notably, our approach achieves competitive performance with 7B/8B-parameter LLMs, even outperforming baselines that employ GPT-4 on complex tasks. These results underscore the potential of integrating structured policy optimization with LLMs' intrinsic knowledge to advance decision-making in multi-step environments.
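To make the two mechanisms concrete, here is a minimal Python sketch of how a critic might score an action via the logits of positive/negative outcome tokens, and how that score might reweight the actor's proposals without any gradient updates. Everything here is an illustrative assumption rather than the paper's exact formulation: the `llm.next_token_logits` and `llm.action_log_prob` interfaces are hypothetical, and the SUCCESS/FAILURE prompt and exp(Q/alpha) reweighting are one plausible instantiation.

```python
import math

def q_value_from_logits(llm, trajectory, action):
    """Score an action by the log-odds the LLM assigns to a positive
    vs. negative outcome token (illustrative prompt and token choice)."""
    prompt = (
        f"{trajectory}\nProposed action: {action}\n"
        "Will this plan eventually succeed? Answer SUCCESS or FAILURE:"
    )
    logits = llm.next_token_logits(prompt)  # hypothetical interface
    return logits["SUCCESS"] - logits["FAILURE"]

def improve_policy(llm, trajectory, candidate_actions, alpha=1.0):
    """Gradient-free policy improvement: reweight the actor's own action
    log-probabilities by Q / alpha instead of fine-tuning model weights."""
    scores = {
        a: llm.action_log_prob(trajectory, a)  # hypothetical interface
           + q_value_from_logits(llm, trajectory, a) / alpha
        for a in candidate_actions
    }
    # Normalize in log space for numerical stability.
    m = max(scores.values())
    z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {a: math.exp(s - z) for a, s in scores.items()}
```

Reweighting a frozen actor by a critic score in this way sidesteps back-propagation entirely, which is what would make such an improvement step cheap enough to run at inference time.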
Lay Summary: Large Language Models (LLMs) like ChatGPT have shown impressive abilities in generating text and answering questions. But when it comes to making complex decisions, especially those that require long-term planning, they often fall short. Current approaches rely either on the LLM's initial instincts or on simulated trial-and-error strategies that can be inaccurate or misleading.
We developed a new method, called an “LLM-based Actor-Critic,” that helps LLMs make better decisions by combining their built-in knowledge with smarter planning. First, our method evaluates possible actions by analyzing the LLM's internal confidence signals, essentially asking, “Does the model believe this action will succeed?” We then use this information to refine the model's decision-making, but in a way that avoids the expensive and slow process of traditional gradient-based training.
This approach works across many different types of tasks, from high-level planning to detailed step-by-step actions, and it outperforms existing state-of-the-art methods, even when using smaller models.
Link To Code: http://github.com/drdh/LAC
Primary Area: Deep Learning->Large Language Models
Keywords: Large Language Models, Decision-Making, Actor-Critic
Submission Number: 8883