TL;DR: Enabling faster large language model solutions for autonomous reasoning and planning
Abstract: Large language models (LLMs) are making inroads into classical AI problems such as automated planning, yet key shortcomings continue to hamper their integration. Chain-of-Thought (CoT) struggles with complex multi-step reasoning, and Tree-of-Thoughts requires multiple queries that increase computational overhead. Recently, Algorithm-of-Thoughts (AoT) has shown promise using in-context examples, but at the cost of significantly longer solutions than CoT. To bridge the solution-length gap between CoT and AoT, this paper introduces AoT-O3, which combines supervised finetuning on AoT-style plans with a reinforcement learning (RL) framework designed to reduce solution length. The RL component uses a reward model that favors concise, valid solutions while maintaining planning accuracy. Empirical evaluations indicate that AoT-O3 shortens solution length by up to 80\% compared to baseline AoT while maintaining or surpassing prior performance. These findings suggest a promising pathway toward more efficient, scalable LLM-based planning.
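To make the abstract's reward design concrete, the sketch below shows one plausible form of a length-aware reward: valid plans earn a base reward plus a brevity bonus, invalid plans earn nothing. This is an illustrative assumption, not the paper's actual reward model; the names `length_penalized_reward`, `max_steps`, and the trade-off weight `alpha` are hypothetical.

```python
# Hypothetical sketch (not the paper's implementation): a reward that prefers
# short, valid plans while never rewarding invalid ones, so planning accuracy
# is not traded away for brevity.

def length_penalized_reward(plan_steps: list[str], is_valid: bool,
                            max_steps: int = 64, alpha: float = 0.5) -> float:
    """Return a reward in [0, 1]: 0 for invalid plans, higher for shorter valid plans."""
    if not is_valid:
        return 0.0  # invalid plans get zero reward
    brevity = 1.0 - min(len(plan_steps), max_steps) / max_steps
    return (1.0 - alpha) + alpha * brevity  # valid plans score in [1 - alpha, 1]


if __name__ == "__main__":
    short_plan = ["unstack A from B", "stack A on C"]
    long_plan = short_plan + ["no-op"] * 30
    print(length_penalized_reward(short_plan, is_valid=True))   # higher reward
    print(length_penalized_reward(long_plan, is_valid=True))    # lower reward
    print(length_penalized_reward(long_plan, is_valid=False))   # zero reward
```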
Lay Summary: Large language models (LLMs) can solve complex problems better when they are guided in smarter ways. The paper introduces a new method called AoT-O3 that helps these models plan more efficiently by giving rewards for shorter, accurate solutions. This approach significantly cuts down on the steps needed to reach a solution—by up to 80\%—without sacrificing quality. As a result, it also reduces energy use and makes AI more scalable and environmentally friendly.
Primary Area: Deep Learning->Large Language Models
Keywords: large language models, decision-making, planning
Submission Number: 15998