Abstract: Real-world human decision-making often relies on strategic planning, where *high-level* goals guide the formulation of sub-goals and subsequent actions, as evidenced by domains such as healthcare, business, and urban policy. Despite notable successes in controlled settings, conventional reinforcement learning (RL) follows a *bottom-up* paradigm, which can struggle to adapt to real-world complexities such as sparse rewards and limited exploration budgets. While methods like hierarchical RL and environment shaping provide partial solutions, they frequently rely either on ad-hoc designs (e.g., hand-picking the set of high-level actions) or on purely data-driven discovery of high-level actions, which still requires significant exploration. In this paper, we introduce a *top-down* framework for RL that explicitly leverages *human-like strategy* to reduce sample complexity, guide exploration, and enable high-level decision-making. We first formalize the *Strategy Problem*, which frames policy generation as finding distributions over policies that balance *specificity* and *value*. Building on this definition, we propose the *Strategist* agent, an iterative framework that leverages large language models (LLMs) to synthesize domain knowledge into a structured representation of actionable strategies and sub-goals. We further develop a *reward shaping methodology* that translates these natural-language strategies into quantitative feedback for RL methods. Empirically, we demonstrate significantly faster convergence than conventional PPO. Taken together, our findings highlight that *top-down strategic exploration* opens new avenues for enhancing RL on real-world decision problems.
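To make the reward shaping idea concrete, below is a minimal sketch (not the authors' implementation) of how natural-language sub-goals produced by an LLM could be compiled into predicate checks over states and added to the environment reward as a shaping bonus. The toy state representation, sub-goals, and bonus weights are hypothetical placeholders.

```python
# Minimal sketch: turning natural-language sub-goals into a shaped reward signal.
# Everything below (state format, sub-goals, bonus weights) is illustrative only.
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]  # toy state: named scalar features


@dataclass
class SubGoal:
    description: str                    # natural-language sub-goal (e.g., proposed by an LLM)
    predicate: Callable[[State], bool]  # grounded check derived from the description
    bonus: float                        # shaping weight awarded when the sub-goal holds


def shaped_reward(state: State, env_reward: float, subgoals: List[SubGoal]) -> float:
    """Add a bonus for every sub-goal currently satisfied in the given state."""
    return env_reward + sum(g.bonus for g in subgoals if g.predicate(state))


# Hypothetical sub-goals that a strategy like "build infrastructure before collecting"
# might yield in a resource-gathering task.
strategy_subgoals = [
    SubGoal("gather at least 5 units of wood", lambda s: s.get("wood", 0) >= 5, bonus=0.1),
    SubGoal("construct a sawmill", lambda s: s.get("sawmill", 0) >= 1, bonus=0.5),
]

if __name__ == "__main__":
    state = {"wood": 6.0, "sawmill": 0.0}
    print(shaped_reward(state, env_reward=0.0, subgoals=strategy_subgoals))  # -> 0.1
```

In this sketch the shaped reward would simply replace the raw environment reward inside a standard policy-gradient loop such as PPO; the paper's actual strategy representation and shaping scheme are described in the linked code and full text.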
Lay Summary: Traditional artificial intelligence learns through extensive trial and error, like a child randomly pressing buttons until something works. This approach is inefficient and impractical for real-world applications like healthcare or business, where mistakes are costly and time is limited. Humans, by contrast, start with high-level strategies—like a doctor creating a treatment plan before choosing specific medications.
We developed a “Strategist” system inspired by human strategic thinking that uses large language models to generate multiple high-level plans before an AI agent begins learning. Instead of random exploration, our system first identifies promising approaches, breaks these into specific steps, and guides the learning process using these strategic insights as rewards. For example, in a resource-gathering task, it might consider different methods like direct collection versus building infrastructure for long-term production.
Our approach substantially improves learning efficiency, requiring far fewer attempts to master complex tasks while discovering behaviors that traditional methods often miss entirely. This could accelerate AI deployment in critical domains such as medical treatment planning and business strategy, where extensive trial-and-error learning is neither feasible nor safe, making AI more practical for real-world applications.
Link To Code: https://github.com/antoninbrthn/strategist
Primary Area: Reinforcement Learning
Keywords: Reinforcement Learning, Top-down approach, Strategic Planning, Language-Conditioned RL
Submission Number: 11587