LUMINA: Long-horizon Understanding for Multi-turn Interactive Agents

ICLR 2026 Conference Submission 18909 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: LLM, multi-turn, agents, long-horizon
TL;DR: An oracle-intervention framework and analysis for determining which skills a multi-turn LLM-based agent needs.
Abstract: Language models have been shown to excel at a variety of tasks (e.g., mathematical reasoning and coding) that are fundamental to solving more general goal-oriented, feedback-driven agentic problems. However, recent findings make two points evident: (a) agentic problems require a variety of skills, such as long-context reasoning, planning and decision making, and efficient exploration; (b) even large frontier models underperform on this family of tasks, especially on problems requiring long-horizon understanding. For example, Qwen3-235B achieves 44.5% accuracy on BFCLv3 multi-turn. In this paper, our goal is to understand the relation between the two by examining which skills are necessary for solving multi-turn problems. We work towards this goal using an oracle counterfactual framework that allows us to answer the question: what if the agent could leverage a specific oracle skill to achieve its goal? To enable this framework, we introduce a set of procedurally generated game-like tasks whose complexity can be controlled. In these controlled environments, we can provide accurate oracle interventions to guide the agent towards the goal. Our findings suggest that while most interventions (e.g., planning) are generally beneficial, for some interventions the utility depends on the intricacies of the benchmark (e.g., the ability to track state while iteratively modifying Python lists).
Primary Area: foundation or frontier models, including LLMs
Submission Number: 18909