Navigating Worlds and Minds: Dynamic Evaluation of LLM Agent Robustness under Progressively Disclosed Dual Constraints

Published: 28 Apr 2026, Last Modified: 28 Apr 2026 · MSLD 2026 Poster · CC BY 4.0
Keywords: LLM Agent, Robustness
TL;DR: This paper introduces the first benchmark for evaluating LLM agents under progressively disclosed dual constraints (objective limits vs. subjective preferences) to assess their ability to learn from failure and adapt in multi-turn interactions.
Abstract: Real-world AI agents rarely operate with full upfront knowledge, yet existing benchmarks assume complete constraint visibility. We introduce the first dynamic evaluation framework for LLM agents under progressively disclosed dual constraints, distinguishing between objective world limitations (e.g., missing tools) and subjective user preferences. Unlike prior work, our benchmark assesses critical deployment capabilities: learning from failure, building user mental models, and maintaining robustness across multi-turn interactions. By tracking both "undisclosed" violations (unavoidable, since the constraint was not yet revealed) and "disclosed" violations (repeated after the constraint was revealed) in realistic scenarios, we provide a systematic measure of an agent's ability to adapt to uncertainty and evolving human needs.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 175