RepoMirage: Do Code Agents Really Understand Repository Structures?

Published: 01 Mar 2026, Last Modified: 24 Apr 2026, ICLR 2026 AIWILD, CC BY 4.0
Keywords: code, agent, robustness
Abstract: Despite the impressive progress of code agents on code generation benchmarks, their robustness in understanding repository structure remains overlooked. This matters because real repositories often come with noisier and less informative structural cues, such as messy layouts and misleading naming conventions. We present RepoMirage, an automated perturbation framework that probes this robustness gap by producing function-preserving variants of SWE-bench Verified tasks and evaluating agents under repository-level test-time shifts. Perturbations are applied to the original tasks at multiple levels, from prompts to repository structure, while preserving functionality. On 158 curated instances, these shifts consistently degrade performance and make target localization harder: the resolution rate of GPT-4o drops from 32.91% to 7.59%, exposing a robustness gap in current code agents.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 123
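
To put the headline numbers in the abstract in perspective, the short sketch below works through the arithmetic behind the reported drop. Only the three figures (158 instances, 32.91%, 7.59%) come from the abstract; the function and variable names are hypothetical, and the whole-number counts are inferred by rounding rather than reported by the authors.

```python
# Figures taken from the abstract (GPT-4o on the curated instances).
N_INSTANCES = 158
ORIGINAL_RATE = 32.91   # resolution rate (%) on the original tasks
PERTURBED_RATE = 7.59   # resolution rate (%) after perturbation


def implied_count(rate_percent: float, n: int) -> int:
    """Nearest whole number of resolved instances implied by a percentage.

    This is an inference from rounding, not a count stated in the abstract.
    """
    return round(rate_percent / 100 * n)


original_resolved = implied_count(ORIGINAL_RATE, N_INSTANCES)
perturbed_resolved = implied_count(PERTURBED_RATE, N_INSTANCES)

# Absolute drop in percentage points, and relative drop as a share of
# the original resolution rate.
absolute_drop = ORIGINAL_RATE - PERTURBED_RATE
relative_drop = absolute_drop / ORIGINAL_RATE * 100

if __name__ == "__main__":
    print(f"implied resolved (original):  {original_resolved}/{N_INSTANCES}")
    print(f"implied resolved (perturbed): {perturbed_resolved}/{N_INSTANCES}")
    print(f"absolute drop: {absolute_drop:.2f} percentage points")
    print(f"relative drop: {relative_drop:.1f}% of the original rate")
```

Under these assumptions, the percentages are consistent with roughly 52 and 12 resolved instances out of 158, an absolute drop of about 25 percentage points and a relative drop of roughly three quarters of the original resolution rate.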