Nemotron-CORTEXA: Enhancing LLM Agents for Software Engineering Tasks via Improved Localization and Solution Diversity

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: More accurate localization and diverse solutions enhance LLM software agents
Abstract: Large Language Models (LLMs) have demonstrated significant potential in code generation by following natural language instructions. Unfortunately, crucial real-world software engineering tasks, such as debugging or repository-level feature implementation, involve processing extensive contexts beyond current LLM context sizes and performing complex reasoning that is brittle under standard autoregressive decoding. Enhancing LLMs' performance in these scenarios requires careful consideration of the contextual information provided to the model, optimizing how the model leverages that information, and identifying tools that enable more effective navigation of the development environment. To address these challenges, we introduce Nemotron-CORTEXA, an agentic system built on a predefined scaffold that enhances LLMs' ability to navigate and reason efficiently in complex software engineering contexts. Specifically, we develop a novel code embedding model that retrieves the most relevant files with greater precision, along with a localization agent that refines the granularity of the retrieval process. Additionally, we demonstrate that providing diverse contextual information and utilizing different prompt formats enable the model to identify and resolve issues more efficiently. We evaluate Nemotron-CORTEXA on SWE-bench, a benchmark derived from real-world GitHub issues. Compared to the widely used Agentless framework, Nemotron-CORTEXA achieves a higher issue resolution rate at a lower cost, highlighting its practical impact in addressing real-world software engineering challenges.
Lay Summary: Advanced AI models, known as Large Language Models (LLMs), can now write code from plain English instructions. While they are good at simple tasks, they often hit a wall when faced with large, real-world software projects. These projects often contain too much code for a model to take in at once, and tasks like fixing bugs or adding features involve complex reasoning beyond current models' capabilities. We built Nemotron-CORTEXA, a system that helps LLMs work more effectively in large software projects. It uses specialized search tools to find the code files relevant to a task and then zooms in to find the precise lines. It also presents information to the models in diverse ways, improving their understanding and increasing their chances of success. When tested on real software bugs from GitHub, Nemotron-CORTEXA fixed more issues than previous methods and did so more cost-effectively, highlighting its strong potential for tackling complex software challenges in the real world.
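The two-stage localization idea described above (retrieve relevant files by embedding similarity, then narrow to specific code) can be sketched as follows. This is a minimal illustration only: the bag-of-words "embedding" stands in for the paper's learned code embedding model, and the issue text, file paths, and file contents are all invented for the example.

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words vector; a stand-in for a learned code
    # embedding model that maps text/code to dense vectors.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(cnt * b[tok] for tok, cnt in a.items() if tok in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_files(issue, files, top_k=2):
    # Stage 1 of localization: rank repository files by similarity
    # to the issue description and keep the top-k candidates. A
    # localization agent would then refine these to lines/functions.
    query = embed(issue)
    scored = sorted(
        files.items(),
        key=lambda item: cosine(query, embed(item[1])),
        reverse=True,
    )
    return [path for path, _ in scored[:top_k]]


# Hypothetical mini-repository for illustration.
repo = {
    "auth/login.py": "def login checks the password hash and opens a session",
    "db/models.py": "class User with id name email fields",
    "ui/render.py": "def render builds the html output page",
}

print(rank_files("bug: login fails when the password hash is invalid", repo))
```

In a real system the embedding model would be a trained neural encoder and the candidate files would be passed to a localization agent for finer-grained (function- or line-level) selection, but the retrieve-then-refine structure is the same.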
Primary Area: Deep Learning->Large Language Models
Keywords: Coding assistant, code embedding model, localization agent, SWE-bench, prompt design, contextual information, multi-LLM, multi-prompt
Submission Number: 9631