LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

12 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Long-Context Large Language Models, Software Engineering, Benchmark
TL;DR: LoCoBench provides 8,000 evaluation scenarios across 10 programming languages to systematically assess long-context LLM performance in complex software engineering tasks.
Abstract: The rise of long-context language models with million-token windows opens new possibilities for advanced code understanding and software development evaluation. We propose LoCoBench, a benchmark designed to assess long-context LLMs on realistic, complex development tasks. Unlike existing benchmarks centered on single-function or short-context tasks, LoCoBench targets capabilities such as whole-codebase understanding, cross-file reasoning, and architectural consistency in large systems. It offers 8,000 scenarios across 10 languages, with context lengths from 10K to 1M tokens, enabling precise measurement of long-context performance degradation. LoCoBench spans 8 task categories: architectural understanding, cross-file refactoring, multi-session development, bug investigation, feature implementation, code comprehension, integration testing, and security analysis. Built through a 5-phase pipeline, it produces diverse, high-quality scenarios that require reasoning over large codebases. We introduce a comprehensive evaluation framework with 17 metrics across 4 dimensions, including 6 new evaluation metrics (Architectural Coherence Score, Dependency Traversal Accuracy, Cross-File Reasoning Depth, Incremental Development Capability, Information Coverage Utilization, and Multi-Session Memory Retention); these metrics are combined into a unified LoCoBench Score (LCBS). Evaluations of state-of-the-art models reveal substantial performance gaps, underscoring long-context software development as a critical challenge.
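The abstract states that 17 metrics across 4 dimensions are combined into a single LoCoBench Score (LCBS), but it does not give the aggregation formula. The following is a minimal sketch of one plausible two-level aggregation, assuming every metric is normalized to [0, 1], each dimension score is the plain mean of its metrics, and the LCBS is a weighted mean of the dimension scores. The dimension labels, metric keys, and weights are hypothetical placeholders, not the paper's definitions.

```python
from statistics import mean


def dimension_score(metric_scores: dict[str, float]) -> float:
    """Plain mean of one dimension's metrics (assumption: each metric is in [0, 1])."""
    if not metric_scores:
        raise ValueError("a dimension needs at least one metric")
    for name, value in metric_scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"metric {name!r} out of range: {value}")
    return mean(metric_scores.values())


def unified_score(per_dimension: dict[str, dict[str, float]],
                  weights: dict[str, float]) -> float:
    """Weighted mean of dimension scores.

    Hypothetical: the real LCBS weights and any final scaling are not given
    in the abstract; every dimension in per_dimension must have a weight.
    """
    total_weight = sum(weights[d] for d in per_dimension)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[d] * dimension_score(m) for d, m in per_dimension.items())
    return weighted / total_weight


if __name__ == "__main__":
    # Placeholder dimensions and metrics, for illustration only.
    scores = {
        "dim_a": {"metric_1": 0.8, "metric_2": 0.6},
        "dim_b": {"metric_3": 0.5},
        "dim_c": {"metric_4": 0.9},
        "dim_d": {"metric_5": 0.3},
    }
    equal_weights = {d: 1.0 for d in scores}
    print(round(unified_score(scores, equal_weights), 4))
```

With equal weights the example averages the dimension scores 0.7, 0.5, 0.9, and 0.3, giving 0.6. A two-level mean like this keeps a dimension with many metrics from dominating one with few; whether LoCoBench weights dimensions this way is an open assumption here.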
Primary Area: datasets and benchmarks
Supplementary Material: zip
Submission Number: 4581