From Logical to Computational Sparsity: Structure-Aware Block-Sparse Attention for Long-Code Completion

ACL ARR 2026 January Submission 224 Authors

22 Dec 2025 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Long Code Completion, Code Structure, Sparse Attention
Abstract: Code Large Language Models face critical Time-To-First-Token (TTFT) latency challenges when handling long code completion due to the quadratic complexity ($O(n^2)$) of attention mechanisms. While existing sparse attention methods attempt to address this issue, they suffer from three key limitations: (1) general sparse patterns cause excessive accuracy degradation without considering code structure, (2) code-related methods achieve only logical sparsity without actual computational speedup, and (3) limited adaptation to complex scenarios such as repository-level completion. We propose **SabreCoder**, a training-free **S**tructure-**a**ware **B**lock-spa**r**s**e** attention mechanism that bridges the gap between logical and computational sparsity. SabreCoder parses code into semantic chunks, constructs chunk-level sparse patterns through dependency analysis and similarity matching, and maps them to GPU-friendly block-sparse formats. Extensive experiments on LCC and CrossCodeEval benchmarks demonstrate that SabreCoder reduces TTFT by 45-55% while maintaining accuracy within 3% of dense attention.
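The abstract's central idea, turning a chunk-level (logical) sparsity pattern into a GPU-friendly block-sparse one, can be illustrated with a minimal sketch. The function name, chunk representation, and block policy below are assumptions for illustration, not the paper's actual implementation: a fixed-size attention block is kept whenever it overlaps any semantic chunk that the chunk-level pattern retains.

```python
import numpy as np

def chunk_mask_to_block_mask(chunk_keep, chunk_bounds, seq_len, block_size):
    """Map a chunk-level (logical) sparsity decision onto fixed-size blocks.

    chunk_keep   : list of bools, one per semantic chunk (True = attend).
    chunk_bounds : list of (start, end) token ranges, end exclusive.
    Returns a boolean array over blocks: a block is kept if it overlaps
    any kept chunk, so the block kernel never drops needed tokens.
    """
    n_blocks = (seq_len + block_size - 1) // block_size  # ceil division
    block_keep = np.zeros(n_blocks, dtype=bool)
    for keep, (start, end) in zip(chunk_keep, chunk_bounds):
        if not keep:
            continue
        first = start // block_size          # first overlapping block
        last = (end - 1) // block_size       # last overlapping block
        block_keep[first:last + 1] = True
    return block_keep
```

For example, with chunks spanning tokens (0, 64), (64, 192), and (192, 256), a pattern that keeps only the first and last chunk, and 64-token blocks, the middle two blocks are skipped, which is what yields an actual computational speedup rather than a purely logical one.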
Paper Type: Long
Research Area: Code Models
Research Area Keywords: Code Models, LLM Efficiency, Sparse Models, Retrieval-Augmented Generation, Code Generation and Understanding
Contribution Types: NLP engineering experiment, Approaches for low-compute settings / efficiency, Publicly available software and/or pre-trained models
Languages Studied: Python, Java, C#
Submission Number: 224