AIGCoder 1.0: Locally-Enhanced Language Modeling with Explicit and Structured Knowledge Memory

10 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Models, Knowledge-Refined Attention, Decoupled Mixture-of-Experts
TL;DR: We present AIGCoder, an LLM with Local Fusion Attention for local patterns and a Knowledge Memory Module for flexible global knowledge retrieval, achieving 1.33× faster pre-training than baselines.
Abstract: Large language models (LLMs) have achieved remarkable breakthroughs across various applications. However, their architectures remain inefficient due to two main limitations: (i) self-attention lacks an explicit inductive bias for locality, leading to redundant modeling of sequence-internal local information; (ii) mixture-of-experts (MoE) implicitly couples knowledge storage with computational pathways, hindering flexible access to sequence-external global knowledge. To overcome these limitations, we propose AIGCoder (AI Generative Coder), a novel LLM architecture that augments the standard decoder with two dedicated modules: 1) Local Fusion Attention (LFA), which incorporates convolutional fusion into attention, explicitly capturing local patterns and allowing attention to operate on more informative representations; 2) Knowledge Memory Module (KMM), which introduces a parametric key–value memory that explicitly stores global knowledge in addressable slots, decoupling storage from computation and enabling direct knowledge retrieval. Together, these modules let AIGCoder integrate information more efficiently and effectively at both the local and global levels. Experimental results show that AIGCoder converges 1.33× faster in pre-training than baseline models, underscoring its superiority over existing LLM architectures.
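The abstract only names the two modules, so the following is a minimal NumPy sketch of how an LFA-style block (a causal depthwise convolution fused into self-attention) and a KMM-style block (soft addressing over a parametric key–value memory) could fit together. The kernel width, residual placement, and addressing scheme are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_fusion_attention(x, w_conv):
    """Hypothetical LFA: fuse a causal depthwise convolution into the
    token representations, then run standard causal self-attention."""
    T, d = x.shape
    k = w_conv.shape[0]  # assumed convolution kernel width
    # causal depthwise convolution over the sequence dimension
    pad = np.concatenate([np.zeros((k - 1, d)), x], axis=0)
    local = np.stack([(pad[t:t + k] * w_conv).sum(axis=0) for t in range(T)])
    h = x + local  # residual fusion of explicit local patterns
    # single-head self-attention with a causal mask (projections omitted)
    scores = h @ h.T / np.sqrt(d)
    scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf
    return softmax(scores, axis=-1) @ h

def knowledge_memory(x, keys, values):
    """Hypothetical KMM: soft-address a parametric key-value memory,
    so knowledge storage is decoupled from the computation path."""
    addr = softmax(x @ keys.T, axis=-1)  # addressing weights over slots
    return x + addr @ values             # retrieved knowledge, residual add

T, d, n_slots, kernel = 8, 16, 32, 3
x = rng.standard_normal((T, d))
w_conv = 0.1 * rng.standard_normal((kernel, d))
keys = rng.standard_normal((n_slots, d))
values = rng.standard_normal((n_slots, d))

y = knowledge_memory(local_fusion_attention(x, w_conv), keys, values)
print(y.shape)  # (8, 16)
```

Decoupling shows up in `knowledge_memory`: the memory slots (`keys`, `values`) are plain parameters addressed by content, so knowledge capacity can grow without adding expert sub-networks to the compute path, which is the contrast the abstract draws with MoE.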
Primary Area: foundation or frontier models, including LLMs
Submission Number: 3646