From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models

01 May 2025 (modified: 29 Oct 2025) · Submitted to NeurIPS 2025 · CC BY 4.0
Keywords: Spectral Dictionary, Dictionary Learning, Short‐Time Fourier Transform, Gaussian Prior, Pointer‐Generator
TL;DR: SDGM is an efficient, interpretable Fourier-based generative framework for language modeling that matches transformer perplexities with linear time complexity.
Abstract: We propose a novel spectral generative modeling framework for natural language processing that jointly learns a global time-varying Fourier dictionary and per-token mixing coefficients, replacing the ubiquitous self-attention mechanism in transformer architectures. By enforcing reconstruction losses in both the time domain (embedding reconstruction) and the frequency domain (via Short-Time Fourier Transform magnitude matching) alongside a standard language modeling objective, and by fitting a Gaussian Mixture Model (GMM) prior over the learned mixing vectors, our approach achieves competitive perplexity and generation quality on standard benchmarks such as WikiText-2 and Penn Treebank. In contrast to the $\mathcal{O}(L^2)$ cost of self-attention, our method operates with $\mathcal{O}(KL)$ complexity, where $K \ll L$ is the dictionary size, delivering substantial efficiency gains. Compared with transformer baselines, spectral dictionary models deliver competitive quality while significantly reducing inference latency and memory footprint, offering a compelling alternative for scalable language modeling.
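The core idea in the abstract, reconstructing each token embedding as a per-token weighted sum of K global Fourier atoms at O(KL) cost instead of O(L^2) attention, can be illustrated with a short PyTorch sketch. The module name SpectralDictionaryMixer and the per-atom amplitude/frequency/phase parameterization below are illustrative assumptions, not the paper's exact architecture or training objective.

```python
# Minimal sketch of a spectral dictionary mixing layer.
# Assumptions: the parameterization and names are illustrative only.
import torch
import torch.nn as nn


class SpectralDictionaryMixer(nn.Module):
    """Mixes token embeddings with K global time-varying Fourier atoms.

    Cost is O(K * L) in sequence length L, versus O(L^2) for self-attention.
    """

    def __init__(self, d_model: int, num_atoms: int = 32):
        super().__init__()
        self.num_atoms = num_atoms
        # Per-atom amplitude, frequency, and phase define a real sinusoidal
        # atom evaluated over token positions (a time-varying dictionary).
        self.amplitude = nn.Parameter(torch.randn(num_atoms, d_model) * 0.02)
        self.frequency = nn.Parameter(torch.rand(num_atoms, 1))
        self.phase = nn.Parameter(torch.zeros(num_atoms, 1))
        # Per-token mixing coefficients are predicted from the embeddings.
        self.to_coeffs = nn.Linear(d_model, num_atoms)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, L, d_model)
        batch, seq_len, _ = x.shape
        positions = torch.arange(seq_len, device=x.device, dtype=x.dtype)
        # Evaluate each atom at every position: (num_atoms, L)
        waves = torch.cos(2 * torch.pi * self.frequency * positions + self.phase)
        # Atoms lifted into embedding space: (num_atoms, L, d_model)
        atoms = waves.unsqueeze(-1) * self.amplitude.unsqueeze(1)
        # Per-token mixing coefficients: (batch, L, num_atoms)
        coeffs = torch.softmax(self.to_coeffs(x), dim=-1)
        # Reconstruction as a coefficient-weighted sum of atoms, O(K * L).
        return torch.einsum("blk,kld->bld", coeffs, atoms)


# Usage: reconstruct embeddings and penalize time-domain reconstruction error
# (the frequency-domain STFT loss and GMM prior from the abstract are omitted).
if __name__ == "__main__":
    mixer = SpectralDictionaryMixer(d_model=64, num_atoms=16)
    emb = torch.randn(2, 128, 64)  # (batch, L, d_model)
    recon = mixer(emb)
    time_loss = torch.nn.functional.mse_loss(recon, emb)
    print(recon.shape, float(time_loss))
```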
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 5704