Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism

05 Sept 2025 (modified: 23 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Specialized Generalist Models, Large Language Models
TL;DR: We present Nirvana, a Specialized Generalist Model with task-aware memory mechanism, linear time complexity, and test-time task information extraction.
Abstract: Large Language Models (LLMs) have achieved remarkable success across a wide range of general language tasks but remain constrained in specialized domains. A specialized memory mechanism can address this limitation by enhancing the model's ability on specialized tasks. Specialized Generalist Models (SGMs) aim to preserve broad capabilities while achieving expert-level performance in target domains via test-time task identification and reconfiguration. However, traditional LLM architectures, including Transformers, linear attention, and hybrid models, do not employ a specialized memory mechanism guided by task information. In this paper, we present Nirvana, an SGM with a specialized memory mechanism, linear time complexity, and test-time task information extraction. We further propose the Task-Aware Memory Trigger ($\textit{Trigger}$), which flexibly adjusts the memory mechanism based on the current task's requirements. In Trigger, each incoming sample is treated as a self-supervised fine-tuning task, enabling Nirvana to adapt its task-related parameters on the fly under domain shifts. We also design the Specialized Memory Updater ($\textit{Updater}$), which dynamically memorizes the context under Trigger's guidance. We conduct experiments on both general language tasks and multiple specialized domains. Nirvana matches or exceeds the performance of LLM baselines on general benchmarks while achieving the lowest perplexity across specialized domains, including biomedicine, finance, and law. On the challenging task of Magnetic Resonance Imaging (MRI), we attach lightweight codecs to the frozen Nirvana backbone and fine-tune them on paired k-space measurements and images. Trigger enables effective adaptation to the MRI domain by adjusting task-related parameters during inference, even without updating the backbone. Nirvana yields higher-fidelity MRI reconstructions than conventional MRI models and LLM-based models, and it also generates reliable preliminary clinical reports. Ablation studies show that removing Trigger results in notable performance degradation across all evaluation tasks, demonstrating its essential role in task-aware specialization.
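The abstract describes two coupled mechanisms: a Trigger that treats each incoming sample as a self-supervised fine-tuning task to adapt task-related parameters at test time, and an Updater that writes to a memory under the Trigger's guidance. The sketch below illustrates that general pattern only; the paper's actual objectives, parameterization, and memory structure are not specified in the abstract, so the reconstruction loss, the fast-weight-style outer-product memory, and the names `W_task`, `trigger`, `updater`, and `gate` are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # illustrative hidden size

# Hypothetical task-related parameters adapted at test time by the Trigger.
W_task = 0.1 * rng.standard_normal((D, D))

def trigger(W_task, x, lr=0.1):
    """One self-supervised adaptation step on a single incoming sample x.

    Stand-in objective: minimise 0.5 * ||W_task @ x - x||^2 (reconstruction),
    so the gradient w.r.t. W_task is outer(W_task @ x - x, x).
    """
    err = W_task @ x - x
    return W_task - lr * np.outer(err, x)

def updater(M, k, v, gate):
    """Fast-weight-style memory write (rank-1 outer product),
    scaled by a task-aware gate supplied by the Trigger side."""
    return M + gate * np.outer(v, k)

# Stream of samples: each sample first adapts the task parameters,
# then contributes a gated write to the memory M.
M = np.zeros((D, D))
for _ in range(50):
    x = rng.standard_normal(D)
    W_task = trigger(W_task, x)
    k = x / (np.linalg.norm(x) + 1e-8)       # key: normalised input
    v = W_task @ x                           # value: shaped by task parameters
    gate = float(1.0 / (1.0 + np.exp(-k @ v)))  # sigmoid gate from task signal
    M = updater(M, k, v, gate)
```

Under this toy objective the adapted `W_task` drifts toward the identity map (the minimiser of the reconstruction loss for Gaussian inputs), which is what "adapting task-related parameters on the fly" amounts to in this simplified setting; the gated outer-product write keeps the memory update linear in sequence length.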
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2394