Neuronal Activation States as Sample Embeddings for Data Selection in Task-Specific Instruction Tuning

Da Ma, Gonghu Shang, Zhi Chen, Libo Qin, Yijie Luo, Lei Pan, Shuai Fan, Lu Chen, Kai Yu

Published: 2025, Last Modified: 19 May 2025CoRR 2025EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Task-specific instruction tuning enhances the performance of large language models (LLMs) on specialized tasks, yet efficiently selecting relevant data for this purpose remains a challenge. Inspired by neural coactivation in the human brain, we propose a novel data selection method called NAS, which leverages neuronal activation states as embeddings for samples in the feature space. Extensive experiments show that NAS outperforms classical data selection methods in terms of both effectiveness and robustness across different models, datasets, and selection ratios.