From Implicit to Explicit: Enhancing Self-Recognition in Large Language Models

ACL ARR 2026 January Submission1050 Authors

27 Dec 2025 (modified: 20 Mar 2026)ACL ARR 2026 January SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: large language model, representation editing, self-recognition of LLMs
Abstract: Large language models (LLMs) have been shown to possess a degree of self-recognition ability, which used to identify whether a given text was generated by themselves. Prior work has demonstrated that this capability is reliably expressed under the pair presentation paradigm (PPP), where the model is presented with two texts and asked to choose which one it authored. However, performance deteriorates sharply under the individual presentation paradigm (IPP), where the model is given a single text to judge authorship. Although this phenomenon has been observed, its underlying causes have not been systematically analyzed. In this paper, we first investigate the cause of this failure and attribute it to implicit self-recognition (ISR). ISR describes the gap between internal representations and output behavior in LLMs: under the IPP scenario, the model encodes self-recognition information in its feature space, yet its ability to recognize self-generated texts remains poor. To mitigate the ISR of LLMs, we propose cognitive surgery (CoSur), a novel framework comprising four main modules: representation extraction, subspace construction, authorship discrimination, and cognitive editing. Experimental results demonstrate that our proposed method improves the self-recognition performance of three different LLMs in the IPP scenario, achieving average accuracies of 99.00\%, 97.69\%, and 97.13\%, respectively.
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study
Languages Studied: English
Submission Number: 1050
Loading