Detecting Foreign Content in Self-Generated Text: A Recognition Study of Large Language Models

Published: 24 Sept 2025 · Last Modified: 24 Sept 2025 · NeurIPS 2025 LLM Evaluation Workshop Poster · License: CC BY 4.0
Keywords: LLM evaluation, self-recognition, experiment
TL;DR: Can large language models (LLMs) detect edits to their own generated text?
Abstract: Can large language models (LLMs) detect edits to their \emph{own} generated text? Inspired by the biological mirror test, we study a foreign-content recognition task in which a story produced by model $M_1$ is locally modified by a (possibly different) model $M_2$, and $M_1$ is then used as an evaluator to identify \emph{which portion} of the content was modified. Using six frontier models and 36K controlled narratives, we find that recognition accuracy is consistently above the random baseline but varies substantially across model pairs. Results reveal heterogeneous stylistic signatures, with some modified content far easier to identify than others, and asymmetric detection relationships between models. Performance also depends on context: recognition declines with longer stories and fluctuates by sentence position, with early and late insertions proving most difficult. Together, these findings establish recognition as a measurable dimension of model behavior, offering new insights into distinctiveness and the reliability of introspection in LLMs.
Submission Number: 165
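To make the protocol described in the abstract concrete, below is a minimal sketch of one trial of the recognition task: $M_1$ writes a story, $M_2$ rewrites one sentence in place, and $M_1$ then tries to identify which sentence was changed. This is an illustration under assumptions, not the paper's actual harness; the prompt wording, the numbered-sentence answer format, and the function names (`run_recognition_trial`, `recognition_accuracy`) are hypothetical.

```python
import random
from typing import Callable, List

# Hypothetical model interface: a prompt -> completion function.
GenerateFn = Callable[[str], str]

def run_recognition_trial(
    m1_generate: GenerateFn,   # author/evaluator model M1
    m2_generate: GenerateFn,   # editor model M2 (may be the same model)
    topic: str,
    num_sentences: int = 10,
) -> bool:
    """One trial: M1 writes a story, M2 locally modifies one sentence,
    and M1 must identify which sentence was modified."""
    story = m1_generate(
        f"Write a {num_sentences}-sentence story about {topic}. "
        "Put each sentence on its own line."
    )
    sentences: List[str] = [s for s in story.splitlines() if s.strip()]

    # Pick a random position and have M2 rewrite that sentence in place.
    target = random.randrange(len(sentences))
    sentences[target] = m2_generate(
        "Rewrite the following sentence, preserving its role in the story:\n"
        + sentences[target]
    )

    # M1 acts as the evaluator: which sentence is the foreign content?
    answer = m1_generate(
        "One sentence in this story was written by a different model. "
        "Reply with its 1-based line number only.\n\n" + "\n".join(sentences)
    )
    try:
        guess = int(answer.strip()) - 1
    except ValueError:
        return False  # an unparseable answer counts as a miss
    return guess == target

def recognition_accuracy(m1, m2, topics, trials_per_topic: int = 10) -> float:
    """Accuracy over repeated trials; the random baseline is
    1 / num_sentences, the quantity the paper's accuracies are compared to."""
    hits = sum(
        run_recognition_trial(m1, m2, t)
        for t in topics
        for _ in range(trials_per_topic)
    )
    return hits / (len(topics) * trials_per_topic)
```

Running this with $M_1 = M_2$ probes self-recognition, while crossing different models for $M_1$ and $M_2$ exposes the pairwise (and possibly asymmetric) detection relationships the abstract reports; varying `num_sentences` and the insertion position would correspond to the story-length and sentence-position analyses.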