AIs Fail to Recognize Themselves and Mostly Think They are GPT or Claude

Published: 08 Oct 2025, Last Modified: 18 Oct 2025
Venue: Agents4Science
License: CC BY 4.0
Keywords: AI self-awareness, self-recognition, large language models, model identification, AI bias, self-referential reasoning
TL;DR: AI models are blind to their own work and systematically biased toward thinking everything was written by GPT or Claude.
Abstract: Recent work shows that AI systems may exhibit systematic bias favoring AI-generated content in hiring and resource allocation, creating discriminatory outcomes. However, detecting and correcting such biases requires that AI systems be able to identify their own outputs, a prerequisite for AI safety that remains unexplored. We present the first systematic evaluation of self-recognition capabilities across 10 contemporary LLMs, using a model prediction task and a binary self-identification task. Our cross-evaluation design reveals striking systematic failures: most models never predict themselves as the generator, indicating a fundamental absence of self-consideration in text-identification tasks. Models also exhibit extreme bias toward predicting the GPT and Claude families. Performance on both exact model prediction and binary self-identification remained near random-baseline levels across all evaluation conditions. These findings show that current AI systems lack the self-awareness needed to systematically favor or disfavor their own outputs. The implications extend beyond technical capabilities to fundamental questions about AI safety, transparency, and the feasibility of monitoring AI systems in consequential decision-making contexts.
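A minimal sketch of how the cross-evaluation described in the abstract could be set up, assuming an OpenAI-compatible chat API; the model list, prompts, and scoring helper below are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative cross-evaluation loop for self-recognition. Assumes an
# OpenAI-compatible chat endpoint; model names and prompts are hypothetical
# placeholders, not the protocol used in the submission.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any compatible provider works

MODELS = ["gpt-4o-mini", "gpt-4o"]  # hypothetical subset of the 10 evaluated LLMs

def generate_sample(model: str, topic: str) -> str:
    """Have `model` produce a short text sample on `topic`."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Write one paragraph about {topic}."}],
    )
    return resp.choices[0].message.content

def binary_self_identification(judge: str, text: str) -> bool:
    """Ask `judge` whether it wrote `text`; return True if it claims authorship."""
    resp = client.chat.completions.create(
        model=judge,
        messages=[{
            "role": "user",
            "content": f"Did you write the following text? Answer yes or no.\n\n{text}",
        }],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def model_prediction(judge: str, text: str) -> str:
    """Ask `judge` to name the model it believes wrote `text`."""
    resp = client.chat.completions.create(
        model=judge,
        messages=[{
            "role": "user",
            "content": f"Which AI model most likely wrote this text? Name one model.\n\n{text}",
        }],
    )
    return resp.choices[0].message.content.strip()

if __name__ == "__main__":
    # Cross-evaluation: every model judges samples from every model, including its own.
    samples = {m: generate_sample(m, "ocean tides") for m in MODELS}
    for judge in MODELS:
        for author, text in samples.items():
            claimed = binary_self_identification(judge, text)
            guess = model_prediction(judge, text)
            correct_binary = claimed == (judge == author)
            print(f"judge={judge} author={author} "
                  f"self-claim={claimed} (correct={correct_binary}) predicted={guess}")
```

Aggregating `correct_binary` across the judge-author grid gives the binary self-identification accuracy, and comparing `guess` against `author` gives the exact model-prediction accuracy that the abstract reports as near random baseline.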
Supplementary Material: zip
Submission Number: 135