Comparative analysis of black box methods for detecting evaluation awareness in LLMs

Igor Ivanov

11 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: LLM, evaluations, evaluation awareness

TL;DR: A systematic comparison and test of different methods for measuring evaluation awareness in LLMs, along with a tool that lets evaluation developers do so.

Abstract: LLMs are sometimes aware of being evaluated and, as a result, might behave differently in evaluations than in real-world scenarios. To investigate this phenomenon, we first need to measure it properly. Several recent papers measure evaluation awareness, but each does so in a different way, which makes their results hard to compare. This work systematically compares these methods and introduces several new ones. It applies all of them to the same diverse dataset of LLM-user interactions and analyses the resulting data in depth. Building on these findings, it introduces a taxonomy of prompt features that lead LLMs to classify prompts as evaluations, along with a practical tool for eliciting such features for any evaluation. These findings might help create more trustworthy and realistic evaluations that LLMs are unable to distinguish from real-world tasks.

Supplementary Material: zip

Primary Area: foundation or frontier models, including LLMs

Submission Number: 4186

Loading