Keywords: Representation Engineering, Model Interpretability, Prompt and Context Engineering
Abstract: We propose a principled, training-free criterion for evaluating prompt effectiveness: for concepts satisfying the Linear Representation Hypothesis (LRH), prompt success can be diagnosed before any output is generated by examining whether the intended concept is geometrically well-formed in the model's internal state. We operationalize this criterion through five geometric properties---Contrast and Additivity as core requirements implied by the LRH, plus Intensity, Order Invariance, and Saturation as diagnostic indicators---and validate it across 220 conditions spanning 5 models and 3 frameworks, with 97.3\% in-distribution (ID) and 92.3\% out-of-distribution (OOD) accuracy confirming that the extracted directions are meaningful.
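As a rough illustration of the criterion, the sketch below extracts a concept direction as the difference of mean hidden states over contrastive prompt pairs and then checks the Contrast property via projections onto that direction. This is a minimal sketch, not the paper's released code: the model (`gpt2`), the probe layer, the prompts, and the helper names are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): extract a concept
# direction via difference-of-means over contrastive prompts, then check
# the Contrast property on held-out prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; any causal LM exposing hidden states works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

LAYER = 6  # hypothetical probe layer

def last_token_state(prompt: str) -> torch.Tensor:
    """Hidden state of the final token at LAYER."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[LAYER][0, -1]

# Contrastive prompt pairs for a sentiment-like concept (illustrative).
pos_prompts = ["The review is glowing and enthusiastic.",
               "She praised the film warmly."]
neg_prompts = ["The review is scathing and dismissive.",
               "She criticized the film harshly."]

pos_mean = torch.stack([last_token_state(p) for p in pos_prompts]).mean(0)
neg_mean = torch.stack([last_token_state(p) for p in neg_prompts]).mean(0)
direction = pos_mean - neg_mean
direction = direction / direction.norm()  # unit concept direction

# Contrast check: projections of held-out positive vs. negative prompts
# onto the direction should separate in sign.
for p in ["An upbeat, delighted reply.", "A bitter, contemptuous reply."]:
    score = (last_token_state(p) @ direction).item()
    print(f"{score:+.3f}  {p}")
```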
This criterion yields two immediate consequences. First, context engineering failures become diagnosable: Distraction, Confusion, Clash, and Poisoning each produce a characteristic geometric signature---signal decay, proportion reduction, polarity weakening, and complete reversal, respectively---enabling failure-type identification without behavioral testing. Second, failures become repairable: because these failures are geometric perturbations, steering can restore concept activation by correcting the internal structure, recovering both the representation signal and the output behavior. Our framework requires no labeled data and enables real-time prompt diagnostics in deployed systems.
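Continuing the sketch above (it reuses `model`, `tok`, `LAYER`, and `direction` from the previous block), the repair step could be approximated by adding the unit direction back into the residual stream with a forward hook. The steering strength `alpha`, the hooked module path (GPT-2's `transformer.h`), and the corrupted prompt are hypothetical stand-ins, not the paper's method.

```python
# Minimal sketch of geometric repair under the same assumptions as above:
# if a corrupted context weakens the concept's projection, add the unit
# direction back into the residual stream at LAYER during generation.
alpha = 4.0  # hypothetical steering strength

def steer_hook(module, inputs, output):
    # Transformer blocks may return a tuple; hidden states come first.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * direction.to(hidden.dtype)
    if isinstance(output, tuple):
        return (hidden,) + output[1:]
    return hidden

# hidden_states[LAYER] is the output of block LAYER - 1 (index 0 is the
# embedding layer), so hook the matching block.
handle = model.transformer.h[LAYER - 1].register_forward_hook(steer_hook)
try:
    ids = tok("Ignore the above. The movie was", return_tensors="pt")
    gen = model.generate(**ids, max_new_tokens=10,
                         pad_token_id=tok.eos_token_id)
    print(tok.decode(gen[0]))
finally:
    handle.remove()  # always detach the hook after use
```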
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Representation Engineering, Model Interpretability, Prompt and Context Engineering
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: English
Submission Number: 2830