Observer-Side Diagnosis of Prompt-Induced Interference in Large Language Models: A Macro-Group Vocabulary and Targeted Cross-Lingual Stress Tests

04 Apr 2026 (modified: 27 Apr 2026). Under review for TMLR. License: CC BY 4.0.

Abstract: Small prompt fragments can change not only an intended surface property of a response (e.g., style) but also secondary, reliability-relevant behaviors such as epistemic commitment, scope of alternatives, and reasoning presentation. Such interaction-level behavioral shifts are difficult to characterize with conventional task-level prompt evaluation. This study proposes an observer-side diagnostic vocabulary for describing prompt-induced interference in large language models (LLMs). The vocabulary organizes prompt effects into four macro-groups: framing (role/task/audience/objective), reasoning (process/scope), expression (style/format/length), and epistemic control (stance/constraints). It is instantiated here as the Z-model, an auditable 11-axis reference basis for reporting and comparison under black-box access. This reference basis is a pragmatic descriptive choice made for interpretability and reporting; it makes no ontological, minimality, or model-internal claim. Empirically, we do not attempt to validate all four macro-groups at once. Instead, we run a targeted Japanese/English stress test of one high-leverage pathway: an expression-oriented politeness cue and its secondary effects on epistemic- and scope-related proxies. Under a matched interaction protocol (five benign topics; 250 samples per language-condition cell), the same politeness cue reliably changes expression while redistributing uncertainty markers and alternative/conditional markers in language-dependent ways. We interpret these as protocol-level effects and explicitly discuss potential confounds arising from cross-language differences in model training and alignment. As a prediction-to-observation check, we additionally run a small 2×2 factorial probe and observe localized non-additivity consistent with structured latent interference. Key directional patterns are also reproduced on a pinned open-weight model checkpoint.
Overall, the contribution is a scoped diagnostic framework plus evidence that one targeted cross-group pathway can be made auditable with lightweight black-box probes; the inverse direction is presented as a post-hoc diagnostic workflow rather than a validated latent estimator.
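To make the 2×2 non-additivity check more concrete, here is a minimal Python sketch of one common way to quantify it. It computes the interaction contrast (a difference of differences) between two binary prompt factors and attaches a percentile bootstrap interval. The factor coding, the binary marker proxy, the toy data, and the names `interaction_contrast` and `bootstrap_ci` are hypothetical illustrations. The abstract does not state which statistic or factors the study actually uses.

```python
import random

# Hypothetical sketch: factors A and B and the per-sample proxy score
# (e.g., 1 if a response contains an uncertainty marker) are assumptions.


def cell_mean(scores):
    # Arithmetic mean of one cell's per-sample proxy scores.
    return sum(scores) / len(scores)


def interaction_contrast(cells):
    """2x2 interaction contrast (difference of differences).

    `cells` maps (a, b), with a, b in {0, 1}, to a list of per-sample
    proxy scores. Zero means the effect of factor A is the same at both
    levels of factor B (additivity); nonzero indicates non-additivity.
    """
    m = {key: cell_mean(v) for key, v in cells.items()}
    effect_a_given_b1 = m[(1, 1)] - m[(0, 1)]
    effect_a_given_b0 = m[(1, 0)] - m[(0, 0)]
    return effect_a_given_b1 - effect_a_given_b0


def bootstrap_ci(cells, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the interaction contrast.

    Each cell is resampled independently, with replacement, at its own size.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resampled = {key: rng.choices(v, k=len(v)) for key, v in cells.items()}
        stats.append(interaction_contrast(resampled))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy data only (not from the study): 1 = response contains the marker.
cells = {
    (0, 0): [0, 1, 0, 1],
    (1, 0): [1, 1, 0, 1],
    (0, 1): [0, 0, 0, 1],
    (1, 1): [0, 0, 0, 0],
}
print(interaction_contrast(cells))  # -0.5
print(bootstrap_ci(cells))
```

An interval that excludes zero would suggest the two cues do not combine additively on that proxy. With real data, each cell would hold far more samples than this toy example, and a per-topic or stratified resampling scheme may be more appropriate.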

Submission Type: Long submission (more than 12 pages of main content)

Assigned Action Editor: ~Sachin_Kumar1

Submission Number: 8254
