Abstract: Current practice in prompting, evaluation, and alignment of large language models (LLMs) often treats behavioural similarity as evidence of similar underlying control. This assumption is rarely examined at the human-model interface, the level at which control is instantiated. Across diverse decoder-only models, fixing the prefix yields separable final-layer representations immediately prior to decoding, even when the continuation varies. Using a simple centroid-based criterion on final-layer interface vectors, prefix identity can be recovered with high accuracy in our evaluated setup, with results varying across models.
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: In this revision, we made the following changes:
1. Clarified terminology and scope. We replaced the earlier “manifold” wording with a more neutral notion of prefix-induced regions in representation space. We also added a definition of the interface-level control state (the final-layer vector prior to decoding) and now frame the main result as identifiability under a nearest-centroid rule.
3. Expanded the methodological description. The Methods section now specifies the evaluation protocol more explicitly, including input construction, the representation used (`hidden_states[-1]`), the fixed extraction position (last token), centroid construction over a held-out split, and the cosine-similarity scoring procedure.
4. Reran the experiments under the updated protocol. Results are now reported separately for short and long input regimes, defined by the character length of the content segment.
5. Clarified the prompt-injection probe. The injection experiment is now described as a separate probe that inserts a short placeholder string into the input text while keeping the prefix unchanged, evaluated using the same representation and scoring pipeline.
6. Minor revisions for clarity. The introduction and discussion were streamlined, figure captions were clarified, and a concrete input example and appendix material were added to improve reproducibility.
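As a rough illustration of the nearest-centroid, cosine-similarity scoring mentioned in the list above, the following is a minimal sketch and not the paper's implementation. It assumes the interface vectors (the last-token entries of `hidden_states[-1]`) have already been extracted and uses small synthetic vectors in their place. The function names, the toy data, and the choice to L2-normalize vectors before averaging are all assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_centroids(vectors_by_prefix):
    """Average the unit-normalized vectors of each prefix into one centroid.

    vectors_by_prefix maps a prefix label to a list of interface vectors
    taken from the split reserved for centroid construction.
    """
    centroids = {}
    for prefix_id, vectors in vectors_by_prefix.items():
        unit = [l2_normalize(v) for v in vectors]
        dim = len(unit[0])
        mean = [sum(v[i] for v in unit) / len(unit) for i in range(dim)]
        centroids[prefix_id] = l2_normalize(mean)
    return centroids


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def predict_prefix(vector, centroids):
    """Return the prefix label whose centroid is most similar to the vector."""
    return max(centroids, key=lambda p: cosine(vector, centroids[p]))


def prefix_accuracy(labelled_vectors, centroids):
    """Fraction of (label, vector) pairs assigned to their true prefix."""
    hits = sum(
        1 for label, vec in labelled_vectors
        if predict_prefix(vec, centroids) == label
    )
    return hits / len(labelled_vectors)


# Synthetic stand-ins for extracted interface vectors (hypothetical data).
centroid_split = {
    "prefix_a": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "prefix_b": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
eval_split = [
    ("prefix_a", [0.8, 0.2, 0.0]),
    ("prefix_b", [0.1, 1.1, 0.2]),
]

centroids = build_centroids(centroid_split)
accuracy = prefix_accuracy(eval_split, centroids)
```

In this sketch the centroids and the scored vectors come from disjoint splits, so the accuracy reflects how well prefix identity can be recovered for unseen inputs rather than how well the centroids fit their own data.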
Assigned Action Editor: ~Yu_Meng1
Submission Number: 6807