Do Personality Traits Interfere? Geometric Limitations of Steering in Large Language Models

Do Personality Traits Interfere? Geometric Limitations of Steering in Large Language Models

ACL ARR 2026 January Submission7978 Authors

06 Jan 2026 (modified: 20 Mar 2026)ACL ARR 2026 January SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: personality traits, personality steering, geometric evaluation of personality traits, Large Language Models

Abstract: Personality steering in large language models (LLMs) commonly relies on injecting trait-specific steering vectors, implicitly assuming that personality traits can be controlled independently. In this work, we examine whether this assumption holds by analysing the geometric relationships between Big Five personality steering directions. We study steering vectors extracted from two model families (LLaMA-3-8B and Mistral-8B) and apply a range of geometric conditioning schemes, from unconstrained directions to soft and hard orthonormalisation. Our results show that personality steering directions exhibit substantial geometric dependence: steering one trait consistently induces changes in others, even when linear overlap is explicitly removed. While hard orthonormalisation enforces geometric independence, it does not eliminate cross-trait behavioural effects and can reduce steering strength. These findings suggest that personality traits in LLMs occupy a slightly coupled subspace, limiting fully independent trait control.

Paper Type: Short

Research Area: Dialogue and Interactive Systems

Research Area Keywords: Dialogue and Interactive Systems, Generation, Interpretability and Analysis of Models for NLP

Contribution Types: Model analysis & interpretability, NLP engineering experiment

Languages Studied: English

Submission Number: 7978

Loading