Keywords: Large Language Models, Personality Control, Representation Engineering, Model Steering, Inference-Time Adaptation, Compositionality
TL;DR: Training-free activation vector algebra composes and dynamically steers LLM personalities at inference time, matching fine-tuning on PersonalityBench and achieving strong win rates on the Persona‑Evolve benchmark.
Abstract: Current methods for personality control in Large Language Models rely on static prompting or expensive fine-tuning, failing to capture the dynamic and compositional nature of human traits. We introduce PERSONA, a training-free framework that achieves fine-tuning-level performance through direct manipulation of personality vectors in activation space. Our key insight is that personality traits appear as extractable, approximately orthogonal directions in the model's representation space that support algebraic operations. The framework operates in three stages: Persona-Base extracts orthogonal trait vectors via contrastive activation analysis; Persona-Algebra enables precise control through vector arithmetic (scalar multiplication for intensity, addition for composition, subtraction for suppression); and Persona-Flow achieves context-aware adaptation by dynamically composing these vectors during inference. On PersonalityBench, our approach achieves a mean score of 9.60, nearly matching the supervised fine-tuning upper bound of 9.61 without any gradient updates. On our proposed Persona-Evolve benchmark for dynamic personality adaptation, we achieve up to 91% win rates across diverse model families. These results provide evidence that aspects of LLM personality are mathematically tractable, opening new directions for interpretable and efficient behavioral control.
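To make the abstract's pipeline concrete, the sketch below illustrates the general flavor of contrastive trait-vector extraction and inference-time vector algebra. It is a minimal, hedged illustration only: the function names (`extract_trait_vector`, `compose_personality`), the use of mean activation differences, the trait names, and all shapes and weights are assumptions for demonstration, not the paper's actual Persona-Base/Persona-Algebra/Persona-Flow implementation.

```python
# Illustrative sketch of contrastive trait-vector extraction and vector
# algebra over activations; all names, shapes, and weights are hypothetical.
import torch


def extract_trait_vector(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    """Persona-Base-style extraction (assumed): unit-normalized mean difference
    between activations from trait-expressing and trait-neutral prompts."""
    direction = pos_acts.mean(dim=0) - neg_acts.mean(dim=0)
    return direction / direction.norm()


def compose_personality(traits: dict, weights: dict) -> torch.Tensor:
    """Persona-Algebra-style composition (assumed): scalar weights set intensity,
    positive weights add a trait, negative weights suppress it."""
    return sum(w * traits[name] for name, w in weights.items())


# Toy example with random stand-in activations (hidden size 64).
torch.manual_seed(0)
hidden = 64
traits = {
    "extraversion": extract_trait_vector(torch.randn(32, hidden) + 0.5,
                                         torch.randn(32, hidden)),
    "agreeableness": extract_trait_vector(torch.randn(32, hidden) - 0.5,
                                          torch.randn(32, hidden)),
}

# Compose: amplify one trait, mildly suppress another.
steer = compose_personality(traits, {"extraversion": 1.5, "agreeableness": -0.5})

# Inference-time steering: add the composed vector to a layer's hidden states.
hidden_state = torch.randn(1, 10, hidden)  # (batch, seq_len, hidden)
steered = hidden_state + steer             # broadcasts over batch and sequence
print(steered.shape)
```

In an actual model, the stand-in tensors would be hidden states cached from a chosen layer during forward passes over contrastive prompt pairs, and the composed vector would be added via a forward hook at that layer; a dynamic, Persona-Flow-like variant would recompute the weight dictionary per context rather than fixing it once.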
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 10217