Extraversion or Introversion? Controlling The Personality of Your Large Language Models

ACL ARR 2025 February Submission5998 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract:

Large language models (LLMs) demonstrate advanced text generation and comprehension capabilities, mimicking human behavior and displaying synthetic personalities. However, some LLMs have displayed undesirable personalities, propagating toxic discourse. Existing literature overlooks control methods for shaping reliable and stable LLM personalities. To fill this gap, we constructed valuable personality datasets and investigated several methods for controlling LLM personalities: three training-based methods, Continual Pre-Training (CPT), Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF), along with a prompt-based method applied at inference time. Our findings indicate that training-based methods offer superior robustness in maintaining personalities, while prompt-based techniques are more effective in controlling them. Based on these insights, we propose Prompt Induction post Supervised Fine-tuning (PISF), a novel method that ensures high success rates, efficacy, and robustness in personality control. Extensive experimental results show that PISF achieves safe and reliable LLM personality control, demonstrating its effectiveness.
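The inference-phase (prompt-based) control mentioned above can be sketched as follows. This is a minimal illustrative example, not the paper's actual prompts or method: the trait descriptions, template wording, and function name are all assumptions introduced here for illustration.

```python
# Hedged sketch of inference-phase personality control via prompting.
# The trait descriptions and the prompt template are illustrative
# assumptions, not the prompts used in the paper.

TRAIT_DESCRIPTIONS = {
    "extraversion": "outgoing, energetic, and talkative",
    "introversion": "reserved, reflective, and quiet",
}

def build_personality_prompt(trait: str, user_message: str) -> list:
    """Build a chat-style message list that induces the given personality trait."""
    if trait not in TRAIT_DESCRIPTIONS:
        raise ValueError(f"unknown trait: {trait}")
    system = (
        f"You are an assistant whose personality is {trait}: "
        f"{TRAIT_DESCRIPTIONS[trait]}. Respond in a manner consistent "
        "with this personality."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]

# Example: induce an extraverted persona for a single user turn.
messages = build_personality_prompt("extraversion", "How was your weekend?")
```

The resulting message list would then be passed to any chat-completion LLM; training-based methods (CPT, SFT, RLHF) would instead bake the trait into the model weights before inference.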

Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: computational psycholinguistics
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 5998