Extraversion or Introversion? Controlling The Personality of Your Large Language Models

ACL ARR 2025 February Submission5998 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract:

Large language models (LLMs) demonstrate advanced text generation and comprehension capabilities, mimicking human behavior and displaying synthetic personalities. However, some LLMs have displayed undesirable personalities, propagating toxic discourse. Existing literature overlooks control methods for shaping reliable and stable LLM personalities. To fill this gap, we constructed valuable personality datasets and investigated several methods for controlling LLM personalities: three training-based methods, Continual Pre-Training (CPT), Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF), along with a prompt-based method applied at inference time. Our findings indicate that training-based methods offer superior robustness in maintaining personalities, while prompt-based techniques are more effective in controlling them. Based on these insights, we propose Prompt Induction post Supervised Fine-tuning (PISF), a novel method that ensures high success rates, efficacy, and robustness in personality control. Extensive experimental results show that PISF achieves safe and reliable LLM personality control, demonstrating its effectiveness.
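The inference-phase (prompt-based) control mentioned above can be sketched as follows. This is a minimal illustrative example, not the paper's actual prompts or method: the trait descriptions, template wording, and function name are all assumptions introduced here for illustration.

```python
# Hedged sketch of inference-phase personality control via prompting.
# The trait descriptions and the prompt template are illustrative
# assumptions, not the prompts used in the paper.

TRAIT_DESCRIPTIONS = {
    "extraversion": "outgoing, energetic, and talkative",
    "introversion": "reserved, reflective, and quiet",
}

def build_personality_prompt(trait: str, user_message: str) -> list:
    """Build a chat-style message list that induces the given personality trait."""
    if trait not in TRAIT_DESCRIPTIONS:
        raise ValueError(f"unknown trait: {trait}")
    system = (
        f"You are an assistant whose personality is {trait}: "
        f"{TRAIT_DESCRIPTIONS[trait]}. Respond in a manner consistent "
        "with this personality."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]

# Example: induce an extraverted persona for a single user turn.
messages = build_personality_prompt("extraversion", "How was your weekend?")
```

The resulting message list would then be passed to any chat-completion LLM; training-based methods (CPT, SFT, RLHF) would instead bake the trait into the model weights before inference.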

Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: computational psycholinguistics
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 5998