Abstract: This research explores strategies for \textit{steering} the output of large language models (LLMs) towards specific styles, such as sentiment, emotion, or writing style, by adding \textit{style vectors} to the activations of hidden layers during text generation. We show that, in contrast to more complex training-based approaches, style vectors can be computed simply from the layer activations recorded for input texts in a specific style. Through a series of experiments, we demonstrate the effectiveness of \textit{activation engineering} using such \textit{style vectors} to influence the style of generated text in a nuanced and parameterisable way, which distinguishes it from prompt engineering. The presented research constitutes a significant step towards the development of more adaptive and affective AI-empowered interactive systems.
Paper Type: long
Research Area: Dialogue and Interactive Systems
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Theory
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.
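The idea summarised in the abstract, computing a style vector from recorded activations and adding it to a hidden layer during generation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`mean_activation`, `style_vector`, `steer`), the mean-difference recipe, the `strength` parameter, and the toy three-dimensional activations are all assumptions made for illustration, and no real model is involved.

```python
from statistics import fmean
from typing import Sequence

Vector = list[float]


def mean_activation(activations: Sequence[Sequence[float]]) -> Vector:
    """Average a set of recorded activation vectors, dimension by dimension."""
    return [fmean(column) for column in zip(*activations)]


def style_vector(
    style_acts: Sequence[Sequence[float]],
    other_acts: Sequence[Sequence[float]],
) -> Vector:
    """Assumed recipe: mean activation of target-style texts minus the mean
    activation of the remaining texts. The paper may define this differently."""
    target = mean_activation(style_acts)
    rest = mean_activation(other_acts)
    return [t - r for t, r in zip(target, rest)]


def steer(hidden: Sequence[float], vector: Sequence[float], strength: float) -> Vector:
    """Add the scaled style vector to one hidden-layer activation.
    `strength` is a hypothetical knob for how strongly the style is applied."""
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy activations standing in for values recorded at a single layer.
positive_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
negative_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v = style_vector(positive_acts, negative_acts)
steered = steer([0.5, 0.5, 0.5], v, strength=0.5)
```

Setting `strength` to zero leaves the activation unchanged, and larger values push it further along the style direction. A continuous scaling factor of this kind is one way to read the "parameterisable" steering that the abstract contrasts with prompt engineering.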