Subliminal Prosody Learning: Auxiliary Emotion Supervision Redistributes Affective Representations Across ALM Layers

Published: 11 Jun 2026, Last Modified: 11 Jun 2026. Venue: Mech Interp Workshop ICML 2026 Virtual (poster). License: CC BY 4.0
Keywords: Methods (probing, steering, causal interventions), Applications of interpretability, Interpretability for Knowledge Discovery
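TL;DR: We show subliminal learning in audio-language models: training a few LoRA-adapted layers on emotion classification makes emotion more linearly decodable in frozen layers as well and, for the model with the largest probe gain, makes generation prosody-sensitive.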
Abstract: We study how a simple emotion classification objective, applied to a few LoRA-adapted layers of an audio-language model (ALM), redistributes affective information across \emph{all} layers---including those whose parameters remain frozen---through residual-stream propagation. We call this phenomenon \emph{subliminal prosody learning} and, to our knowledge, provide the first systematic study of representational propagation across multiple ALM architectures: Qwen2.5-Omni-7B, Audio Flamingo~3, and MOSS-Audio-4B. Mean probe gain in unadapted layers is +32.0\,pp (Omni), +22.0\,pp (AF3), and +13.6\,pp (MOSS). Out-of-distribution (OOD) classification improves by up to +23.4\,pp, and learned emotion directions recover the Russell circumplex while transferring cross-modally. Because linear decodability does not imply functional use, we also test whether this representational accessibility translates into generation behavior. Results are consistent with a threshold-like relationship---only Omni, with the largest probe gain, achieves significant prosody-sensitive generation changes ($\Delta = +0.35$, $p < 0.001$), with an emotion-selective pattern (neutral: +0.04, n.s.; happy: +1.03***; sad: +0.48***) that rules out generic verbosity. No empathy supervision was used: prosody-sensitive generation emerges solely as a consequence of a classification-only auxiliary objective.
Submission Number: 259
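
The abstract reports a mean probe gain over unadapted layers in percentage points. As a rough illustration of that arithmetic only, the sketch below averages the per-layer change in probe accuracy over the layers that carry no LoRA adapter. The function name, the toy accuracies, and the choice of adapted layers are all hypothetical and are not taken from the paper; the paper's exact probing protocol may differ.

```python
from statistics import mean


def mean_frozen_probe_gain(acc_base, acc_tuned, adapted_layers):
    """Mean probe-accuracy gain, in percentage points, over layers without adapters.

    acc_base / acc_tuned: per-layer probe accuracies in [0, 1], measured
    before and after auxiliary training (hypothetical inputs).
    adapted_layers: indices of layers that received LoRA adapters.
    """
    if len(acc_base) != len(acc_tuned):
        raise ValueError("accuracy lists must cover the same layers")
    frozen = [i for i in range(len(acc_base)) if i not in adapted_layers]
    if not frozen:
        raise ValueError("no unadapted layers to average over")
    # Convert the mean accuracy difference from a fraction to percentage points.
    return 100.0 * mean(acc_tuned[i] - acc_base[i] for i in frozen)


# Made-up accuracies for a 6-layer toy model; NOT results from the paper.
acc_base = [0.25, 0.30, 0.35, 0.40, 0.40, 0.35]
acc_tuned = [0.25, 0.30, 0.75, 0.80, 0.70, 0.65]
adapted = {2, 3}  # assumed: adapters on layers 2 and 3 only

gain = mean_frozen_probe_gain(acc_base, acc_tuned, adapted)
print(f"mean gain in unadapted layers: {gain:+.1f} pp")
```

In this toy setup the layers before the adapters show no gain while the frozen layers after them do, which is the pattern one would expect if the effect travels forward through the residual stream.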