PED: Route-Decoupled Diagnostics for Persona Consistency in Spoken Agents

ACL ARR 2026 January Submission4677 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: spoken role-playing, persona consistency, diagnostic evaluation, persona drift
Abstract: Maintaining a stable persona is central to sustained spoken role-playing, yet when an agent breaks character, current evaluations often do not isolate which component caused the failure, making fixes slow and ad hoc. We propose PED (Persona-Emotion Decoupling), a diagnostic evaluation framework that treats spoken agents as multi-stage systems and decomposes persona expression into two observable routes: what the agent says (text) and how it sounds (speech). PED projects transcripts and audio into a shared affective measurement space, enabling route-comparable trajectories and baseline-referenced analyses organized by four research questions (separability, drift, failures, coupling). We demonstrate PED via two worked instantiations spanning an end-to-end Speech LLM and a cascaded LLM+TTS pipeline under a fixed multi-phase dialogue protocol. In this instantiated setting, PED surfaces four recurring diagnostic signatures: (i) route-level separability is bounded by reference overlap and can differ sharply across architectures, (ii) text-route drift is stress-linked and tends toward a neutral mode, (iii) text-audio consistency is weakly coupled, yielding route-asymmetric failures, and (iv) audio-route structure can be materially shaped by an explicit intermediate style cue in cascaded pipelines. Overall, PED reframes holistic "voice+character" grading as turn-level, fault-localizing signals that support faster debugging and iteration.
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: spoken dialogue systems, evaluation and metrics
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 4677