Dialogue Act-Aided Backchannel Prediction Using Multi-Task Learning

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Findings
Submission Type: Regular Short Paper
Submission Track: Dialogue and Interactive Systems
Submission Track 2: Speech and Multimodality
Keywords: backchannel prediction, multi-task learning, dialogue act, pre-trained audio encoder, voice activity projection
Abstract: Produced in the form of short interjections such as "Yeah!" or "Uh-huh" by listeners in a conversation, supportive verbal feedback (i.e., backchanneling) is essential for natural dialogue. Highlighting its tight relation to speaker intent and utterance type, we propose a multi-task learning approach that learns textual representations for the task of backchannel prediction in tandem with dialogue act classification. We demonstrate the effectiveness of our approach by improving the prediction of specific backchannels like "Yeah" or "Really?" by up to 2.0% in F1. Additionally, whereas previous models relied on well-established methods to extract audio features, we further pre-train the audio encoder in a self-supervised fashion using voice activity projection. This leads to additional gains of 1.4% in weighted F1.
Submission Number: 3403
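A minimal sketch of the multi-task setup described in the abstract: a shared pre-trained text encoder feeds two classification heads, one for backchannel prediction and one for dialogue act classification, trained with a joint loss. The encoder name, class counts, pooling choice, and loss weighting below are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of multi-task learning for backchannel prediction.
# Encoder name, class counts, and loss weighting are assumptions for
# illustration only, not the paper's reported setup.
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiTaskBackchannelModel(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased",
                 num_backchannel_classes=3, num_dialogue_acts=42):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Both task heads share the same textual representation.
        self.backchannel_head = nn.Linear(hidden, num_backchannel_classes)
        self.dialogue_act_head = nn.Linear(hidden, num_dialogue_acts)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.backchannel_head(pooled), self.dialogue_act_head(pooled)

def multitask_loss(bc_logits, da_logits, bc_labels, da_labels, alpha=0.5):
    # Joint objective: backchannel loss plus a weighted dialogue act loss.
    ce = nn.CrossEntropyLoss()
    return ce(bc_logits, bc_labels) + alpha * ce(da_logits, da_labels)
```

In such a setup, the auxiliary dialogue act objective acts as a regularizer that pushes the shared encoder toward representations of speaker intent and utterance type, which is the intuition the abstract gives for the gains on specific backchannel classes.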