Automatic Evaluate Dialogue Appropriateness by Using Dialogue Act

Bao Chen; Yuanjie Wang; Zeming Liu; Yuhang Guo

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, Venue: EMNLP 2023 Findings

Submission Type: Regular Long Paper

Submission Track: Dialogue and Interactive Systems

Keywords: Automatic Dialogue System Evaluation; Dialogue Evaluation; Dialogue System

TL;DR: This paper evaluates the appropriateness of chatbot responses by comparing the similarity of dialogue act patterns between human-machine dialogues and human-human dialogues.

Abstract:

Evaluating dialogue systems requires assessing several aspects; among them, appropriateness is significant as a core element of communicative language competence. However, current evaluations rely heavily on human judgments, which are time-consuming, labor-intensive, prone to bias, and lacking in objectivity. In this paper, we introduce Dialogue Act Appropriateness (DAA), a novel method that uses the underlying patterns of dialogue act transitions to evaluate the appropriateness of chatbot responses. We learn transition patterns from human-human dialogue corpora and evaluate chatbot appropriateness by measuring how similar a chatbot's transition patterns are to those observed in human-human dialogues. To validate DAA, we annotate a test dataset by manually rating the appropriateness of dialogues from multiple chatbot systems. The experimental results show a strong correlation between our evaluation metric and human ratings, supporting the reliability of DAA as a measure of dialogue appropriateness.
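
The abstract does not spell out how transition patterns are represented or compared. As a rough illustration only, the sketch below assumes each dialogue is already labeled as a sequence of dialogue acts, represents a corpus by its counts of adjacent act pairs, and compares two corpora with cosine similarity. The act labels, the function names, and the choice of bigram counts and cosine similarity are all assumptions made for this example, not the paper's actual DAA formulation.

```python
from collections import Counter
from math import sqrt


def transition_counts(dialogues):
    """Count adjacent dialogue-act pairs across a list of act sequences."""
    counts = Counter()
    for acts in dialogues:
        # zip(acts, acts[1:]) yields each (previous act, next act) pair.
        counts.update(zip(acts, acts[1:]))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)  # Counter returns 0 for missing keys
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no transitions observed on at least one side
    return dot / (norm_a * norm_b)


def appropriateness_score(human_dialogues, bot_dialogues):
    """Assumed scoring rule: similarity of the two transition profiles."""
    return cosine_similarity(
        transition_counts(human_dialogues),
        transition_counts(bot_dialogues),
    )


# Hypothetical act labels, for illustration only.
human = [["question", "answer", "statement"], ["greeting", "greeting", "question"]]
bot = [["question", "question", "statement"], ["greeting", "greeting", "question"]]
score = appropriateness_score(human, bot)
```

Under these assumptions, a chatbot whose act-to-act transitions mirror the human-human corpus scores close to 1, and one that shares no transitions with it scores 0. A probabilistic comparison of transition distributions would be an equally plausible reading of the abstract.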

Submission Number: 5387
