Evaluating LLM-generated Explanatory Dialogue Turns through Dialogue Completion

ACL ARR 2025 February Submission 4

01 Feb 2025 (modified: 09 May 2025) · License: CC BY 4.0
Abstract: Human dialogues frequently feature explanations when conveying ideas and engaging in discourse. Synthetic explanatory dialogues offer potential for applications such as dialogue systems and model self-rationalization, yet they are typically regarded as inferior in quality to human ones. We investigate the ability of large language models (LLMs) to complete a missing dialogue turn within the given context of an explanatory conversation. We conduct experiments on three datasets, covering both natural and synthetic explanatory dialogues, and apply two test suites for evaluation. While the evaluation confirms the quality gap between human and synthetic dialogues, LLM-generated turns outperform human ones in fluency and grammatical accuracy. Moreover, while each of the three investigated models demonstrates distinct strengths and weaknesses on the task, their performance can be consistently improved through prompt-based refinement methods.
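To make the dialogue-completion setup concrete, the sketch below masks one turn of an explanatory dialogue, asks an LLM to regenerate it from the surrounding context, and then applies a single prompt-based refinement pass over the model's own draft. The prompt wording, the model choice, and the use of the OpenAI client are illustrative assumptions; the paper's actual prompts, the three evaluated models, and the test suites are not reproduced here.

```python
# Minimal sketch of dialogue-turn completion with prompt-based refinement.
# Assumptions (not from the paper): OpenAI chat API, a placeholder model name,
# and a simple two-speaker turn format.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"  # placeholder; the paper's three models are not named here


def complete_turn(turns: list[str], masked_index: int) -> str:
    """Ask the model to fill in the turn at `masked_index` given its context."""
    context = "\n".join(
        "[MISSING TURN]" if i == masked_index else f"Speaker {(i % 2) + 1}: {turn}"
        for i, turn in enumerate(turns)
    )
    prompt = (
        "The following explanatory dialogue has one missing turn, marked "
        "[MISSING TURN]. Write only the text of that turn so that the "
        "conversation reads naturally.\n\n" + context
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()


def refine_turn(turns: list[str], masked_index: int, draft: str) -> str:
    """One prompt-based refinement pass: the model revises its own draft turn."""
    context = "\n".join(
        f"Speaker {(i % 2) + 1}: {draft if i == masked_index else turn}"
        for i, turn in enumerate(turns)
    )
    prompt = (
        "In the dialogue below, revise the turn\n"
        f"'{draft}'\n"
        "so that it fits the conversation more coherently and fluently. "
        "Return only the revised turn.\n\n" + context
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()


if __name__ == "__main__":
    dialogue = [
        "Why does ice float on water?",
        "Because ice is less dense than liquid water.",
        "So freezing actually makes the molecules take up more space?",
    ]
    # Mask the explanation turn, regenerate it, then refine the draft once.
    draft = complete_turn(dialogue, masked_index=1)
    final = refine_turn(dialogue, masked_index=1, draft=draft)
    print(final)
```

The regenerated turn can then be scored against the original human turn, which is the comparison underlying the reported human-versus-synthetic quality gap.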
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: dialogue, automatic evaluation, evaluation and metrics, free-text/natural language explanations, conversational modeling, commonsense reasoning, prompting
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 4