Paper Link: https://openreview.net/forum?id=oyxcYevum5Z
Paper Type: Long paper (up to eight pages of content + unlimited references and appendices)
Abstract: The standard approach to evaluating dialogue engagingness is measuring Conversation Turns Per Session (CTPS), which implies that dialogue length is the main predictor of user engagement with a dialogue system. The main limitation of CTPS is that it can only be measured at the session level, i.e., once the dialogue is over. However, a dialogue system also needs to monitor user engagement continuously throughout the dialogue session. Existing approaches to measuring turn-level engagingness require human annotations for training. We pioneer an alternative approach, the Weakly Supervised Engagingness Evaluator (WeSEE), which uses the remaining depth (RD) at each turn as a heuristic weak label for engagingness. WeSEE requires no human annotations and relates closely to CTPS, thus serving as a good learning proxy for this metric. We show that WeSEE achieves new state-of-the-art results on the Fine-grained Evaluation of Dialog (FED) dataset (0.38 Spearman) and the DailyDialog dataset (0.62 Spearman).
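The remaining-depth heuristic described in the abstract can be illustrated with a minimal sketch: each turn in a dialogue of n turns is labeled with the number of turns that follow it, so early turns in long conversations receive high engagingness labels. The function name and the normalization to [0, 1] are illustrative assumptions, not details taken from the paper.

```python
def remaining_depth_labels(turns, normalize=True):
    """Assign a weak engagingness label to each turn based on remaining depth.

    Turn i (0-indexed) in a dialogue of n turns gets label n - 1 - i,
    i.e., the number of turns still to come. Optionally normalized to [0, 1].
    """
    n = len(turns)
    labels = [n - 1 - i for i in range(n)]  # turns remaining after turn i
    if normalize and n > 1:
        labels = [label / (n - 1) for label in labels]
    return list(zip(turns, labels))

dialogue = ["Hi!", "Hello, how are you?", "Great, and you?", "Doing well, bye!"]
for turn, rd in remaining_depth_labels(dialogue):
    print(f"{rd:.2f}  {turn}")
```

These weak labels could then serve as regression targets for a turn-level engagingness model, avoiding the human annotation that prior approaches require.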