Granular Change Accuracy: A more accurate performance metric for Dialogue State Tracking


03 Sept 2022 (modified: 05 May 2023)
ACL ARR 2022 September Blind Submission
Abstract: Current community-accepted metrics used to evaluate Dialogue State Tracking (DST) have key weaknesses: they do not assign partial scores, and they over-penalize mistakes that occur in earlier turns. Their assumptions about error uniformity lead to inaccurate DST evaluation. To address this challenge, we propose a new metric, Granular Change Accuracy (GCA), which evaluates predicted changes in dialogue state over the entire dialogue history. Our benchmarking shows that GCA mitigates the influence of irrelevant traits of predictions, i.e., the uniformity of the error distribution and the position of mistakes across turns, leading to more accurate evaluation.
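To make the over-penalization of early mistakes concrete, here is a minimal Python sketch. It assumes joint goal accuracy (JGA), the per-turn exact-match metric commonly reported for DST, as the "community-accepted metric", and contrasts it with a simple change-level score. All names (`accumulate`, `turn_changes`, `change_accuracy`) and the hotel slots are hypothetical; `change_accuracy` is a simplified stand-in that illustrates scoring per-turn state changes and is not the GCA formula from the paper.

```python
from typing import Dict, List

State = Dict[str, str]  # slot name -> value (cumulative dialogue state at one turn)


def accumulate(updates: List[State]) -> List[State]:
    """Build cumulative per-turn states from per-turn slot updates."""
    states, current = [], {}
    for update in updates:
        current = {**current, **update}
        states.append(current)
    return states


def joint_goal_accuracy(gold: List[State], pred: List[State]) -> float:
    """JGA: fraction of turns whose full predicted state exactly matches gold."""
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def turn_changes(states: List[State]) -> List[State]:
    """Per-turn changes: slots whose value is new or differs from the previous turn.
    Simplification: slot deletions are ignored."""
    changes, prev = [], {}
    for state in states:
        changes.append({k: v for k, v in state.items() if prev.get(k) != v})
        prev = state
    return changes


def change_accuracy(gold: List[State], pred: List[State]) -> float:
    """Hypothetical change-level score (illustration only, not GCA): fraction of
    gold per-turn changes reproduced at the same turn. Spurious predicted
    changes are not penalized in this sketch."""
    total = hits = 0
    for g, p in zip(turn_changes(gold), turn_changes(pred)):
        for slot, value in g.items():
            total += 1
            hits += p.get(slot) == value
    return hits / total if total else 1.0


gold = accumulate([{"hotel-area": "north"}, {"hotel-stars": "4"},
                   {"hotel-parking": "yes"}, {"hotel-day": "friday"}])
# Same single wrong value, made in the first turn vs. in the last turn.
pred_early = accumulate([{"hotel-area": "south"}, {"hotel-stars": "4"},
                         {"hotel-parking": "yes"}, {"hotel-day": "friday"}])
pred_late = accumulate([{"hotel-area": "north"}, {"hotel-stars": "4"},
                        {"hotel-parking": "yes"}, {"hotel-day": "saturday"}])

print(joint_goal_accuracy(gold, pred_early), joint_goal_accuracy(gold, pred_late))  # 0.0 0.75
print(change_accuracy(gold, pred_early), change_accuracy(gold, pred_late))  # 0.75 0.75
```

A wrong value in the first turn is carried into every later cumulative state, so JGA drops to 0.0, while the same kind of error in the last turn costs only one turn (0.75). Scoring per-turn changes gives both predictions 0.75, independent of where the mistake occurs. A complete metric would also need to handle slot deletions and spurious predicted changes, which this sketch omits.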
Paper Type: short
