Search-based Evaluation from Truth Transcripts for Voice Search Applications

2016 (modified: 12 Nov 2022) · SIGIR 2016
Abstract: Voice search applications are typically evaluated by comparing the predicted query to a reference human transcript, regardless of the search results the query returns. While we find that an exact transcript match is highly indicative of user satisfaction, a transcript that does not match the reference still produces satisfactory search results a significant fraction of the time. This paper therefore proposes an evaluation method that compares the search results of the speech recognition hypotheses with the search results produced by a human transcript. A human evaluation shows that, compared with a strict sentence match, search result overlap is a better predictor of (a) user satisfaction and (b) search result click-through. Finally, we propose a model predicting the Expected Search Satisfaction Rate (ESSR), conditioned on search overlap outcomes. On a held-out set of 1036 voice search queries, our model predicted an ESSR within 0.9% (relative) of the ground-truth satisfaction averaged over 3 human judges.
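The abstract does not specify the overlap metric or the form of the ESSR model, so the following is only a minimal sketch of the general idea: score each recognition hypothesis by how much its top-k search results agree with the results of the human transcript (Jaccard overlap is assumed here), map that score to a coarse overlap outcome, and average judge-derived satisfaction probabilities per outcome. The function names, the bucketing scheme, and the `sat_given_bucket` table are all hypothetical illustrations, not the paper's actual method.

```python
def result_overlap(hyp_results, ref_results, k=10):
    # Jaccard overlap between the top-k results returned for the ASR
    # hypothesis query and those returned for the human transcript.
    # (Assumed metric; the paper only says "search result overlap".)
    hyp, ref = set(hyp_results[:k]), set(ref_results[:k])
    if not hyp and not ref:
        return 1.0  # both result lists empty: treat as full agreement
    return len(hyp & ref) / len(hyp | ref)

def overlap_outcome(overlap):
    # Hypothetical coarse "search overlap outcomes": none / partial / exact.
    if overlap == 1.0:
        return "exact"
    return "partial" if overlap > 0.0 else "none"

def essr(query_overlaps, sat_given_bucket):
    # Expected Search Satisfaction Rate: average, over queries, of the
    # satisfaction probability estimated (e.g. from human judgments)
    # for each query's overlap outcome.
    probs = [sat_given_bucket[overlap_outcome(o)] for o in query_overlaps]
    return sum(probs) / len(probs)
```

For example, with assumed judge-derived rates `{"exact": 0.95, "partial": 0.6, "none": 0.2}`, `essr([1.0, 0.5, 0.0], rates)` averages the three bucket probabilities. The appeal of this setup, as the abstract argues, is that a misrecognized query can still land in the "exact" or "partial" bucket when it retrieves the same results as the reference transcript.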