Towards a General-Purpose Model of Perceived Pragmatic Similarity

Nigel G. Ward, Andres Segura, Alejandro Ceballos, Divette Marco

Published: 09 Sept 2024, Last Modified: 07 Jun 2024 | Interspeech | Everyone | CC0 1.0

Abstract: Models for estimating the similarity between two utterances are fundamental in speech technology. While fairly good automatic measures exist for semantic similarity, pragmatic similarity has not been previously explored. Using a new collection of thousands of human judgments of the pragmatic similarity between utterance pairs, we train and evaluate various predictive models. The best-performing model, which uses 103 features selected from HuBERT's 24th layer, correlates on average 0.74 with human judges for the highest-quality data subset, and it sometimes approaches human inter-annotator agreement. We also find evidence for some degree of generality across languages.
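To make the evaluation setup concrete, the following is a minimal sketch of how a feature-based similarity score could be compared against human judgments. It is not the authors' implementation: mean pooling over frames, cosine similarity, the 1024-dimensional feature size, and the names `pooled_similarity` and `pearson` are all assumptions made for illustration, and the arrays are random stand-ins rather than real HuBERT features or real ratings.

```python
import numpy as np

FEATURE_DIM = 1024  # assumed layer width; not stated in the abstract
N_SELECTED = 103    # number of selected features, as reported in the abstract


def pooled_similarity(feats_a, feats_b, selected):
    """Cosine similarity of mean-pooled frame features on selected dimensions.

    feats_a, feats_b: arrays of shape (n_frames, FEATURE_DIM).
    selected: indices of the feature dimensions to keep.
    Pooling and cosine similarity are illustrative assumptions.
    """
    a = feats_a.mean(axis=0)[selected]
    b = feats_b.mean(axis=0)[selected]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson(x, y):
    """Pearson correlation between model scores and human ratings."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


# Random stand-ins for layer features of utterance pairs (hypothetical data).
rng = np.random.default_rng(0)
selected = rng.choice(FEATURE_DIM, size=N_SELECTED, replace=False)
pairs = [
    (rng.normal(size=(50, FEATURE_DIM)), rng.normal(size=(60, FEATURE_DIM)))
    for _ in range(20)
]
model_scores = [pooled_similarity(a, b, selected) for a, b in pairs]

# Hypothetical human similarity ratings for the same 20 pairs.
human_ratings = rng.uniform(1.0, 5.0, size=len(pairs))
correlation = pearson(model_scores, human_ratings)
```

With real data, `model_scores` would come from features extracted by the speech model and `human_ratings` from the collected judgments; the correlation between the two is the kind of figure the abstract reports.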