Abstract: Models for estimating the similarity between two utterances are
fundamental in speech technology. While fairly good automatic
measures exist for semantic similarity, {\it pragmatic} similarity has
not been previously explored. Using a new collection of thousands of
human judgments of the pragmatic similarity between utterance pairs,
we train and evaluate various predictive models. The best-performing
model, which uses 103 features selected from HuBERT's 24th layer,
achieves an average correlation of 0.74 with human judges on the highest-quality
data subset, and sometimes approaches human inter-annotator agreement. We
also find evidence for some degree of generality across languages.