Rethinking Word Similarity: Semantic Similarity through Classification Confusion

ACL ARR 2024 June Submission 1094 Authors

14 Jun 2024 (modified: 01 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Word similarity is important for NLP and its applications to humanistic and social science tasks, like measuring meaning change over time, detecting biases, understanding contested terms, and more. Yet the traditional method of computing similarity as the cosine between word embeddings falls short of capturing the context-dependent, asymmetrical, polysemous nature of semantic similarity. We propose a cognitively-inspired model drawing on the proposal of Tversky (1977) that for conceptual tasks, people focus on extracting and compiling only the relevant features. Our Word Confusion model reframes semantic similarity in terms of feature-based classification confusion. We train a classifier to map from contextual embeddings to words and use the classifier confusion (the probability of choosing confound word c instead of correct target t) as a measure of the similarity of c and t. We show that Word Confusion outperforms cosine similarity in matching human similarity judgments across several datasets (MEN, WordSim-353, and SimLex), can measure similarity using predetermined features of interest, and enables qualitative analysis on real-world data. Reframing similarity based on classification confusion offers a cognitively-inspired, directional, and interpretable way of modeling the relationship between concepts.
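The core idea in the abstract can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: real contextual embeddings would come from a model such as BERT, but here synthetic Gaussian clusters stand in for them, and scikit-learn's logistic regression stands in for whatever classifier the paper actually uses. The words, dimensions, and cluster centers are all invented for the example.

```python
# Sketch of the Word Confusion idea: train a classifier from contextual
# embeddings to word identities, then read off the probability mass the
# classifier assigns to a confound word c on contexts of a target word t.
# Synthetic embeddings (an assumption of this sketch) replace BERT vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim, n_per_word = 16, 50
words = ["cat", "dog", "car"]

# One Gaussian cluster per word; "cat" and "dog" are placed closer to
# each other than either is to "car", mimicking semantic proximity.
centers = {"cat": np.zeros(dim),
           "dog": np.full(dim, 0.5),
           "car": np.full(dim, 5.0)}
X = np.vstack([rng.normal(centers[w], 1.0, size=(n_per_word, dim))
               for w in words])
y = np.repeat(words, n_per_word)

clf = LogisticRegression(max_iter=1000).fit(X, y)

def confusion_sim(target, confound):
    """Directional similarity: mean P(confound | context of target).
    Unlike cosine, confusion_sim(a, b) need not equal confusion_sim(b, a)."""
    ctx = rng.normal(centers[target], 1.0, size=(100, dim))
    probs = clf.predict_proba(ctx)
    return probs[:, list(clf.classes_).index(confound)].mean()

# Semantically close words should be confused more often than distant ones.
print(confusion_sim("cat", "dog") > confusion_sim("cat", "car"))  # True
```

Note that the measure is inherently asymmetric (the probability of answering c on contexts of t differs from the reverse), which is what lets it capture Tversky-style directional similarity.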
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Semantic similarity
Contribution Types: NLP engineering experiment, Position papers
Languages Studied: English, French, Italian
Submission Number: 1094