Methodological Insights in Detecting Subtle Semantic Shifts with Contextualized and Static Language Models

Sanne Hoeken; Özge Alacam; Antske Fokkens; Pia Sommerauer

Published: 07 Oct 2023, Last Modified: 01 Dec 2023

Venue: EMNLP 2023 Findings

Submission Type: Regular Long Paper

Submission Track: Semantics: Lexical

Keywords: semantic shift detection, contextualized embeddings, static embeddings, political communities

TL;DR: This paper investigates the detection of subtle semantic shifts between political communities using static and contextualized language models.

Abstract: We investigate the automatic detection of subtle semantic shifts between social communities of different political convictions in Dutch and English. We perform a methodological study comparing methods that use static and contextualized language models, and we examine the impact of specializing contextualized models through fine-tuning on target corpora, word sense disambiguation, and sentiment. We furthermore propose a new approach based on masked token prediction that relies on behavioral information, specifically the most probable substitutions, instead of geometrical comparison of representations. Our results show that methods using static models and our masked token prediction method can detect differences in the connotation of politically loaded terms, whereas methods that rely on measuring the distance between contextualized representations do not provide clear signals, even in synthetic scenarios of extreme shifts.

Submission Number: 4188
