Word Predictability is Based on Context – and/or Frequency

Anonymous

09 Mar 2022 (modified: 05 May 2023) · Submitted to CMCL 2022
Keywords: Word Predictability, Raw Word Embeddings, BERT, Non-canonical structures, Poetry Metaphors, Frequency Evaluation
TL;DR: We present an experiment testing Transformer models' sensitivity to non-canonical syntactic structures and to the presence of infrequent words. We introduce a "predictability parameter" that produces better distinctions between classes.
Abstract: In this paper we present an experiment carried out with BERT on a small number of Italian sentences drawn from two domains: newspapers and poetry. The two domains represent increasing levels of difficulty in predicting the masked word that we intended to test. The experiment is organized around the hypothesis of increasing difficulty in predictability at the three levels of linguistic complexity we monitor: the lexical, syntactic, and semantic levels. Whereas lexical predictability may be based on word frequency and not just context, syntax and semantics strictly constrain meaning understanding. To test this hypothesis, we alternate canonical and non-canonical versions of the same sentence before processing them with the same DL model. In particular, we expect the poetry domain to introduce additional restrictions on the local word context, owing to the need to create metaphors, which require non-literal compositional processes of meaning. The results show that DL models are highly sensitive to the presence of non-canonical structures and to local non-literal compositional effects. However, DL models are also very sensitive to word frequency, preferentially predicting function over content words and collocates over infrequent word phrases. To measure differences in performance, we created a linguistically based "predictability parameter" which is highly correlated with a cosine-based classification but produces better distinctions between classes.
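For concreteness, the masked-prediction setup described in the abstract can be sketched as follows. This is a minimal illustration only: the paper does not name its model checkpoint or sentences, so the Italian BERT checkpoint (`dbmdz/bert-base-italian-cased`), the canonical/non-canonical sentence pair, and the gold word are all assumptions, and the cosine comparison over raw wordpiece embeddings merely mirrors the spirit of the paper's cosine-based measure.

```python
# Minimal sketch of the masked-word experiment, assuming the Hugging Face
# `transformers` library. The checkpoint, sentences, and gold word are
# illustrative placeholders, not the paper's actual materials.
import torch
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="dbmdz/bert-base-italian-cased")
mask = fill_mask.tokenizer.mask_token

# Canonical vs. non-canonical word order for the same (hypothetical) sentence;
# the masked slot is the word whose predictability we probe.
sentences = {
    "canonical": f"Il governo ha approvato la {mask} ieri.",
    "non-canonical": f"La {mask} ieri ha approvato il governo.",
}

gold = "legge"  # the original (gold) word removed by masking
gold_id = fill_mask.tokenizer.convert_tokens_to_ids(gold)
raw_emb = fill_mask.model.get_input_embeddings().weight  # raw wordpiece embeddings

for label, sent in sentences.items():
    top = fill_mask(sent, top_k=1)[0]
    # Cosine between the raw embeddings of the gold word and the top
    # prediction, in the spirit of the paper's cosine-based comparison.
    cos = torch.nn.functional.cosine_similarity(
        raw_emb[gold_id], raw_emb[top["token"]], dim=0
    )
    print(f"{label}: predicted {top['token_str']!r} "
          f"(p={top['score']:.3f}, cos-to-gold={cos.item():.3f})")
```

Comparing the top prediction's probability and its cosine distance to the gold word across the canonical and non-canonical variants gives a simple operationalization of the predictability contrast the abstract describes.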