Abstract: Recent studies have shown that language models achieve high performance on idiomaticity detection tasks. Given the crucial role of context in interpreting these expressions, it is important to evaluate how models use context to make this distinction. To this end, we collect a comprehensive evaluation dataset to examine how models discriminate between uses of the same expression in two different contexts. In particular, we produce high-quality instances of idiomatic expressions occurring in their non-dominant literal interpretation, as a way to test whether models can use context to construct meaning. Our findings highlight the models' tendency to default to figurative interpretations and their failure to fully attend to the context. Moreover, the frequency of an idiom affects models' ability to accurately discern its literal and figurative meanings.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: evaluation, NLP datasets, multi-word expressions
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 3687