For a Fistful of Puns: Evaluating a Puns in Multiword Expressions Identification Algorithm Without Dedicated Dataset

ACL ARR 2025 February Submission 400 Authors

07 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract:

Machine translation systems have long faced challenges from multiword expressions (MWEs) and wordplay, which are idiosyncratic and pervasive across languages and therefore degrade translation performance. In this context, we explore puns created from multiword expressions (PMWEs), in which a source MWE is altered into wordplay to recontextualize it or to give it a humorous touch. Little work has been done on these entities in NLP. To address this gap, we introduce ASMR, an alignment-based PMWE identification and tagging algorithm. We offer an in-depth analysis of three different approaches to ASMR, each designed to identify a different type of PMWE. In the absence of PMWE-related datasets and resources, we evaluate ASMR on a snowclone detection task in English. We also perform an MWE identification task in 26 languages to evaluate performance across languages. We show that ASMR achieves state-of-the-art results on the snowclone detection task and produces interesting results on the MWE identification task. These results may indicate that ASMR is suitable for a PMWE identification task.

Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: multilingualism, linguistic variation, multilingual evaluation, less-resourced languages, software and tools
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: Arabic, Bulgarian, Czech, German, Greek, English, Spanish, Basque, Persian, French, Irish, Hebrew, Hindi, Croatian, Hungarian, Italian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovenian, Serbian, Swedish, Turkish, Chinese
Submission Number: 400