Abstract: Euphemism identification aims to recover the true meaning of a given euphemism, for example identifying “weed” (euphemism) as “marijuana” (target keyword) in illicit transactions, and is important for content moderation and for combating underground markets. However, existing methods rely on text alone, ignoring the semantic information in other modalities that is associated with the corresponding target keywords as euphemisms develop and evolve. The lack of multimodal euphemism datasets also hinders related research. In this paper, we treat euphemisms and their corresponding target keywords as keywords and propose improving euphemism identification through keyword-oriented visual and audio features. To this end, we first introduce a keyword-oriented multimodal corpus of euphemisms (KOM-Euph), comprising three datasets (Drug, Weapon, and Sexuality) that include text, images, and speech. We then propose a keyword-oriented multimodal euphemism identification method (KOM-EI), which uses cross-modal feature alignment and dynamic fusion modules to explicitly exploit the visual and audio features of the keywords for efficient euphemism identification. Extensive experiments demonstrate that our method outperforms the SOTA models and LLMs and show the importance of our multimodal datasets.
Paper Type: long
Research Area: Semantics: Lexical
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data resources
Languages Studied: English