Abstract: Euphemisms are widely used on social media and darknet markets to evade supervision. For instance, "ice" serves as a euphemism for the target keyword "methamphetamine" in illicit transactions. Euphemism identification, which aims to map a euphemism to its secret meaning (the target keyword), is therefore a crucial task for social network security. The task poses significant challenges: resource limitations caused by the lack of annotated datasets, and linguistic difficulties arising from subtle differences in meaning between target keywords. Existing methods address the resource limitations by using self-supervised schemes to construct labeled training data automatically. However, they rely on static embeddings that fail to distinguish between target keywords with similar meanings. In addition, we observe that different euphemisms appearing in similar contexts confuse the identification results. To overcome these obstacles, we propose a feature fusion and individualization (FFI) method for euphemism identification. First, we reformulate the task as a cloze task, making it more feasible. Next, we develop a feature fusion module that captures both dynamic global and static local features, improving discrimination between different euphemisms in similar contexts. Additionally, we employ a feature individualization module that projects features into their orthogonal space so that each target keyword has a unique feature representation. As a result, FFI can effectively distinguish similar euphemisms that refer to target keywords with similar meanings. Experimental results show that our method outperforms state-of-the-art methods and large language models, demonstrating its effectiveness.
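As an illustration of the cloze reformulation and the self-supervised data construction mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes that training pairs are built by masking target keywords in sentences where they appear literally, and that a query is formed by masking the euphemism. The mask token `[MASK]` and the helper names `make_cloze`, `build_training_pairs`, and `build_query` are hypothetical.

```python
import re

MASK = "[MASK]"  # hypothetical mask token; the actual token depends on the model


def make_cloze(sentence, term, mask=MASK):
    """Replace whole-word, case-insensitive occurrences of `term` with `mask`."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
    return pattern.sub(mask, sentence)


def build_training_pairs(sentences, target_keywords):
    """Self-supervised labels: mask a target keyword and keep it as the answer."""
    pairs = []
    for sentence in sentences:
        for keyword in target_keywords:
            masked = make_cloze(sentence, keyword)
            if masked != sentence:  # the keyword occurred in this sentence
                pairs.append((masked, keyword))
    return pairs


def build_query(sentence, euphemism):
    """At inference time, mask the euphemism; a model then fills the blank."""
    return make_cloze(sentence, euphemism)
```

Under these assumptions, a classifier trained on the `(masked sentence, keyword)` pairs would be asked to fill the blank in a query such as the one produced by `build_query("Got some ice for sale", "ice")`, choosing among the target keywords.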