Slang or Not? Exploring NLP Techniques for Slang Detection Using the SlangTrack Dataset

ACL ARR 2024 December Submission 1763 Authors

16 Dec 2024 (modified: 05 Feb 2025) · CC BY 4.0
Abstract: The widespread use of casual language, including slang, poses significant difficulties for natural language processing (NLP) systems, particularly in automatically recognising varied word uses. Although previous research has addressed slang through the creation of dictionaries, sentiment analysis, word formation, and interpretation, the fundamental problem of detecting slang has received limited attention. This paper focuses on detecting slang within natural English sentences. To tackle this problem comprehensively, we constructed a novel dataset of words commonly used in both slang and non-slang contexts. The dataset comprises target words that have at least one slang sense as well as one non-slang sense; each sentence has been manually annotated as either slang or non-slang, achieving high inter-annotator agreement. To identify the most effective approach for this task, we compared and evaluated four families of methods: (1) traditional machine learning (ML) models, (2) deep learning (DL) models with both contextual and static embeddings, (3) fine-tuning various language models (LMs), and (4) fine-tuning different large language models (LLMs). Our results show that fine-tuning language models, particularly BERT-large-uncased, achieved the highest performance, with an F1-score of 69% for slang and 92% for non-slang, a macro-averaged F1-score of 80%, a weighted-averaged F1-score of 87%, and an overall accuracy of 87%.
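The first family of methods in the abstract, a traditional ML classifier over sentence features, can be sketched as below. This is a minimal illustrative baseline, not the authors' actual implementation: the feature choice (TF-IDF over word n-grams), the classifier (logistic regression), and the toy example sentences are all assumptions, since the paper does not specify its ML configuration here. Real experiments would train on the annotated SlangTrack sentences instead of the stand-in data shown.

```python
# Hypothetical sketch of a traditional-ML slang detector: TF-IDF features
# plus logistic regression, treating slang detection as binary sentence
# classification (1 = slang sense, 0 = non-slang sense of a target word).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for annotated sentences; each target word ("sick", "salty")
# appears in both a slang and a non-slang context, as in the dataset design.
train_sentences = [
    "That concert was absolutely sick, best night ever",  # slang
    "He got salty when the referee overturned the call",  # slang
    "The patient has been sick for three days",           # non-slang
    "Add a pinch of salty seasoning to the broth",        # non-slang
]
train_labels = [1, 1, 0, 0]

# Word unigrams + bigrams give the classifier some local context.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
clf.fit(train_sentences, train_labels)

def predict_slang(sentence: str) -> int:
    """Return 1 if the model tags the sentence as slang, else 0."""
    return int(clf.predict([sentence])[0])
```

The contextual-embedding and fine-tuned LM approaches reported as stronger in the abstract would replace the TF-IDF features with sentence representations from a pretrained encoder such as BERT-large-uncased.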
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: Slang detection, Text classification, Annotated corpus.
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 1763