Abstract: Sarcasm detection is a challenging task in natural language processing because of sarcasm's peculiar linguistic expression. Thanks in part to the considerable annotated resources available for some datasets, current supervised learning-based approaches achieve promising performance on sarcasm detection. In real-world scenarios, however, annotating data for such a peculiar form of language is difficult, and recent studies have therefore explored unsupervised approaches that avoid the labor-intensive annotation process. In this paper, we present a novel prompt-based unsupervised sarcasm detection method that leverages abundant unlabeled social media data. First, we collect approximately 3 million texts from Twitter through targeted hashtag-based searches and divide them into sarcasm and non-sarcasm categories according to their hashtags. Next, we continue training a pre-trained BERT model on these texts with a masked language modeling objective, yielding a model we call SarcasmBERT; this step strengthens the model's grasp of sarcastic cues in text. Finally, we design prompts tailored to the unlabeled data to perform unsupervised sarcasm detection. Experiments on six benchmark datasets show that our method outperforms state-of-the-art unsupervised baselines. Moreover, plugging SarcasmBERT into existing BERT-based sarcasm detection methods directly improves their performance, illustrating its potential for immediate and substantial gains.
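To make the prompt-based inference step concrete, the following is a minimal sketch of how a masked language model can classify sarcasm without labeled data. It uses bert-base-uncased as a stand-in for the SarcasmBERT checkpoint, and the prompt template and the "yes"/"no" verbalizer words are illustrative assumptions, not the paper's exact choices.

```python
# Hypothetical sketch of prompt-based, zero-label sarcasm detection.
# Assumptions: bert-base-uncased stands in for SarcasmBERT; the prompt
# template and verbalizer tokens ("yes"/"no") are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def detect_sarcasm(text: str) -> str:
    # Wrap the input in a cloze-style prompt; the masked LM fills the slot.
    prompt = f"{text} Question: is this text sarcastic? Answer: {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: [1, seq_len, vocab_size]
    # Locate the [MASK] position and score only the two verbalizer tokens.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    yes_id = tokenizer.convert_tokens_to_ids("yes")
    no_id = tokenizer.convert_tokens_to_ids("no")
    mask_logits = logits[0, mask_pos]
    return "sarcasm" if mask_logits[yes_id] > mask_logits[no_id] else "non-sarcasm"

print(detect_sarcasm("Oh great, another Monday. Just what I needed."))
```

In this setup, continued masked language model pretraining on hashtag-filtered sarcastic tweets is what makes the filled-in verbalizer token informative; with a vanilla checkpoint the same prompt would be far less discriminative.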