SuicidEmoji: Derived Emoji Dataset and Tasks for Suicide-Related Social Content

Tianlin Zhang, Kailai Yang, Shaoxiong Ji, Boyang Liu, Qianqian Xie, Sophia Ananiadou

Published: 2024, Last Modified: 08 Feb 2026, Venue: SIGIR 2024, License: CC BY-SA 4.0

Abstract: Early detection of suicidal ideation on social media is crucial for mental health surveillance. Emojis in posts can also help us better understand users' emotions and predict mental health conditions. However, emoji-based suicide analysis remains underexplored, and few resources are available, which hinders the study of emoji usage patterns among users with suicidal ideation. In this work, we build a derived suicide-related emoji dataset named SuicidEmoji, which contains about 25k emoji posts (2,329 suicide-related posts and 22,722 posts from control-group users) filtered from about 1.3 million items of crawled Reddit data. To the best of our knowledge, SuicidEmoji is the first suicide-related emoji dataset. Based on SuicidEmoji, we propose two novel tasks, emoji-aware suicidal ideation detection and emoji prediction, and build two benchmark subdatasets from SuicidEmoji to evaluate advanced methods, including pre-trained language models (PLMs) and large language models (LLMs). We analyze the experimental results of two PLMs and highly capable LLMs, which reveal both the significance and the challenges of emoji-based suicide-related NLP tasks. The dataset is available at https://github.com/TianlinZhang668/SuicidEmoji.
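
The abstract does not describe how emoji posts were filtered from the crawled data, so the following is only a minimal sketch of one plausible approach, not the authors' pipeline. The Unicode ranges in `EMOJI_PATTERN` and the helper names `extract_emojis` and `filter_emoji_posts` are assumptions made for illustration; the two counts are the ones reported in the abstract.

```python
import re

# Assumption: a coarse set of Unicode ranges that covers most pictographic
# emoji. The paper's actual filtering criteria are not stated in the abstract.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]"
)


def extract_emojis(text):
    """Return every character in `text` matched by the coarse emoji pattern."""
    return EMOJI_PATTERN.findall(text)


def filter_emoji_posts(posts):
    """Keep only the posts that contain at least one matched emoji."""
    return [post for post in posts if EMOJI_PATTERN.search(post)]


# Counts reported in the abstract.
SUICIDE_POSTS = 2329
CONTROL_POSTS = 22722
TOTAL_POSTS = SUICIDE_POSTS + CONTROL_POSTS
POSITIVE_SHARE = SUICIDE_POSTS / TOTAL_POSTS  # fraction of suicide-related posts

if __name__ == "__main__":
    sample = ["I feel so alone 😔", "no emoji here"]
    print(filter_emoji_posts(sample))
    print(TOTAL_POSTS, round(POSITIVE_SHARE, 3))
```

The reported counts sum to 25,051 posts, of which roughly 9% are suicide-related, so any classifier evaluated on the full set faces a marked class imbalance.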