Abstract: Social media users often use disease or symptom terms in ways other than describing their health conditions, which can lead to flawed conclusions in data-driven public health surveillance. The health mention classification (HMC) task aims to identify posts in which users use disease or symptom terms to discuss their own health conditions rather than for other purposes. Existing methods rely on features extracted from external resources and are tested on data from either Twitter or Reddit; therefore, their generalizability and transferability are unproven. In this work, we present MASK-Net, which masks disease or symptom terms and relies on the context of a post. Furthermore, to capture the negative sentiments associated with the experience of having a disease, we incorporate sentiment information to improve HMC performance. We conduct experiments using publicly available health-mention datasets collected from Twitter and Reddit. Experimental results demonstrate that our method outperforms state-of-the-art methods on both HMC datasets, highlighting the relevance of context words for identifying health mentions. Additionally, we evaluate our method in cross-domain and multidomain settings to analyze the transferability and generalizability of MASK-Net, and we conclude with a discussion of the empirical and ethical considerations of our study.
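The masking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the MASK-Net implementation: the function name `mask_terms`, the `[MASK]` placeholder, and the example term list are assumptions made here for illustration, and the abstract does not specify how terms are detected or which mask token is used.

```python
import re

# Hypothetical placeholder token; the abstract does not name the one MASK-Net uses.
MASK_TOKEN = "[MASK]"


def mask_terms(post, terms, mask_token=MASK_TOKEN):
    """Replace whole-word occurrences of disease or symptom terms with a mask token."""
    if not terms:
        return post
    # Try longer terms first so multiword terms win over their substrings.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(mask_token, post)


# Illustrative term list (an assumption, not taken from the datasets).
terms = ["heart attack", "cancer"]
figurative = mask_terms("That concert gave me a heart attack", terms)
literal = mask_terms("I was diagnosed with Cancer last year.", terms)
```

After masking, a classifier sees only the surrounding words, so the figurative and literal posts must be told apart from context ("concert gave me" versus "diagnosed with") rather than from the term itself.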
External IDs: dblp:journals/tcss/NaseemTZRH25