Abstract: Automatically detecting the deaths of social networking site users is a step towards the creation and adoption of an international standard for transferring digital estates to the next of kin of Internet users who die suddenly. In this work, we develop a natural language processing (NLP)-based method for detecting deaths from the posts and comments that concerned followers leave on user profiles. We analysed the differences between the linguistic characteristics and practices of pre- and post-mortem content, and developed text classifiers that achieved satisfactory performance in detecting deaths from online posts. A new corpus was developed by leveraging data from Wikidata and Twitter. Machine learning models, both traditional (random forest (RF), k-nearest neighbours (KNN), logistic regression (LR) and support vector machine (SVM)) and deep learning (BiLSTM, CNN and BERT), were trained to classify pre- and post-mortem content using features extracted with TF-IDF and pre-trained embeddings (GloVe, Word2Vec and fastText). The results showed that the BERT model outperformed all other models. The linguistic analysis showed, not surprisingly, that feelings suggesting negativity are dominant in post-mortem tweets, while feelings suggesting positivity are dominant in pre-mortem tweets. It was also found that the numbers of words, personal pronouns, verbs, and family, religious, death and swear words are higher in post-mortem tweets, whereas the numbers of impersonal pronouns and informal words are higher in pre-mortem tweets.
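To make the TF-IDF classification step concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a handful of invented toy tweets in place of the Wikidata/Twitter corpus, and it swaps in a simple nearest-centroid classifier with cosine similarity instead of the RF, KNN, LR, SVM or neural models named in the abstract. All function names (`tokenize`, `fit_idf`, `tfidf_vector`, `fit_centroids`, `predict`) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy examples for illustration only; NOT taken from the paper's corpus.
TRAIN = [
    ("excited for the show tonight lol", "pre"),
    ("great game today, thanks everyone", "pre"),
    ("new video is out, check it out", "pre"),
    ("rest in peace, you will be missed", "post"),
    ("so sad to hear of his death, praying for the family", "post"),
    ("gone too soon, rest easy my friend", "post"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(doc, idf):
    """Return an L2-normalised sparse TF-IDF vector (dict) for one document."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def fit_centroids(examples, idf):
    """Sum the TF-IDF vectors of each class (assumed stand-in classifier)."""
    centroids = defaultdict(Counter)
    for text, label in examples:
        for term, weight in tfidf_vector(text, idf).items():
            centroids[label][term] += weight
    return centroids


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(text, idf, centroids):
    """Assign the label whose centroid is most similar to the text."""
    vec = tfidf_vector(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


IDF = fit_idf([text for text, _ in TRAIN])
CENTROIDS = fit_centroids(TRAIN, IDF)
```

Calling `predict("rest in peace, we will miss you", IDF, CENTROIDS)` labels an unseen tweet as `"pre"` or `"post"`. A real replication would replace the toy data with the labelled corpus and the centroid rule with one of the classifiers the abstract lists.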
External IDs: doi:10.1007/978-3-031-04447-2_16