Weakly Supervised and Online Learning of Word Models for Classification to Detect Disaster Reporting Tweets

Girish Keshav Palshikar, Manoj Apte, Deepak Pandita

Published: 01 Jan 2018, Last Modified: 28 Mar 2024
Inf. Syst. Frontiers 2018
Readers: Everyone

Abstract: Social media has quickly become an important channel through which people, NGOs and governments spread information during natural or man-made disasters, mass emergencies and crisis situations. Given this role, real-time analysis of social media content to locate, organize and use valuable information is crucial for disaster management. In this paper, we propose self-learning algorithms that, with minimal supervision, construct a simple bag-of-words model of the information expressed in news about various natural disasters. Such a model is human-understandable, human-modifiable and usable in a real-time scenario. Since tweets are a different category of document from news, we next propose a model transfer algorithm, which refines the model learned from news by analyzing a large unlabeled corpus of tweets. We show empirically that model transfer improves the predictive accuracy of the model, and that our model learning algorithm outperforms several state-of-the-art semi-supervised learning algorithms. Finally, we present an online algorithm that learns weights for the words in the model, and we demonstrate the efficacy of the weighted model.
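
To make the idea of a weighted bag-of-words model with online updates concrete, the following is a minimal illustrative sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes a small set of seed words, a fixed decision threshold, and a perceptron-style update rule; the names `WeightedWordModel`, `tokenize`, `threshold` and `lr` are hypothetical and chosen only for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class WeightedWordModel:
    """Illustrative bag-of-words classifier with per-word weights.

    Assumption: a text is flagged as disaster-reporting when the summed
    weights of its distinct words reach a threshold. This is a stand-in
    for the paper's model, not a reproduction of it.
    """

    def __init__(self, seed_words, threshold=1.0, lr=0.5):
        # Every seed word starts with weight 1.0 (an arbitrary choice).
        self.weights = {w: 1.0 for w in seed_words}
        self.threshold = threshold
        self.lr = lr

    def score(self, text):
        # Sum the weights of the distinct words; unknown words count as 0.
        return sum(self.weights.get(t, 0.0) for t in set(tokenize(text)))

    def predict(self, text):
        return self.score(text) >= self.threshold

    def update(self, text, label):
        """Perceptron-style online step: adjust weights only on a mistake."""
        if self.predict(text) == label:
            return
        step = self.lr if label else -self.lr
        for t in set(tokenize(text)):
            self.weights[t] = self.weights.get(t, 0.0) + step


# Usage: start from seed words, then correct the model one example at a time.
model = WeightedWordModel({"earthquake", "flood"})
model.update("my inbox is a flood of spam", label=False)
```

Because the model is just a dictionary from words to weights, it stays human-readable and human-editable, which is the property the abstract emphasizes. In this sketch a single correction also lowers the weight of the seed word involved, so a real system would need a more careful update rule.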
