Abstract: Weather-related disruptions significantly affect many industries, including agriculture, infrastructure, and public safety, yet predicting unusual weather events remains a major challenge. The problem is compounded by a lack of high-quality weather data, because sensors at weather stations often fail during severe weather. In this work, we propose a novel method for classifying rare severe weather events that combines publicly available tweets with meteorological readings collected from weather stations across Alaska. Multimodal data of varying quality is used to compensate for meteorological recordings missing from the weather stations. We collected geotagged tweets from the region of focus and used context-aware BERT embeddings to analyze the social media texts and assess their validity and dependability. Labels for the social sensor data were generated from the weather events associated with the tweet collection. To predict rare weather events, we propose a multiclass classification model, which was trained and tested on data from the year 2020. Learning from the integrated data yielded a significant increase in F1 score compared with relying on weather data alone. Our findings indicate that a model supplemented with daily weather and social media text data outperforms alternatives enhanced with hourly data. The proposed model achieved an F1 score of 0.83 on multimodal data, compared with 0.30 for the baseline model that relies solely on weather data. Training the proposed model on the combined dataset also raised accuracy to 95%.
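The abstract reports both an F1 score and an accuracy figure. For rare-event classification the two can diverge sharply, which is why F1 is the more informative of the two here. The following is a minimal sketch of that effect, not the authors' evaluation code: the label names, the toy class counts, and the choice of macro averaging are all assumptions made for illustration, since the abstract does not state which F1 average was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 95 ordinary days and 5 rare severe events.
y_true = ["none"] * 95 + ["storm"] * 5
# A degenerate classifier that never predicts the rare class.
y_pred = ["none"] * 100

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.2f}, macro F1 = {f1:.2f}")
```

In this toy case the classifier misses every rare event, yet its accuracy stays high because the majority class dominates, while the macro F1 drops to roughly half.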