Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods
Abstract: Spatial aggregation of Twitter language may make it possible to monitor the subjective well-being of populations on a large scale. Text analysis methods need to yield robust estimates to be dependable. On the one hand, we find that data-driven machine learning-based methods offer accurate and robust measurements of regional well-being across the United States when evaluated against gold-standard Gallup survey measures. On the other hand, we find that standard English word-level methods (such as Linguistic Inquiry and Word Count 2015’s Positive emotion dictionary and Language Assessment by Mechanical Turk) can yield estimates of county well-being inversely correlated with survey estimates, due to regional cultural and socioeconomic differences in language use. Some of the most frequent misleading words can be removed to improve the accuracy of these word-level methods.
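As a rough illustration of the word-level approach the abstract describes, the sketch below scores a region by the relative frequency of lexicon words in its aggregated tokens, optionally excluding words judged misleading, and computes a Pearson correlation that could be used to compare such scores against survey estimates. The word list, the function names (`dictionary_score`, `pearson_r`), and the `exclude` parameter are all hypothetical. They are not the LIWC 2015 or Language Assessment by Mechanical Turk lexicons, and this is not necessarily the authors' procedure.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon for illustration only; NOT the LIWC 2015
# Positive emotion dictionary or the LabMT word list.
POSITIVE_WORDS = {"love", "good", "happy", "great", "lol"}


def dictionary_score(tokens, lexicon, exclude=frozenset()):
    """Relative frequency of lexicon words among a region's tokens.

    `exclude` holds words to drop from the lexicon, e.g. frequent words
    suspected of being misleading (an assumption of this sketch).
    """
    if not tokens:
        return 0.0
    active = set(lexicon) - set(exclude)
    counts = Counter(tokens)
    hits = sum(n for word, n in counts.items() if word in active)
    return hits / len(tokens)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

In use, one would compute `dictionary_score` once per county and pass the resulting list, together with the matching survey values, to `pearson_r`. A negative coefficient would correspond to the inverse relationship the abstract reports for some word-level methods.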