Abstract: With the rise of social media and news aggregators on the Web, the newspaper industry faces a declining subscriber base. To retain customers both online and in print, it is therefore critical to predict and mitigate customer churn. Newspapers typically have heterogeneous sources of valuable data: circulation data, customer subscription information, news content, and search click log data. Building an ensemble of predictive models over multiple sources poses unique challenges: ascertaining the short-term versus long-term effects of features on churn, and determining mutual information properties across multiple data sources. We present TopChurn, a novel system that uses topic models to extract dominant features from user complaints and Web data for churn prediction. TopChurn uses a maximum entropy-based approach to identify the features most indicative of subscribers likely to drop their subscription within a specified period of time. We conduct temporal analyses to determine the long-term versus short-term effects of status changes on subscriber accounts, which feed into our temporal models of churn, as well as topic and sentiment analyses on news and click logs, which feed into our Web models of churn. We then validate our insights via experiments over real data from The Columbus Dispatch, a mainstream daily newspaper, and demonstrate that our churn models significantly outperform baselines for various prediction windows.
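To make the maximum entropy step concrete, here is a minimal sketch, not TopChurn's actual implementation. For a binary churn/no-churn label, a maximum entropy classifier is equivalent to logistic regression, so the sketch fits one by gradient ascent on the L2-regularized log-likelihood and then ranks features by learned weight. It assumes each subscriber is represented by topic proportions (for example, from topic models over complaints and click logs). The feature names, toy data, and hyperparameters are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_maxent(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Fit a binary maximum-entropy (logistic regression) model by batch
    gradient ascent on the L2-regularized log-likelihood."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual between observed label and predicted churn probability.
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] += lr * (grad_w[j] / n - l2 * w[j])
        b += lr * grad_b / n
    return w, b


def predict_churn(w, b, x):
    # Probability that a subscriber with feature vector x churns.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def rank_features(names, w):
    # Largest positive weights are the features most indicative of churn.
    return sorted(zip(names, w), key=lambda p: p[1], reverse=True)


# Hypothetical per-subscriber topic proportions; label 1 = churned in window.
names = ["delivery_complaint", "billing_complaint", "sports_clicks"]
X = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
    [0.1, 0.2, 0.7],
]
y = [1, 1, 1, 0, 0, 0]

w, b = train_maxent(X, y)
ranked = rank_features(names, w)
print(ranked)
```

On this toy data the delivery-complaint topic should receive the largest positive weight and the sports-click topic the most negative. This mirrors, in miniature, how weights from a maximum entropy model can be read as a ranking of churn-indicative features.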