Abstract: Predicting and analyzing the popularity of user-created web content, especially video, is becoming increasingly
important and valuable for gaining insight into how web content disseminates in a dynamic distribution system, and for informing
decision making in online marketing and web content design. In this paper, we conduct a comprehensive
data-driven study of the factors that influence the popularity of YouTube channels. The analysis proceeds in the
following steps: (1) collecting related information about each individual YouTube channel from various sources;
(2) applying data preprocessing algorithms to extract useful features from unstructured raw data; (3) training and validating
machine learning models to predict quantified channel popularity and to infer the relative importance of predictive
features; (4) developing an item-based recommender, informed by the preceding analysis, and its online visualization. With data from
more than 10,000 YouTube channels and 80,000 YouTube videos, our analysis shows that the popularity of current YouTube
channels can be quantified as three clusters with different levels of accumulated views, and that the frequency of publishing videos, the interaction
of the content creator, and references to its videos on online social media are critical factors in promoting the popularity of a YouTube
channel. We also design a cascaded Random Forest model that addresses the imbalanced classification
problem in prediction.