Abstract: With the growing popularity of video-sharing platforms, content creators face strong demand to produce content that attracts a large number of viewers. Many factors are related to engagement: visuals, sound, transcript, title, and so on. To take these factors into account, we propose a deep multi-modal hybrid fusion approach for YouTube video engagement classification. Our architecture makes it easy to adapt state-of-the-art models to a particular task or to a variety of modalities and then fuse them, so that the combined information supports better classification. A proposed residual block, serving as a simple neural architecture search, is used to extract better features. Our work is at the forefront of classifying YouTube video engagement and promises to broaden the research community’s reach. Through detailed experiments, we show that the model achieves state-of-the-art performance on the problem of YouTube video engagement analytics.
External IDs: doi:10.1007/978-3-031-26431-3_5