Abstract: We focus on public opinion analysis on microvideo platforms. Opinion analysis has mature models in sociological research, but the rapid development of microvideo platforms means that some of them are no longer applicable. With the volume of information on the Internet several times that of traditional media, applying traditional opinion analysis models is no longer feasible. In this article, we build the DMVER dataset to address opinion analysis with deep learning approaches. We transform the problem into an affective video content analysis task by mapping the relationship between opinion and emotion, and we build a new benchmark of microvideos for emotion recognition. The dataset comprises 31 761 samples from more than 30 000 users, with three emotion tendency labels: negative, neutral, and positive. Each video instance lasts around 15 s, and the instances are taken from different Douyin videos. We aim to cover a large number of samples with multidimensional information to support both traditional and deep learning methods. The article describes in detail how the dataset labels were defined, the data collection process, and the dataset statistics. It also defines the opinion-emotion task and reports the performance of several baselines built on video recognition frameworks. These baseline models demonstrate the feasibility of the proposed approach and establish a more efficient baseline using multimodal models.
External IDs: doi:10.1109/tcss.2025.3608049