Video Violence Rating: A Large-Scale Public Database and A Multimodal Rating Model

Published: 01 Jan 2024, Last Modified: 13 Nov 2024. Venue: IEEE Trans. Multim. 2024. License: CC BY-SA 4.0
Abstract: Recognizing violence in videos is important for automatically identifying and assessing violent content, so that access to it can be restricted for specific audiences such as children. Existing methods focus on violence detection, which only determines whether violence is present. In contrast, this paper addresses video violence rating, which classifies violence into finer-grained levels. No public database for video violence rating exists, because the task requires fine-grained violence-level annotations. This paper therefore introduces a large-scale violence rating database, which will be publicly released. We further propose a multimodal violence rating model. Unlike existing models, it uses token-based interaction and contrastive learning. The token-based interaction strengthens the feature representations and makes full use of the multimodal features, while the contrastive learning further improves model performance. Extensive experiments show that our model outperforms existing methods.