Abstract: In recent years, the detection and recognition of violence has been studied for several applications, namely surveillance, human-computer interaction, and content-based video retrieval [22]. The purpose of violence detection is to determine automatically and effectively whether violence occurs within a short period of time. It is a crucial area, since automatic detection makes it possible to trigger the means needed to stop the violence. To address this problem quickly, we use models pre-trained on general activity recognition datasets such as Kinetics-400, which have learned to extract general patterns that are very important for detecting violent behaviour in videos. Our approach takes a state-of-the-art model pre-trained on a general activity recognition task (e.g. Kinetics-400) and fine-tunes it for violence detection. We applied this approach to two violence datasets and achieved state-of-the-art results using only four input frames.
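To make the four-frame input concrete, the following is a minimal sketch of one way four frames could be selected from a clip before being passed to a pre-trained model. The abstract does not state how the frames are chosen; uniform sampling at segment midpoints is an assumption made here for illustration, and the function name `sample_frame_indices` is hypothetical.

```python
from typing import List


def sample_frame_indices(num_frames: int, num_samples: int = 4) -> List[int]:
    """Pick `num_samples` frame indices spread evenly over a clip.

    Assumption (not from the abstract): the clip is split into
    `num_samples` equal-length segments and the middle frame of each
    segment is taken.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment = num_frames / num_samples
    # Clamp to the last valid index so the result is always in range;
    # very short clips yield repeated indices instead of an error.
    return [
        min(num_frames - 1, int(segment * (i + 0.5)))
        for i in range(num_samples)
    ]


if __name__ == "__main__":
    # A 32-frame clip reduced to four evenly spaced frames.
    print(sample_frame_indices(32))
```

With this scheme, a clip of any length is reduced to the same fixed-size input, which is what allows a model pre-trained on general activity recognition to be fine-tuned on short violence clips without changing its input shape.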