Abstract: Recognizing violent actions is critical for timely detection and prevention of harmful incidents. While numerous machine learning models, particularly deep learning approaches, have been developed for this task, many focus solely on improving accuracy, overlooking computational complexity and memory usage. Consequently, these models often contain millions of parameters and require billions of operations, posing significant challenges for real-time deployment on embedded and mobile devices. In this paper, we introduce a lightweight yet robust model for violence recognition that leverages Mobile Inverted Bottleneck and Mobile Attention blocks. The Mobile Inverted Bottleneck is adapted from the MobileNetV3 architecture, while our proposed Mobile Attention mechanism is inspired by the multi-head self-attention of Transformers. Experimental results demonstrate that our framework achieves superior performance with a smaller model size and lower computational cost across several standard benchmark datasets.
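The abstract does not specify the internals of the proposed Mobile Attention block, but it states the block is inspired by the multi-head self-attention of Transformers. As a reference point only, that standard mechanism can be sketched in NumPy; the function name, weight shapes, and head count below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """Standard Transformer multi-head self-attention (illustrative sketch).

    x:  (seq_len, d_model) input tokens
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Project and split into heads: (num_heads, seq_len, d_head).
    q = (x @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention per head: (num_heads, seq_len, seq_len).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)
    # Recombine heads and apply the output projection.
    out = (attn @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# Toy usage with random weights (shapes only; not trained parameters).
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
y = multi_head_self_attention(x, Wq, Wk, Wv, Wo, num_heads)
```

A "mobile" variant of this block would typically reduce the projection cost (e.g., fewer heads or low-rank projections); the paper's specific adaptation is not described in this excerpt.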