Abstract: Video anomaly detection (VAD) identifies suspicious events in videos, which is critical for crime prevention and homeland security. In this paper, we propose a simple but highly effective VAD method that relies on attribute-based representations. The base version of our method represents every object by its velocity and pose, and computes anomaly scores by density estimation. Surprisingly, this simple representation is sufficient to achieve state-of-the-art performance on ShanghaiTech, the most commonly used VAD dataset. Combining our attribute-based representations with an off-the-shelf, pretrained deep representation yields state-of-the-art performance, with AUROCs of $99.1\%$, $93.7\%$, and $85.9\%$ on Ped2, Avenue, and ShanghaiTech, respectively.
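To illustrate the idea of scoring objects by density estimation over attribute features, here is a minimal sketch. The abstract does not say which density estimator is used, so this assumes a k-nearest-neighbor distance as a stand-in; the function name `knn_anomaly_scores`, the 2-D "velocity" features, and the value of `k` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def knn_anomaly_scores(train_feats, test_feats, k=1):
    """Score each test vector by its mean Euclidean distance to its
    k nearest training vectors (larger distance = lower density = more anomalous).
    Assumption: kNN distance stands in for the unspecified density estimator."""
    train = np.asarray(train_feats, dtype=float)
    test = np.asarray(test_feats, dtype=float)
    # Pairwise distances between test and training vectors, shape (n_test, n_train)
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    # Keep the k smallest distances in each test row
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Hypothetical 2-D attribute features extracted from normal training videos
rng = np.random.default_rng(0)
normal_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# One test object near the normal cluster, one far from it
test = np.array([[0.1, -0.2], [8.0, 8.0]])
scores = knn_anomaly_scores(normal_train, test, k=5)
```

In this sketch, the second test object lies far from every training feature and therefore receives the higher score. A real pipeline would compute such scores per attribute (for example, velocity and pose separately) and combine them, but how that is done is not specified in the abstract.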
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=BZxJGsDKr5
Changes Since Last Submission: We added a section focused on ethical considerations.
Assigned Action Editor: ~Jinwoo_Shin1
Submission Number: 2959