Abstract: Highlights
• Vision and Language Model (VLM) for video anomaly detection and recognition.
• VLM feature space transformation using normality prototype for direction learning.
• A Selector model using transformed VLM space for robust abnormal segment selection.
• A Temporal model capturing short-term frame relations and long-term dependencies.