STAPS: Training-Free Zero-Shot Anomaly Detection via Semantic-Temporal Scoring and Prototype Selection
Keywords: Anomaly detection, zero-shot learning
Abstract: Zero-shot anomaly detection (ZAD) addresses the need for anomaly detection without large-scale labeled datasets by leveraging large pretrained representations without domain-specific supervision. However, existing ZAD methods still depend on labeled pretraining, limiting their applicability in practical scenarios. Training-free ZAD eliminates this dependency by directly leveraging pretrained backbones without additional training, offering a cost-efficient alternative, but it suffers from semantic bias because class-oriented representations are applied to anomaly detection without fine-tuning. In this work, we propose Semantic-Temporal scoring And Prototype Selection (STAPS), a novel training-free framework that mitigates semantic bias and incorporates temporal context into anomaly detection. The proposed method comprises two key components. First, semantic-temporal anomaly scoring refines anomaly scores that are biased toward class semantics by leveraging temporal locality and continuity to capture sequential dependencies. Second, Bayesian Gaussian mixture-based prototype selection automatically identifies prototypes sensitive to anomaly evidence, thereby reducing semantic bias in backbone features and enhancing pixel-level anomaly segmentation. Extensive experiments on nine benchmark datasets demonstrate that STAPS achieves state-of-the-art performance, with 91.9\% image-level AUROC for anomaly detection and 97.7\% pixel-level AUROC for anomaly segmentation, highlighting both its robustness and generalizability.
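The prototype-selection idea in the abstract can be illustrated with a minimal sketch: fit a Bayesian Gaussian mixture to backbone patch features and keep only the components whose mixture weights survive the sparse Dirichlet prior's automatic pruning, treating their means as selected prototypes. This is an illustrative assumption about the mechanism, not the paper's implementation; the feature data, component count, prior strength, and pruning threshold below are all placeholders.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Stand-in for backbone patch features (500 patches, 8-dim); purely synthetic.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 8))

# Sparse Dirichlet prior lets the mixture deactivate unneeded components,
# so the number of retained prototypes is chosen automatically.
bgm = BayesianGaussianMixture(
    n_components=10,                   # upper bound on prototypes
    weight_concentration_prior=1e-2,   # small value encourages pruning
    random_state=0,
).fit(features)

active = bgm.weights_ > 1e-2           # components that survived pruning
prototypes = bgm.means_[active]        # selected prototype vectors
```

Anomaly scoring could then compare each test feature to `prototypes` (e.g. nearest-prototype distance); the threshold `1e-2` is a hypothetical cutoff, not a value from the paper.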
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 15584