Sliced Vision Transformers for Fine-Grained Anomaly Detection and Localization

Nasar Iqbal, Mattia Zanier, Marco Vernier, Christian Micheloni, Niki Martinel

Published: 2025, Last Modified: 28 Jan 2026, AVSS 2025, CC BY-SA 4.0

Abstract: Accurate detection of small, narrow defects in industrial images is crucial for precisely identifying and localizing anomalies. Vision Transformer (ViT)-based image anomaly detection and localization networks have achieved remarkable performance improvements in recent years, but their conventional square patch embedding may not be optimal for fine-grained anomaly detection, where anomalies typically exhibit elongated or irregular shapes. To address this problem, we introduce a novel approach that replaces the conventional square patches of the ViT with slice-shaped patches. This improves spatial resolution and yields more detailed feature representations. State-of-the-art results on two existing industrial anomaly detection benchmarks show that our model effectively captures morphological details and spatial dependencies, demonstrating its ability to model intricate anomaly patterns.
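The contrast between square and slice-shaped patches can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify slice dimensions or orientation, so the 16 x 16 mask, the 4 x 4 square patches, the 1 x 16 row slices, and the helper names `patchify` and `defect_coverage` are all hypothetical choices for illustration. Both tokenizations produce 16 tokens of 16 pixels each. The sketch measures how strongly a one-pixel-thick horizontal scratch shows up in each token it touches.

```python
from typing import List

Image = List[List[float]]


def patchify(img: Image, ph: int, pw: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping ph x pw patches.

    Each patch is flattened row-major into one token vector; tokens are
    emitted in raster order over patch positions (as in a ViT patch embedding
    before the linear projection). Hypothetical helper for illustration.
    """
    h, w = len(img), len(img[0])
    if h % ph or w % pw:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, h, ph):
        for left in range(0, w, pw):
            tokens.append([img[r][c]
                           for r in range(top, top + ph)
                           for c in range(left, left + pw)])
    return tokens


def defect_coverage(tokens: List[List[float]]) -> List[float]:
    """For every token that touches the defect, return the fraction of its pixels that are defective."""
    return [sum(t) / len(t) for t in tokens if any(t)]


# Hypothetical 16 x 16 binary mask: a one-pixel-thick horizontal scratch on row 5.
H = W = 16
mask = [[1.0 if r == 5 else 0.0 for _ in range(W)] for r in range(H)]

square = patchify(mask, 4, 4)   # conventional square patches
sliced = patchify(mask, 1, 16)  # assumed row-shaped slices (same pixels per token)

print(len(square), defect_coverage(square))  # prints: 16 [0.25, 0.25, 0.25, 0.25]
print(len(sliced), defect_coverage(sliced))  # prints: 16 [1.0]
```

Under these assumptions, the square tokenization spreads the scratch across four tokens, and the defect fills only a quarter of each. The row slices concentrate it into a single token. The effect depends on orientation: a vertical scratch would be diluted to 1/16 across every row slice. How the paper chooses slice shapes to handle this is not stated in the abstract.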

External IDs: dblp:conf/avss/IqbalZVMM25