Keywords: Anomaly detection, Industrial defect inspection, GRPO
TL;DR: We propose a new RL method and benchmark for fine-grained anomaly detection.
Abstract: Recent research in industrial anomaly detection (IAD) has shifted beyond binary classification and segmentation, increasingly focusing on process-level, interpretable reasoning about the type and cause of anomalies. While multimodal
large language models (MLLMs) have enabled this reformulation through visual
question answering, current anomaly detection methods still suffer from two major limitations: the limited capacity of reward functions to capture intricate complexities and the reliance on generating supervised fine-tuning (SFT) data. Hence,
we propose FAD-TQ, a lightweight reinforcement learning framework for fine-grained anomaly detection with thinking quality. Built upon the Group Policy
Gradient paradigm, it eliminates the reference model and KL regularization to reduce rollout overhead and directly optimize the original reinforcement learning
objective. To enable fine-grained guidance over the reasoning process, we design a thinking quality reward composed of two components: an efficiency reward
that penalizes redundant reasoning, and a relevance reward that encourages task-aligned, coherent thought trajectories. Furthermore, we introduce MVTec-LOCOAD-Pair3C, a principled evaluation protocol built on the existing dataset that defines
three decision types (normal, structural anomaly, and logical anomaly)
rather than binary classification. Extensive experiments demonstrate that FAD-TQ
improves interpretability, accuracy, and training efficiency while producing more streamlined reasoning
at reduced computational cost. It demonstrates the potential of using small-scale benchmarks to evaluate MLLM capabilities in IAD. We hope this framework
and evaluation protocol can serve as an example for future research on process-level reasoning in anomaly detection.
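As a rough illustration of how the pieces described in the abstract might fit together, the sketch below combines a correctness signal with an efficiency reward and a relevance reward, then computes group-relative advantages and a policy-gradient loss with no reference model and no KL term. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the token budget, the keyword-overlap relevance proxy, the weights, and the mean-centered advantage are all hypothetical choices made for this example.

```python
from statistics import mean


def efficiency_reward(num_tokens, budget=256):
    """Hypothetical efficiency reward: 1.0 within the token budget,
    decaying linearly to 0.0 at twice the budget."""
    if num_tokens <= budget:
        return 1.0
    return max(0.0, 1.0 - (num_tokens - budget) / budget)


def relevance_reward(thought, task_terms):
    """Hypothetical relevance proxy: fraction of task terms that
    appear as words in the reasoning text."""
    if not task_terms:
        return 0.0
    words = set(thought.lower().split())
    return len(words & task_terms) / len(task_terms)


def total_reward(correct, thought, num_tokens, task_terms,
                 w_eff=0.25, w_rel=0.25):
    """Answer correctness plus a weighted thinking-quality reward.
    The weights are assumptions, not values from the paper."""
    quality = (w_eff * efficiency_reward(num_tokens)
               + w_rel * relevance_reward(thought, task_terms))
    return float(correct) + quality


def group_advantages(rewards):
    """Center each rollout's reward on the group mean (assumed form)."""
    mu = mean(rewards)
    return [r - mu for r in rewards]


def policy_loss(log_probs, advantages):
    """Plain policy-gradient loss over a group of rollouts:
    no reference model and no KL penalty."""
    return -mean(a * lp for a, lp in zip(advantages, log_probs))
```

In this sketch, a group of rollouts for the same image-question pair would each be scored with `total_reward`, centered with `group_advantages`, and passed to `policy_loss` together with their sequence log-probabilities.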
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 17108