AnomalyCoT: A Multi-Scenario Chain-of-Thought Dataset for Multimodal Large Language Models

Published: 18 Sept 2025, Last Modified: 30 Oct 2025
Venue: NeurIPS 2025 Datasets and Benchmarks Track (poster)
License: CC BY 4.0
Keywords: Multimodal Large Language Models, Industrial Anomaly Detection
Abstract: Industrial Anomaly Detection (IAD) is an indispensable quality control technology in modern production processes. Owing to the strong visual comprehension and cross-domain knowledge transfer capabilities of multimodal large language models (MLLMs), recent studies have explored applying MLLMs to IAD and have established several multimodal IAD datasets. However, although the latest datasets cover various fundamental IAD tasks, they formulate these tasks in a general question-and-answer format that lacks a rigorous reasoning process, and they offer limited scenario diversity, which restricts their reliability in practical applications. In this paper, we propose AnomalyCoT, a multimodal Chain-of-Thought (CoT) dataset for multi-scenario IAD tasks. It consists of 37,565 IAD samples with CoT data and is organized around challenging composite IAD tasks. The CoT data for each sample also provides precise coordinates of anomaly regions, thereby improving visual comprehension of defects across different types. AnomalyCoT is constructed through a systematic pipeline that involves multiple manual operations. Based on AnomalyCoT, we conduct a comprehensive evaluation of various mainstream MLLMs and fine-tune representative models in different ways. The results show that Gemini-2.0-flash achieves the best performance in direct evaluation with an accuracy of 59.6%, while Llama 3.2-Vision achieves the best performance after LoRA fine-tuning with an accuracy of 94.0%. Across all fine-tuned models, the average accuracy improvement reaches 36.5%, demonstrating the potential of integrating CoT datasets into future IAD applications. The code and data are available at https://github.com/Zhaolutuan/AnomalyCoT.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/zhaolutuan/AnomalyCoT
Code URL: https://github.com/Zhaolutuan/AnomalyCoT
Supplementary Material: pdf
Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling
Submission Number: 2342