Open-Vocabulary Prohibited Item Detection for Real-World X-Ray Security Inspection

Shuyang Lin, Tong Jia, Hao Wang, Bowen Ma, Mingyuan Li

Published: 01 Jan 2025, Last Modified: 04 Nov 2025. Venue: IEEE Transactions on Information Forensics and Security. License: CC BY-SA 4.0

Abstract: Computer-aided prohibited item detection is applied in X-ray security inspection to maintain public safety. However, existing prohibited item detectors are limited to the small set of categories covered by current X-ray datasets, which poses potential risks to public security. Since constructing larger datasets and annotating hundreds of categories is time-consuming and labor-intensive, scaling detectors to more categories with minimal supervision is of great importance. To this end, we adopt an open-vocabulary object detection (OVOD) method to detect arbitrary unlabeled novel categories of prohibited items. OVOD methods typically rely on datasets with caption annotations, which are lacking in the domain of prohibited item detection. To support research on OVOD in X-ray security inspection scenarios, we contribute the PIXray Caption dataset, the first X-ray dataset with image-caption pair annotations, which can serve as a benchmark and facilitate research in the community. Further, we propose a novel Open-Vocabulary Prohibited Item Detection (OVPID) network to leverage textual information from captions. OVPID contains two core modules: the Interference Resistant Module (IRM) and the Prediction Module (PM). Specifically, IRM includes two submodules, Edge Perception (EP) and Foreground Activation (FA), which are designed to counter the interference caused by overlapping objects and complex backgrounds in X-ray images. PM consists of two branches, one for classification and one for localization. In the classification branch, PM generates more accurate prompts for the X-ray dataset via a large multimodal model (LMM). In the localization branch, PM aligns the student embeddings with both the teacher and caption embeddings. Extensive experiments on the PIXray Caption dataset demonstrate that OVPID outperforms other OVOD methods by delivering higher accuracy on novel categories.
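The abstract does not give formulas, so the following is only a minimal sketch of two ideas that OVOD methods commonly use and that the abstract alludes to: scoring a region embedding against text-prompt embeddings, and pulling a student embedding toward teacher and caption embeddings. Everything here is an assumption made for illustration and not the OVPID implementation: the function names (`classify_region`, `alignment_loss`), the cosine-similarity scoring, the temperature value, the L1-plus-cosine form of the loss, and the toy three-dimensional embeddings.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def classify_region(region_emb, text_embs, temperature=0.01):
    """Score one region embedding against per-category text embeddings.

    text_embs maps a category name to the embedding of its text prompt.
    A novel category is handled by adding another entry to this mapping;
    no detector retraining is implied by this sketch.
    Returns (best_label, probabilities).
    """
    labels = list(text_embs)
    logits = [cosine(region_emb, text_embs[k]) / temperature for k in labels]
    top = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = {k: e / total for k, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


def alignment_loss(student, teacher, caption, caption_weight=1.0):
    """Hypothetical loss: L1 distance to the teacher embedding plus
    (1 - cosine similarity) to the caption embedding."""
    s, t = l2_normalize(student), l2_normalize(teacher)
    l1_term = sum(abs(x - y) for x, y in zip(s, t))
    caption_term = 1.0 - cosine(student, caption)
    return l1_term + caption_weight * caption_term


# Toy embeddings; real ones would come from a vision-language model.
text_embs = {
    "knife": [1.0, 0.0, 0.0],
    "lighter": [0.0, 1.0, 0.0],
    "scissors": [0.0, 0.0, 1.0],
}
label, probs = classify_region([0.9, 0.1, 0.0], text_embs)
print(label)
```

In this toy setup the region embedding lies closest to the "knife" prompt, so that label is selected. The point of the design is that the label set is defined by text embeddings rather than by a fixed classifier head, which is what allows categories without box annotations to be scored at all.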

External IDs: doi:10.1109/tifs.2025.3586492