Abstract: Accurate recognition of potential threats in baggage is critical for public safety. In X-ray scans, objects are often stacked randomly and appear translucent, so prohibited items are heavily overlapped by background clutter, which gives rise to severe class-agnostic feature entanglement and weakens their discriminative properties. Existing frameworks address this challenge by directly applying a foreground attention map or a background de-attention map to the entangled features. However, this setting suffers from two limitations: 1) it is extremely difficult to precisely distinguish foreground from background in the entangled latent space, and 2) it overlooks the intrinsic differences that overlapping and occlusion bring to feature learning. To overcome these limitations, we contribute a novel foreground-background-specific feature learning framework (dubbed ForkNet) for overlapped prohibited item detection. Specifically, ForkNet first employs two individual backbones to extract foreground features and background features by inserting task-specific heads at the end of each. After that, a feature disentanglement module is devised to proactively remove redundant background information hidden within the foreground features by analyzing their similarity along the channel and spatial axes. Finally, the refined foreground features are fed into a regular prediction head for object detection. Extensive experiments on five challenging datasets (SIXray, OPIXray, PIXray, PIDray, and CLCXray) show that the proposed framework significantly outperforms state-of-the-art methods (achieving 90.0%, 90.9%, 73.8%, 66.8%, and 62.0% mAP, respectively) while running at a high speed of 6.4 FPS.
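To make the disentanglement idea concrete, the following is a minimal stdlib-only sketch of channel-wise similarity-based suppression: foreground channels that closely resemble the corresponding background channels are zeroed out, keeping only the discriminative ones. The function names, the flattened per-channel vector representation, and the threshold `tau` are hypothetical illustrations, not the paper's actual module, which operates along both channel and spatial axes in latent space.

```python
import math

def cosine(u, v):
    """Cosine similarity between two flattened channel vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def disentangle(fg_channels, bg_channels, tau=0.8):
    """Suppress foreground channels whose similarity to the matching
    background channel exceeds tau; keep the rest unchanged."""
    refined = []
    for f, b in zip(fg_channels, bg_channels):
        weight = 0.0 if cosine(f, b) > tau else 1.0
        refined.append([weight * x for x in f])
    return refined

# Channel 0 duplicates the background and is removed;
# channel 1 is orthogonal to it and survives.
fg = [[1.0, 0.0], [0.0, 1.0]]
bg = [[1.0, 0.0], [1.0, 0.0]]
print(disentangle(fg, bg))
```

In the full model a learned gate would replace the hard threshold, but the hard version makes the channel-selection behavior easy to inspect.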