PIRN: Prototypical-based Intra-modal Reconstruction with Normality Communication for Multi-modal Anomaly Detection.

Published: 26 Jan 2026, Last Modified: 02 Mar 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: anomaly detection
Abstract: Unsupervised multi-modal anomaly detection (MAD) aims to detect anomalies using both RGB and 3D modalities. However, existing methods struggle in few-shot scenarios where the number of normal training samples is limited: cross-modal alignment approaches fail to learn reliable correspondences from scarce normal data, while memory-based methods often misclassify unseen normal variations as anomalies. To address these issues, we propose PIRN, a prototype-driven reconstruction framework equipped with explicit cross-modal knowledge transfer. Instead of relying on dense feature alignment or heavy memory banks, PIRN uses a compact set of learnable prototypes to capture diverse normal patterns and constrain feature reconstruction. Our framework incorporates three core innovations. First, we introduce Balanced Prototype Assignment (BPA), which employs balanced optimal transport to ensure uniform prototype utilization and prevent codebook collapse. Second, we propose Adaptive Prototype Refinement (APR), which uses gated prototype updates to dynamically expand the model's knowledge of unseen normal variations during inference. Third, to let each modality assist the other during reconstruction, we develop a Multimodal Normality Communication (MNC) module that exchanges high-level normal cues between modalities via gated cross-attention. Extensive experiments on the MVTec 3D-AD, Eyecandies, and Real-IAD benchmarks validate the effectiveness of PIRN, which consistently outperforms existing baselines under challenging few-shot settings.
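The balanced optimal transport behind BPA can be illustrated with a Sinkhorn-Knopp-style iteration that softly assigns N features to K prototypes under uniform marginals, so every prototype receives roughly equal mass. This is a minimal sketch under stated assumptions, not the paper's implementation; the function name and hyperparameters (`eps`, `n_iters`) are illustrative.

```python
import numpy as np

def balanced_assignment(sim, n_iters=100, eps=0.1):
    """Sinkhorn-style balanced soft assignment (illustrative sketch).

    sim: (N, K) feature-to-prototype similarity matrix.
    Returns Q of shape (N, K) where each row sums to 1 (a soft
    assignment per feature) and each column sums to ~N/K, which is
    the uniform-utilization constraint that discourages codebook
    collapse.
    """
    N, K = sim.shape
    Q = np.exp(sim / eps)  # Gibbs kernel; eps controls assignment sharpness
    Q /= Q.sum()
    for _ in range(n_iters):
        # Project onto the uniform prototype marginal (columns).
        Q /= Q.sum(axis=0, keepdims=True)
        Q /= K
        # Project onto the uniform feature marginal (rows).
        Q /= Q.sum(axis=1, keepdims=True)
        Q /= N
    return Q * N  # rescale so each feature's assignment sums to 1
```

In practice, the resulting `Q` can serve as soft targets that spread features across the whole prototype set instead of letting a few prototypes absorb everything.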
Supplementary Material: pdf
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 8569