Enhancing Multimodal Zero-Calibration RSVP-Based Target Detection With Cross-Subject Mixing and Cross-Modal Alignment

Jiayu Mao, Wei Wei, Xujin Li, Shuang Qiu, Huiguang He

Published: 15 Oct 2025, Last Modified: 25 Nov 2025, IEEE Sensors Journal, CC BY-SA 4.0
Abstract: Rapid serial visual presentation (RSVP)-based brain–computer interfaces (BCIs) can efficiently detect rare target images by capturing the event-related potentials (ERPs) that those images evoke in electroencephalography (EEG) signals. RSVP-based BCI systems require subject-specific data collection to train the decoding model for each new user. Their performance depends on the amount of training data available from each new subject, which limits practical application. It is therefore necessary to develop zero-calibration methods for RSVP-based BCIs, which require only historically collected data to train a model for a new user. In this work, we introduce the pupil modality and develop a multimodal method that combines EEG and pupil signals to enhance zero-calibration RSVP-based target detection performance. The proposed ResVPNet, based on a residual network tailored for RSVP signal decoding, extracts signal features for each modality with a two-branch structure. A MixSub strategy is designed to implicitly generate new signal samples by mixing subject-related information between subjects, enhancing the diversity of the training distribution. In addition, a prototype-based cross-modal alignment (PCMA) module is employed to align multimodal signal features and separate samples of different classes. Our proposed network achieved a balanced accuracy (BA) of 91.89%, surpassing the performance of the compared methods. Ablation studies and visualizations demonstrated the effectiveness of the proposed modules. This work provides an effective method for zero-calibration RSVP-based target detection systems.
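The abstract reports balanced accuracy (BA) because targets in RSVP streams are rare, so plain accuracy can look high for a classifier that never detects a target. The sketch below illustrates the standard binary definition of BA (the mean of the true-positive and true-negative rates). The function name, the toy labels, and the 10% target rate are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of true-positive rate and true-negative rate for binary labels.

    Hypothetical helper for illustration; 1 = target, 0 = non-target.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    if y_true.all() or not y_true.any():
        raise ValueError("both classes must be present in y_true")
    tpr = np.mean(y_pred[y_true])      # fraction of targets detected
    tnr = np.mean(~y_pred[~y_true])    # fraction of non-targets rejected
    return 0.5 * (tpr + tnr)


# Toy example (assumed numbers): 10 targets among 100 trials.
y_true = np.array([1] * 10 + [0] * 90)

# A degenerate classifier that labels every trial as non-target.
y_pred_all_nontarget = np.zeros(100, dtype=int)

plain_accuracy = np.mean(y_true == y_pred_all_nontarget)
ba = balanced_accuracy(y_true, y_pred_all_nontarget)
print(plain_accuracy, ba)
```

On this toy data the degenerate classifier scores 0.9 on plain accuracy but only 0.5 on BA, which is why BA is the more informative metric when target trials are scarce.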