MILD: A Multimodal Biometric Recognition Framework Integrating Large Foundation Models

Published: 01 Jan 2024 · Last Modified: 16 May 2025 · CCBR (2) 2024 · License: CC BY-SA 4.0
Abstract: Traditional unimodal biometric recognition technologies, while widely applied across many fields, still face limitations such as environmental interference, spoofing attacks, and individual variation, leading to insufficient accuracy and reliability. Multimodal biometric recognition addresses these weaknesses by integrating multiple biometric features, but effectively merging the semantic information of different modalities remains a key challenge. This paper proposes MILD, a multimodal biometric recognition framework that integrates large foundation models. The framework incorporates foundation models for audio, language, and images, and introduces modality adapters and a multimodal decoder to address the semantic alignment problem across these models. In addition, MILD combines voiceprints, electrocardiograms (ECG), and palm prints to strengthen anti-spoofing performance. Experimental results validate the effectiveness of the MILD framework in cross-modal feature fusion and accurate recognition, demonstrating the potential of foundation models in complex scenarios, with the highest cross-dataset recognition accuracy reaching 97.65%.
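The abstract's core architectural idea, per-modality adapters that project heterogeneous biometric embeddings into a shared semantic space before fusion, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the embedding dimensions, the single-matrix linear adapters, and the attention-style pooling are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality embedding sizes (illustrative, not from the paper)
dims = {"voiceprint": 192, "ecg": 64, "palmprint": 512}
d_shared = 128  # shared semantic space the adapters project into

# Each "adapter" here is a single random linear projection; the paper's
# modality adapters would be learned modules on top of foundation-model features
adapters = {m: rng.standard_normal((d, d_shared)) / np.sqrt(d)
            for m, d in dims.items()}

def fuse(features: dict) -> np.ndarray:
    """Project each modality into the shared space, then attention-pool."""
    aligned = np.stack([features[m] @ adapters[m] for m in dims])  # (3, d_shared)
    query = aligned.mean(axis=0)            # toy fusion query
    scores = aligned @ query / np.sqrt(d_shared)
    weights = np.exp(scores - scores.max()) # softmax over the three modalities
    weights /= weights.sum()
    return weights @ aligned                # fused vector, shape (d_shared,)

sample = {m: rng.standard_normal(d) for m, d in dims.items()}
fused = fuse(sample)
print(fused.shape)  # -> (128,)
```

In a real system the fused vector would feed the multimodal decoder for identity matching; here the sketch only shows the alignment-then-fusion pattern the abstract describes.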