A unified multimodal classification framework based on deep metric learning

Published: 01 Jan 2025, Last Modified: 19 May 2025, Venue: Neural Networks 2025, License: CC BY-SA 4.0
Abstract: Highlights
• A unified multimodal classification framework that can handle various multimodal classification tasks.
• Flexibly process data from multiple modalities, including images, texts, audio, and videos.
• Metric-based triplet learning to extract intra-modal relationships in every modality.
• Contrastive pairwise learning to capture inter-modal relationships across multiple modalities.
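
The highlights name two loss families, triplet learning and contrastive pairwise learning, without giving their formulas. As a rough illustration only, the following sketch shows the standard textbook forms of both losses on plain Python lists. The function names, the Euclidean distance, and the margin values are assumptions made for this example; the paper's actual formulation may differ.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Textbook triplet loss (assumed form, not taken from the paper).

    Pulls the anchor toward a same-class sample and pushes it away from a
    different-class sample until the gap between the two distances is at
    least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def contrastive_pair_loss(u, v, same_class, margin=1.0):
    """Textbook contrastive loss on one pair (assumed form, not from the paper).

    Matching pairs are penalised by their squared distance; non-matching
    pairs are penalised only when they are closer than `margin`.
    """
    d = euclidean(u, v)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings, purely for illustration.
image_anchor = [0.0, 0.0]
image_same_class = [0.0, 0.0]
image_other_class = [3.0, 4.0]
text_embedding = [3.0, 4.0]

# Intra-modal: three samples drawn from the same modality.
intra = triplet_loss(image_anchor, image_same_class, image_other_class)

# Inter-modal: one image embedding paired with one text embedding.
inter = contrastive_pair_loss(image_anchor, text_embedding, same_class=True)
```

In this sketch the triplet loss operates on samples from a single modality, while the pairwise loss compares embeddings from two different modalities, mirroring the intra-modal and inter-modal roles described in the highlights.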