Lightweight Alignment of Unimodal Foundation Models for Metabolite Identification
Keywords: Multimodal, Metabolite Identification, MS/MS, Alignment, Foundation Model
TL;DR: We align pretrained molecular and spectra transformers in a shared space to bypass scarce paired data, achieving state-of-the-art metabolite identification.
Abstract: A central challenge in building multimodal foundation models for the life sciences is the imbalance between abundant unimodal data and scarce paired observations, which limits the scalability of joint multimodal pretraining. We investigate an alternative approach based on aligning pretrained unimodal models. Focusing on metabolite identification, we introduce MSAlign, which maps the representations of a pretrained molecular transformer (ChemBERTa) and a pretrained mass spectrum transformer (DreaMS) into a shared embedding space. Despite its simplicity, MSAlign substantially outperforms prior methods across benchmarks, setting a new state of the art in retrieval performance. These results suggest that aligning unimodal foundation models offers an effective route to multimodal learning in biological settings where paired data remain limited.
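The abstract does not specify MSAlign's alignment objective, so the following is only a minimal sketch of one common recipe for aligning two pretrained encoders, not the authors' method. It assumes a linear projection head on each frozen encoder's output, a CLIP-style symmetric InfoNCE loss over matched spectrum/molecule pairs, and retrieval by cosine similarity in the shared space. Random arrays stand in for DreaMS and ChemBERTa embeddings. The dimensions, the temperature, and all function names (`project`, `symmetric_info_nce`, `rank_candidates`) are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def project(embeddings, weight):
    """Hypothetical linear projection head followed by L2 normalization."""
    return l2_normalize(embeddings @ weight)


def symmetric_info_nce(z_spec, z_mol, temperature=0.07):
    """Assumed CLIP-style loss: row i of z_spec should match row i of z_mol."""
    logits = (z_spec @ z_mol.T) / temperature
    idx = np.arange(logits.shape[0])
    # Spectrum -> molecule: softmax over candidate molecules in each row.
    loss_spec_to_mol = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Molecule -> spectrum: softmax over spectra in each column.
    loss_mol_to_spec = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_spec_to_mol + loss_mol_to_spec)


def rank_candidates(z_query_spec, z_candidate_mols):
    """Return candidate molecule indices sorted from most to least similar."""
    scores = z_candidate_mols @ z_query_spec
    return np.argsort(-scores)


# Demo with random stand-ins for frozen encoder outputs (placeholder sizes).
rng = np.random.default_rng(0)
n_pairs, d_spec, d_mol, d_shared = 8, 32, 16, 4
spec_emb = rng.normal(size=(n_pairs, d_spec))  # stand-in for DreaMS outputs
mol_emb = rng.normal(size=(n_pairs, d_mol))    # stand-in for ChemBERTa outputs
w_spec = rng.normal(size=(d_spec, d_shared))   # would be trained in practice
w_mol = rng.normal(size=(d_mol, d_shared))     # would be trained in practice

z_spec = project(spec_emb, w_spec)
z_mol = project(mol_emb, w_mol)
print("alignment loss:", symmetric_info_nce(z_spec, z_mol))
print("candidate ranking for spectrum 0:", rank_candidates(z_spec[0], z_mol))
```

In a setup like this, the pretrained encoders would typically stay frozen and only the small projection heads would be trained, which is one way to keep the paired-data requirement low. Whether MSAlign freezes its encoders or uses this loss is not stated in the abstract.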
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 37