Keywords: Variant Effect Prediction, Multi-modal Foundation Models, Protein Language Models, DNA Language Models, Mechanistic Synergy
Abstract: Protein Language Models (PLMs) have advanced Variant Effect Prediction (VEP), but they can overlook the physical and regulatory context of the cell. To address this limitation, we propose a parameter-efficient multi-modal foundation model architecture that integrates DNA signals from a DNA Language Model (DLM) with protein representations from a PLM. In a systematic analysis of 29 activity-based Deep Mutational Scanning (DMS) datasets from ProteinGym, we show that the nucleotide modality provides more than a simple ensemble gain: it specifically corrects PLM failure modes localized to charged residues. This work highlights that moving beyond unimodal representations is essential for capturing the mechanistic complexity of biological systems and for accurately identifying experimentally validated functional hotspots.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 26