FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning

TMLR Paper5752 Authors

27 Aug 2025 (modified: 26 Nov 2025) · Decision pending for TMLR · License: CC BY 4.0

Abstract: Accurate protein representations that integrate sequence and three-dimensional (3D) structure are critical to many biological and biomedical tasks. Most existing models either ignore structure or combine it with sequence through a single, static fusion step. Here we present FusionProt, a unified model that learns representations through iterative, bidirectional fusion between a protein language model and a structure encoder. A single learnable token carries information between the two modalities, alternating between sequence attention and spatial message passing across layers. FusionProt is evaluated on Enzyme Commission (EC) number prediction, Gene Ontology (GO) term prediction, and mutation stability prediction. It improves Fmax by a median of +1.3 points (up to +2.0) across the EC and GO benchmarks, and it improves AUROC by +3.6 points over the strongest baseline on mutation stability prediction. Inference cost remains practical, with only ~2–5% runtime overhead. Beyond its state-of-the-art performance, we demonstrate FusionProt's practical relevance through representative biological case studies, which suggest that the model captures biologically meaningful features.
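The core mechanism in the abstract is a single token that alternates between sequence attention and spatial message passing. The toy sketch below illustrates that alternation. It is not the FusionProt implementation and is not taken from the linked repository. Every name in it (`fuse`, `attention_step`, `message_passing_step`) is hypothetical, and several details are assumptions chosen to keep the example small and runnable:

- The attention and message-passing layers have no learned parameters.
- The fusion token is a fixed input rather than a trained vector.
- The token is appended as an extra sequence position.
- In the structure stream, the token joins the residue contact graph as a virtual node connected to every residue.
- Contacts are defined by an 8 Å distance cutoff.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_step(tokens):
    """Parameter-free single-head self-attention plus a residual connection
    (toy stand-in for one protein-language-model layer)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        mixed = [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        out.append([qi + mi for qi, mi in zip(q, mixed)])
    return out

def message_passing_step(nodes, edges):
    """Mean aggregation over each node's neighbours (self included) on an
    undirected graph (toy stand-in for one structure-encoder layer)."""
    neighbours = {i: {i} for i in range(len(nodes))}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    d = len(nodes[0])
    return [[sum(nodes[n][k] for n in neighbours[i]) / len(neighbours[i]) for k in range(d)]
            for i in range(len(nodes))]

def fuse(seq_emb, coords, fusion_token, num_layers=3, cutoff=8.0):
    """Hypothetical sketch: one shared token alternates between the sequence
    stream and the structure stream at every layer."""
    n = len(seq_emb)
    # Assumption: residue contacts are residue pairs within `cutoff` angstroms.
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(coords[i], coords[j]) <= cutoff]
    # Assumption: the token is a virtual node (index n) linked to every residue.
    edges += [(i, n) for i in range(n)]
    seq_h = [list(v) for v in seq_emb]
    struct_h = [list(v) for v in seq_emb]  # assumption: same initial features
    token = list(fusion_token)
    for _ in range(num_layers):
        # Sequence side: the token attends as an extra position.
        out = attention_step(seq_h + [token])
        seq_h, token = out[:n], out[n]
        # Structure side: the updated token joins message passing.
        out = message_passing_step(struct_h + [token], edges)
        struct_h, token = out[:n], out[n]
    return seq_h, struct_h, token

if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    xyz = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
    _, _, fused = fuse(emb, xyz, [0.0, 0.0], num_layers=2)
    print(fused)
```

In this toy setup, the third residue has no spatial contacts, so within the structure stream it receives information only through the virtual token node. Carrying one token through both streams is what makes the fusion bidirectional and repeated at every layer, rather than a single merge at the end. A real model would replace both steps with learned layers.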

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: Final camera-ready revision

Code: https://github.com/kalifadan/FusionProt

Assigned Action Editor: ~Wei_Liu3

Submission Number: 5752
