TopoMHC: Sequence–Topology Fusion for MHC Binding

01 Sept 2025 (modified: 01 Dec 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: immunogenicity prediction, major histocompatibility complex, peptide representation learning, statistical topology, persistent homology, protein language models, cross-modal learning, vaccine design
Abstract: Accurate prediction of peptide immunogenicity, particularly the binding affinity to major histocompatibility complex (MHC) molecules, is critical for vaccine design and immunotherapy. Existing approaches are predominantly sequence-based and often overlook structural variability and topological organization, which restricts predictive reliability. In this work, we introduce a multi-modal framework that integrates sequence embeddings from a pre-trained protein language model (e.g., ESM-C) with topology-informed descriptors derived from peptide conformations. We generate peptide conformers using molecular dynamics simulations and RDKit-based methods, and from these conformations we compute persistent homology invariants, Betti numbers, geometric statistics, and residue connectivity measures. These topological features are then fused with sequence embeddings through a cross-attention mechanism, allowing the model to capture both local sequence patterns and global conformational organization. Extensive experiments demonstrate consistent improvements over conventional structure-based and sequence-only baselines, establishing state-of-the-art performance in peptide immunogenicity prediction.
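As a rough illustration of the fusion step described in the abstract, the sketch below applies single-head scaled dot-product cross-attention in NumPy, with per-residue sequence embeddings as queries and topology descriptors as keys and values. It is not the authors' implementation: the function name `cross_attention_fuse`, all dimensions, the random projection weights, and the choice to concatenate the attended context onto the sequence embeddings are assumptions made for this example. The random arrays stand in for real ESM-C embeddings and real persistent-homology features.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention_fuse(seq_emb, topo_feats, w_q, w_k, w_v):
    """Fuse sequence embeddings with topology descriptors (hypothetical helper).

    seq_emb:    (L, d_seq)  one embedding per residue (queries)
    topo_feats: (T, d_topo) one vector per topology descriptor (keys/values)
    Returns the fused representation (L, d_seq + d_model) and the
    attention weights (L, T).
    """
    q = seq_emb @ w_q        # (L, d_model)
    k = topo_feats @ w_k     # (T, d_model)
    v = topo_feats @ w_v     # (T, d_model)

    # Scaled dot-product scores between each residue and each descriptor.
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (L, T)
    attn = softmax(scores, axis=-1)           # each row sums to 1

    context = attn @ v                        # (L, d_model)

    # Assumption: keep the original embedding and append the topology context.
    fused = np.concatenate([seq_emb, context], axis=-1)
    return fused, attn


# Placeholder inputs; every size here is illustrative only.
rng = np.random.default_rng(0)
L, d_seq = 9, 16      # a 9-residue peptide with 16-dim stand-in embeddings
T, d_topo = 4, 8      # four descriptor groups, 8-dim each
d_model = 12          # shared attention dimension

seq_emb = rng.normal(size=(L, d_seq))
topo_feats = rng.normal(size=(T, d_topo))
w_q = rng.normal(size=(d_seq, d_model))
w_k = rng.normal(size=(d_topo, d_model))
w_v = rng.normal(size=(d_topo, d_model))

fused, attn = cross_attention_fuse(seq_emb, topo_feats, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

In this arrangement each residue attends over a small set of global descriptors, so the attention weights indicate which topological summaries a given sequence position draws on. A trained model would learn the projection matrices rather than sample them, and would likely use multiple heads.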
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 83