EpiESM-GA: Resource-Efficient Protein Foundation Model Features for Equitable B-Cell Epitope Prediction

Published: 13 Jun 2026, Last Modified: 13 Jun 2026
Venue: FSG 2026 Oral
License: CC BY 4.0
Keywords: B-cell Epitope Prediction; Machine Learning; Protein Foundation Models; ESM-2 Embeddings; Evolutionary Feature Selection
Abstract: Predicting B-cell epitopes can reduce costly wet-lab screening in vaccine design, diagnostics, and antibody discovery. However, current predictors often suffer from noisy labels, weak generalization, and structure-dependent workflows. Here we present EpiESM-GA, an efficient sequence-only pipeline for linear B-cell epitope prediction. Positive and negative peptide examples are collected from IEDB, which provides experimentally tested epitopes and distinguishes positive from negative records based on assay evidence (Vita et al., 2019). Each peptide is encoded with a frozen ESM-2 protein language model, a bidirectional transformer whose amino acid embeddings serve downstream structure and function tasks (Lin et al., 2023). The mean-pooled embeddings are then reduced by a genetic algorithm to a compact 420-feature representation and classified with a lightweight Random Forest, XGBoost, or MLP head. This design avoids fine-tuning the foundation model, reduces the number of trainable parameters, improves interpretability, and enables low-resource deployment. On an IEDB-derived benchmark, EpiESM-GA attains 0.880 ± 0.004 AUC-ROC, 0.852 ± 0.005 PR-AUC, 82.0 ± 0.6% accuracy, 0.79 ± 0.01 F1, and 0.74 ± 0.01 MCC (mean ± std over five independent random seeds), outperforming dense ESM-2 features and the baselines LBCE-XGB, EpitopeVec, and BepiPred-2.0. The framework shows how frozen protein foundation models can support pandemic preparedness, peptide vaccine prioritization, diagnostic antigen screening, and equitable computational immunology.
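The pooling and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`mean_pool`, `ga_select`, `centroid_accuracy`), the GA operators (elitism, uniform crossover, bit-flip mutation with a repair step), the hyperparameters, and the nearest-centroid fitness are all assumptions chosen to keep the example dependency-light. In the described pipeline the input would be frozen ESM-2 per-residue embeddings and the fitness would presumably come from one of the lightweight classifier heads.

```python
import numpy as np

def mean_pool(token_embeddings, mask):
    """Average per-residue embeddings over real (non-padding) positions."""
    mask = mask[..., None].astype(token_embeddings.dtype)  # (batch, length, 1)
    return (token_embeddings * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1.0)

def centroid_accuracy(X_train, y_train, X_val, y_val):
    """Stand-in fitness (assumption): nearest-class-centroid validation accuracy."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_val - c0, axis=1)
    d1 = np.linalg.norm(X_val - c1, axis=1)
    return float(((d1 < d0).astype(int) == y_val).mean())

def random_mask(rng, n_features, k):
    """Boolean mask with exactly k randomly chosen features switched on."""
    mask = np.zeros(n_features, dtype=bool)
    mask[rng.choice(n_features, size=k, replace=False)] = True
    return mask

def repair(rng, mask, k):
    """Restore exactly k selected features after crossover and mutation."""
    on, off = np.flatnonzero(mask), np.flatnonzero(~mask)
    if on.size > k:
        mask[rng.choice(on, size=on.size - k, replace=False)] = False
    elif on.size < k:
        mask[rng.choice(off, size=k - on.size, replace=False)] = True
    return mask

def ga_select(X_train, y_train, X_val, y_val, k,
              pop_size=20, generations=15, mutation_rate=0.02, seed=0):
    """Search for a k-feature subset that maximizes the stand-in fitness."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[1]
    population = [random_mask(rng, n, k) for _ in range(pop_size)]

    def fitness(m):
        return centroid_accuracy(X_train[:, m], y_train, X_val[:, m], y_val)

    for _ in range(generations):
        scores = np.array([fitness(m) for m in population])
        order = np.argsort(scores)[::-1]
        elite = [population[i] for i in order[: pop_size // 2]]  # keep the top half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), size=2, replace=False)
            child = np.where(rng.random(n) < 0.5, elite[a], elite[b])  # uniform crossover
            child ^= rng.random(n) < mutation_rate                     # bit-flip mutation
            children.append(repair(rng, child, k))
        population = elite + children

    scores = np.array([fitness(m) for m in population])
    return population[int(np.argmax(scores))]
```

Under these assumptions, `ga_select(X_tr, y_tr, X_val, y_val, k=420)` would return a boolean mask over the pooled embedding dimensions, and `X[:, mask]` would be the compact representation passed to a Random Forest, XGBoost, or MLP head.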
Paper Type: Long Paper
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 11