Protein language models expose viral mimicry and immune escape

Published: 17 Jun 2024, Last Modified: 16 Jul 2024
Venue: ML4LMS Poster
License: CC BY 4.0
Keywords: Machine learning, Viruses, Protein Language models, deep learning, explainable AI, immune escape
TL;DR: Protein language models can differentiate between human and viral proteins with high accuracy, uncovering patterns of viral mimicry and immune escape with high interpretability through a multimodal approach.
Abstract: Viruses elude the immune system through molecular mimicry, adopting their hosts' biophysical characteristics. We adapt protein language models (PLMs) to differentiate between human and viral proteins. Understanding where the immune system and our models make mistakes could reveal viral immune escape mechanisms. We applied pretrained PLMs to distinguish viral from human proteins, achieving state-of-the-art results (99.7% ROC AUC). We use interpretable models to characterize viral escapers. Altogether, mistakes account for 3.9% of the sequences, with viral proteins being disproportionately misclassified. Errors often involve proteins with low immunogenic potential, human-specific viruses, and reverse transcriptases. Viral families causing chronic infections and immune evasion are further enriched among the errors. Biological and ML models make similar mistakes. By integrating PLMs with explainable AI, we provide novel insights into viral immune escape mechanisms, informing strategies for vaccine development and antiviral research.
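The abstract reports two kinds of numbers: a threshold-free ranking metric (ROC AUC) and a count of mistakes, which for a score-based binary classifier requires some decision threshold. The minimal sketch below shows how the two relate, assuming viral proteins are labeled 1 and human proteins 0, and assuming a threshold of 0.5. The helper names `roc_auc` and `error_rates` and the toy scores are hypothetical and for illustration only; they are not the authors' code or data, and the sketch does not involve a PLM at all.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive (viral, label 1) scores higher than a randomly
    chosen negative (human, label 0). Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def error_rates(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, Dict[int, float]]:
    """Overall and per-class misclassification rates at a fixed threshold
    (the 0.5 default is an assumption, not taken from the paper)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    wrong = [y != p for y, p in zip(labels, preds)]
    overall = sum(wrong) / len(labels)
    per_class: Dict[int, float] = {}
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        per_class[cls] = sum(wrong[i] for i in idx) / len(idx) if idx else 0.0
    return overall, per_class


# Hypothetical toy data: 4 viral (1) and 4 human (0) proteins with model scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.40, 0.85, 0.10, 0.20, 0.30, 0.05]

print(roc_auc(labels, scores))      # 1.0
print(error_rates(labels, scores))  # (0.125, {0: 0.0, 1: 0.25})
```

In this toy case every viral score outranks every human score, so the AUC is perfect, yet one viral protein (score 0.40) still falls below the threshold. Errors therefore concentrate in one class, which is the kind of asymmetry the per-class breakdown is meant to expose.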
Supplementary Material: zip
Poster: pdf
Submission Number: 116