Keywords: Antibiotic Resistance, Proteins, Graph Neural Networks, AlphaFold, ESM, Language Models
TL;DR: We propose a novel deep learning method based on language models (LMs) and graph neural networks (GNNs) to classify antibiotic resistance genes from protein sequence and structure.
Abstract: Antibiotics are traditionally used to treat bacterial infections. However, bacteria can develop resistance to these drugs, rendering them ineffective and posing a serious threat to global health. Identifying and classifying the genes responsible for this resistance is critical for the prevention, diagnosis, and treatment of infections, as well as for understanding resistance mechanisms. Previous methods developed for this purpose have mostly been sequence-based, relying either on comparisons to existing databases or on machine learning models trained on sequence features. However, genes with comparable functions do not always have similar sequences. In this paper, we therefore develop a deep learning model that classifies novel antibiotic resistance genes (ARGs) using the protein structure as a complement to the sequence; we expect the structure to provide useful information beyond what the sequence alone offers. The proposed approach consists of two steps. First, we use the AlphaFold model to predict the 3D structure of a protein from its amino acid sequence. Then, we process the sequence with a transformer-based language model and apply a graph neural network to the graph extracted from the predicted structure. We evaluate the proposed architecture on a standard benchmark dataset, where it outperforms state-of-the-art methods.
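The second step can be illustrated with a minimal, dependency-free sketch. It is not the architecture evaluated in the paper: the abstract does not specify how the graph is extracted or how the two representations are combined, so the 8 angstrom distance cutoff, the mean-aggregation update, and the fusion by concatenation are all assumptions, and the coordinates and embeddings are toy stand-ins for AlphaFold and language-model outputs.

```python
import math

def contact_graph(coords, cutoff=8.0):
    """Adjacency list linking residues within `cutoff` angstroms (cutoff is an assumption)."""
    n = len(coords)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def mean_aggregate(features, adj):
    """One message-passing step: average each node's features with its neighbours'."""
    out = []
    for i, nbrs in enumerate(adj):
        group = [features[i]] + [features[j] for j in nbrs]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

def mean_pool(features):
    """Collapse per-residue features into a single protein-level vector."""
    return [sum(col) / len(features) for col in zip(*features)]

# Toy input: four residues on a line, 5 angstroms apart (stand-in for predicted coordinates).
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
# Hypothetical per-residue features and pooled sequence embedding (stand-ins for model outputs).
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
sequence_embedding = [0.5, 0.5]

adj = contact_graph(coords)
structure_embedding = mean_pool(mean_aggregate(node_features, adj))
# Assumed fusion: concatenate both views before a downstream classifier.
fused = sequence_embedding + structure_embedding
```

In this toy case only consecutive residues fall within the cutoff, so the graph is a simple chain; a real predicted structure would also connect residues that are distant in the sequence but close in space, which is the information the sequence alone does not expose.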