Keywords: variant effect prediction, variational auto-encoder, transformer, deep learning
Abstract: Variant effect predictors (VEPs) aim to predict the impact of protein variants on cellular function and traditionally rely on data from multiple sequence alignments (MSAs). This approach assumes that naturally occurring variants are fit, a premise challenged in pharmacogenomics, where some pharmacogenes are under low evolutionary pressure. In this context, deep mutational scanning (DMS) datasets are of particular interest because they provide quantitative fitness scores for variants. In this work, we propose a transformer-based matrix variational auto-encoder architecture and evaluate its performance on $33$ DMS datasets covering $26$ drug target and absorption-distribution-metabolism-excretion (ADME) proteins from the ProteinGym benchmark. Our model trained on MSAs (matVAE-MSA) outperforms a model similar to the VEPs widely used in pharmacogenomics, and sets a new zero-shot prediction benchmark for $2$ proteins related to Noonan syndrome. We compare matVAE-MSA with matENC-DMS, a model of similar capacity trained on DMS data in a 5-fold supervised cross-validation framework. matENC-DMS outperforms matVAE-MSA on $15$ of the $33$ DMS datasets, including all ADME proteins and some drug target proteins. Although our models do not outperform the best baseline models, our results shed new light on how evolutionary pressure affects the validity of the premise underlying VEP design, and they motivate the development of DMS datasets to improve VEPs for pharmacogene-related proteins.
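A 5-fold supervised cross-validation on a single DMS dataset can be illustrated with the minimal Python sketch below. It is not the authors' implementation, and the abstract does not say how folds were formed or which metric was used. The sketch assumes a random split of variants into folds and Spearman's rank correlation as the per-fold score (the primary metric ProteinGym reports for DMS datasets). It also uses a hypothetical toy predictor (`fit_position_mean`) as a stand-in for matENC-DMS. The usage block runs on synthetic data only.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks (NaN if a side is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them round-robin into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def fit_position_mean(train):
    """Hypothetical toy model: predict the mean training fitness at the mutated position."""
    global_mean = mean(score for _, _, score in train)
    by_pos = {}
    for pos, _, score in train:
        by_pos.setdefault(pos, []).append(score)
    pos_mean = {p: mean(s) for p, s in by_pos.items()}

    def predict(pos, mutant_aa):
        # mutant_aa is ignored by this toy model; unseen positions fall back to the global mean.
        return pos_mean.get(pos, global_mean)

    return predict


def cross_validate(variants, k=5, seed=0):
    """variants: list of (position, mutant_aa, fitness). Returns one Spearman rho per fold."""
    rhos = []
    for test_idx in k_fold_indices(len(variants), k, seed):
        held_out = set(test_idx)
        train = [v for i, v in enumerate(variants) if i not in held_out]
        test = [variants[i] for i in test_idx]
        predict = fit_position_mean(train)
        preds = [predict(pos, aa) for pos, aa, _ in test]
        truth = [score for _, _, score in test]
        rhos.append(spearman(preds, truth))
    return rhos


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic DMS-like data (not from ProteinGym): each position has a baseline
    # tolerance, and every substitution adds Gaussian noise on top of it.
    tolerance = {p: rng.uniform(-2.0, 0.0) for p in range(1, 31)}
    variants = [(p, aa, tolerance[p] + rng.gauss(0, 0.3))
                for p in tolerance for aa in "ACDEFGHIKLMNPQRSTVWY"]
    rhos = cross_validate(variants)
    print("per-fold Spearman:", [round(r, 2) for r in rhos])
    print("mean Spearman:", round(mean(rhos), 2))
```

With random splits, the model sees other substitutions at the same position during training. This tends to make scores optimistic, so position-aware splits (for example, holding out whole positions) are a stricter alternative.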
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2159