Track: Tiny Paper Track
Keywords: Target Fishing, Multimodal, representation learning, small molecule, protein, drug, binding affinity
Abstract: Identifying the interactions a small molecule makes with different proteins is an important task in biology and a critical component of drug discovery. Retrieving the list of protein targets for a small molecule, a task often referred to as target fishing in the literature, is especially challenging and important when little is known about the molecule and its biological activity is still being explored. Before experimental testing, other methods must be used to narrow the large number of protein candidates. Recent machine-learning-based methods for biological representations have shown strong performance on related modalities and tasks, including joint protein and small molecule representation learning and binding affinity prediction. In this paper, we explore the application of several common protein and small molecule representations and learning methods to the task of target fishing. We develop a novel dataset designed to reflect practical scenarios for target fishing, especially for drug development, and compare the performance of different combinations of multimodal representations and contrastive learning techniques, with molecular docking as a domain-specific baseline. We find in our preliminary work that although standard approaches to joint representation learning for proteins and small molecules may distinguish protein and small molecule binding affinities, they struggle to order protein targets for small molecules in their latent space and perform poorly on ranking protein targets unseen during training.
Submission Number: 108