Abstract: The principal goal of drug design is to find ligand molecules that exhibit affinity for a given target protein. In recent years, deep generative methods have shown promise in de novo drug design. However, most of these methods design molecules from target-specific ligand datasets rather than from the target's own features, and therefore cannot design drugs against novel target proteins for which few active ligands are known. In addition, a fast and reasonably accurate evaluation method is needed to assess algorithms that generate large numbers of candidate molecules. In this work, we treat target-specific de novo drug design as a sequence-to-sequence generation task and propose a Transformer architecture that compensates for the scarcity of training data with BERT-style pretraining, generating SMILES strings of ligand molecules conditioned on the target's protein sequence. First, we pretrain the Transformer's two self-attention blocks on a large-scale amino acid sequence dataset and a molecular SMILES dataset, respectively, to capture a feature representation of the target. We then fine-tune the Transformer's encoder-decoder cross-attention block on a protein-ligand complex dataset to learn conditional generation via autoregressive supervised learning. Because individual generated molecules do not reveal the overall behavior of a generative algorithm, we propose evaluating the model by computing the binding-affinity distribution of the generated molecules. We also evaluate our method by designing ligands against three well-studied proteins. In docking experiments, our model proposes molecules whose binding affinities exceed those of certain FDA-approved drugs.
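For concreteness, the conditional-generation setup described above might look like the following minimal PyTorch sketch. The encoder and decoder self-attention stacks stand in for the two BERT-pretrained blocks, and only the teacher-forced fine-tuning step is shown; positional encodings, the pretraining itself, and all names, dimensions, and toy data are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters (assumptions, not the paper's values).
D_MODEL, N_HEADS, N_LAYERS = 512, 8, 6
PROTEIN_VOCAB = 30    # ~20 amino acids plus special tokens
SMILES_VOCAB = 100    # SMILES character/token vocabulary

class ProteinToSMILES(nn.Module):
    """Encoder-decoder Transformer: protein sequence in, ligand SMILES out."""
    def __init__(self):
        super().__init__()
        self.protein_emb = nn.Embedding(PROTEIN_VOCAB, D_MODEL)
        self.smiles_emb = nn.Embedding(SMILES_VOCAB, D_MODEL)
        # In the paper's scheme, these self-attention stacks would be
        # initialized from BERT-style pretraining on amino acid sequences
        # and on SMILES strings, respectively.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_MODEL, N_HEADS, batch_first=True),
            N_LAYERS)
        # The decoder's cross-attention over the protein representation is
        # what gets fine-tuned on protein-ligand pairs.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(D_MODEL, N_HEADS, batch_first=True),
            N_LAYERS)
        self.lm_head = nn.Linear(D_MODEL, SMILES_VOCAB)

    def forward(self, protein_ids, smiles_ids):
        memory = self.encoder(self.protein_emb(protein_ids))
        # A causal mask makes decoding autoregressive: each SMILES token
        # attends only to earlier tokens (plus the protein via cross-attention).
        causal = nn.Transformer.generate_square_subsequent_mask(
            smiles_ids.size(1))
        hidden = self.decoder(self.smiles_emb(smiles_ids), memory,
                              tgt_mask=causal)
        return self.lm_head(hidden)  # next-token logits over the SMILES vocab

# One fine-tuning step on a (protein, ligand) pair: standard teacher forcing,
# predicting each SMILES token from its prefix and the protein sequence.
model = ProteinToSMILES()
protein = torch.randint(0, PROTEIN_VOCAB, (1, 128))  # toy protein sequence
smiles = torch.randint(0, SMILES_VOCAB, (1, 64))     # toy ligand SMILES
logits = model(protein, smiles[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, SMILES_VOCAB), smiles[:, 1:].reshape(-1))
loss.backward()
```

At generation time the same model would be sampled token by token from a start symbol, conditioned on a protein sequence alone, which is what allows it to propose ligands for targets with no known actives.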