Keywords: Protein sequence generation; Protein sequence design
Abstract: Designing protein sequences with specific biological functions and structural stability is crucial in both biology and chemistry. Generative models have already shown potential for reliable protein design. However, previous models cannot generate protein sequences in a controlled manner, a capability that many biological applications require. In this work, we propose TaxDiff, a taxonomic-guided diffusion model for controllable protein sequence generation. TaxDiff combines biological species information with the generative capabilities of diffusion models to generate structurally stable proteins within the sequence space. Specifically, taxonomic control information is inserted into each layer of the transformer block to achieve fine-grained control. The combination of global and local attention ensures the sequence consistency and structural foldability of taxonomic-specific proteins. Extensive experiments demonstrate that TaxDiff consistently achieves better performance on multiple protein sequence generation benchmarks, in both taxonomic-guided controllable generation and unconditional generation. Notably, the sequences generated by TaxDiff surpass even those produced by direct-structure-generation models in terms of confidence based on predicted structures, and TaxDiff requires only a quarter of the time of other diffusion-based models.
Submission Number: 11