Demystify the Secret Function in Protein Sequence via Conditional Diffusion Models

Published: 04 Mar 2024, Last Modified: 29 Apr 2024, GEM Poster, CC BY 4.0
Track: Machine learning: computational method and/or computational results
Keywords: Diffusion, Protein
Abstract: Generating accurate functional annotations for protein sequences is a significant challenge, especially when the target captions are lengthy yet must describe function concisely. Diffusion models have recently shown strong empirical performance on sequence-to-sequence generation tasks. In this paper, we propose ProCDM, a conditional diffusion generative model that uses protein sequence representations to generate functional descriptions for proteins. ProCDM employs a contrastive learning framework to extract protein embeddings and align them with their functionality, and then generates functional descriptions by denoising within the continuous embedding space. ProCDM can generate a wide range of functional descriptions that align with the proteins' actual functionality. We conduct comprehensive experiments on the EC-Caption datasets to evaluate its effectiveness.
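The two ingredients named in the abstract, contrastive alignment of protein and text embeddings and denoising in a continuous embedding space, can be illustrated with a minimal sketch. The abstract does not specify the loss, noise schedule, or network, so everything below is an assumption made for illustration: a one-directional InfoNCE loss over cosine similarities, the standard Gaussian forward-noising step from DDPM-style models, and the hypothetical helper names `info_nce`, `forward_noise`, and `predict_x0`. It is not ProCDM's implementation.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(protein_embs, text_embs, temperature=0.1):
    """Assumed contrastive objective: protein i should match text i.

    Returns the mean cross-entropy of picking the paired text among all
    texts in the batch (protein-to-text direction only).
    """
    total = 0.0
    for i, p in enumerate(protein_embs):
        logits = [cosine(p, t) / temperature for t in text_embs]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(protein_embs)


def forward_noise(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    xt = [a * x + b * e for x, e in zip(x0, eps)]
    return xt, eps


def predict_x0(xt, eps_hat, alpha_bar):
    """Invert the noising step given a noise estimate eps_hat.

    In a trained model, eps_hat would come from a denoising network
    conditioned on the protein embedding; here it is passed in directly.
    """
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(xt, eps_hat)]
```

With toy two-dimensional embeddings, correctly paired protein and text vectors give a much lower `info_nce` value than swapped pairs, and `predict_x0` recovers the clean text embedding exactly when it is handed the true noise. A real system would replace the true noise with a learned, protein-conditioned estimate and map the denoised embeddings back to tokens.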
Submission Number: 65