Demystify the Secret Function in Protein Sequence via Conditional Diffusion Models

Yaoyao Xu; Xuxi Chen; Tong Wang; Huan He; Tianlong Chen; Manolis Kellis

Published: 04 Mar 2024, Last Modified: 29 Apr 2024, GEM Poster, CC BY 4.0

Track: Machine learning: computational method and/or computational results

Keywords: Diffusion, Protein

Abstract: Generating accurate functional annotations for protein sequences presents a significant challenge, especially when dealing with lengthy captions that contain concise descriptions. Recent advancements in diffusion models have shown impressive empirical performance in sequence-to-sequence generation tasks. In this paper, we propose ProCDM, a conditional diffusion generative model that uses protein sequence representations to generate functional descriptions for proteins. ProCDM employs a contrastive learning framework to extract protein embeddings and align them with their functionality, and then generates functional descriptions by denoising within the continuous embedding space. ProCDM demonstrates the capability to generate a wide range of functional descriptions that align with each protein's actual functionality. We evaluate its effectiveness through comprehensive experiments on the EC-Caption datasets.
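
The abstract does not say which contrastive objective aligns protein embeddings with their functional descriptions. As an illustration only, the sketch below uses a symmetric InfoNCE loss (the CLIP-style choice), which is an assumption and not necessarily the loss used in ProCDM. The function names, the temperature value, and the toy vectors are all hypothetical, and the embeddings are assumed to be nonzero.

```python
import math

def l2_normalize(vec):
    # Scale a (nonzero) vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def info_nce(protein_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of matched (protein, description) pairs.

    Assumed objective for illustration; pair i in protein_embs matches pair i in text_embs.
    """
    p = [l2_normalize(v) for v in protein_embs]
    t = [l2_normalize(v) for v in text_embs]
    n = len(p)
    # logits[i][j]: scaled cosine similarity of protein i and description j.
    logits = [[sum(a * b for a, b in zip(p[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    # Transpose to score description -> protein retrieval as well.
    cols = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))

# Toy batch: two proteins and two description embeddings (hypothetical values).
proteins = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.0], [0.0, 1.0]]
swapped_text = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = info_nce(proteins, aligned_text)
loss_swapped = info_nce(proteins, swapped_text)
```

On this toy batch the loss is close to zero when each protein embedding matches its own description and large when the descriptions are swapped, which is the behaviour an alignment objective of this kind is meant to reward.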

Submission Number: 65
