CPgen: Heterochiral Cyclic Peptide Ensemble Generation and Ensemble-Based Sequence Design

Published: 28 May 2026, Last Modified: 28 May 2026. GenBio 2026 Spotlight. License: CC BY 4.0
Keywords: flow matching, classifier-free guidance, cyclic peptides, conformer ensembles, generative models, drug design
TL;DR: CPgen is a multi-modal partially latent flow matching model for cyclic peptide ensemble generation and ensemble-based sequence design.
Abstract: We present CPgen, which adapts the La-Proteina partially latent flow matching framework to heterochiral cyclic peptide conformer ensemble generation and ensemble-based sequence design. A variational autoencoder compresses all-atom structures into a per-residue latent space, and a flow matching denoiser generates jointly over Cα coordinates and per-residue latent variables, conditioned on amino acid sequence, Cα coordinates, or per-residue ϕ,ψ backbone distributions via multi-modal classifier-free guidance. CPgen is trained on ∼7M Rosetta-sampled conformers spanning ∼7,000 heterochiral sequences over 39 amino acid types (19 L-amino acids, 19 D-amino acids, and glycine). The model achieves: (1) sequence-conditioned ensemble generation whose ϕ,ψ backbone distributions are similar to the target distributions; (2) ensemble-based sequence design, where conditioning on ϕ,ψ backbone distributions recovers ∼90% of residue identities (100% for 12-mers), showing that conformational ensemble distributions can be used for sequence inference; and (3) native D-amino acid support, with the correct mirror-image backbone preferences emerging from training without explicit chirality-aware design choices.
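The abstract describes sampling in a joint Cα + latent space under multi-modal classifier-free guidance. As a minimal sketch, assuming a linear (rectified-flow style) path with t = 0 as noise and t = 1 as data, plain Euler integration, and one common compositional guidance rule v = v_uncond + Σ_k w_k (v_k − v_uncond), sampling could look like the following. The velocity function `v_fn`, the condition names, and this particular weight parameterization are hypothetical stand-ins, not CPgen's actual network or API.

```python
import numpy as np


def guided_velocity(v_fn, x, t, conds, weights):
    """Multi-modal classifier-free guidance (assumed compositional form).

    v = v_uncond + sum_k w_k * (v_k - v_uncond), where v_k is the velocity
    predicted with only modality k present and v_uncond with none.
    """
    v_uncond = v_fn(x, t, None)
    v = v_uncond.copy()
    for name, value in conds.items():
        v_cond = v_fn(x, t, {name: value})
        v += weights.get(name, 1.0) * (v_cond - v_uncond)
    return v


def sample(v_fn, n_res, latent_dim, conds, weights, n_steps=100, seed=0):
    """Euler-integrate from Gaussian noise (t=0) to a sample (t=1).

    The state concatenates Ca coordinates (3 columns) with a per-residue
    latent vector (latent_dim columns), one row per residue.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_res, 3 + latent_dim))
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * guided_velocity(v_fn, x, t0, conds, weights)
    return x[:, :3], x[:, 3:]  # (Ca coordinates, per-residue latents)


# Toy velocity field (hypothetical): points straight at a fixed target, which is
# the exact conditional velocity (target - x) / (1 - t) for a linear path.
def toy_v(x, t, cond):
    target = np.zeros_like(x) if cond is None else next(iter(cond.values()))
    return (target - x) / (1.0 - t)


target = np.ones((8, 3 + 4))
ca, z = sample(toy_v, n_res=8, latent_dim=4, conds={"sequence": target},
               weights={"sequence": 1.0}, n_steps=10)
```

With weight 1.0 the guided field reduces to the conditional one, so the toy sampler lands on the target; weights above 1 extrapolate away from the unconditional prediction, which is how classifier-free guidance strengthens adherence to a condition. Evaluating one network with different modalities present or absent typically relies on condition dropout during training.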
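The 39-type alphabet follows from simple arithmetic: of the 20 canonical amino acids only glycine is achiral, so the other 19 each contribute an L and a D form (19 + 19 + 1 = 39). A hypothetical token scheme, with a helper that maps a sequence to its mirror-image (enantiomeric) sequence by swapping L and D, might look like this; the token strings are illustrative and not CPgen's actual encoding.

```python
CANONICAL = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]

# Glycine's C-alpha carries two hydrogens and is not a stereocenter,
# so it gets a single token; every other residue gets an L and a D token.
CHIRAL = [aa for aa in CANONICAL if aa != "GLY"]
VOCAB = [f"L-{aa}" for aa in CHIRAL] + [f"D-{aa}" for aa in CHIRAL] + ["GLY"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}


def mirror_sequence(seq):
    """Return the enantiomeric sequence: swap L <-> D, leave glycine unchanged."""
    swap = {"L": "D", "D": "L"}
    return [tok if tok == "GLY" else swap[tok[0]] + tok[1:] for tok in seq]


seq = ["L-ALA", "D-PRO", "GLY", "L-PHE"]
ids = [TOKEN_ID[tok] for tok in seq]
print(len(VOCAB), mirror_sequence(seq))  # 39 ['D-ALA', 'L-PRO', 'GLY', 'D-PHE']
```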
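The abstract reports ϕ,ψ distribution similarity without naming a metric. One common choice, assumed here purely for illustration, is the Jensen-Shannon divergence between per-residue 2D Ramachandran histograms of a generated ensemble and a reference ensemble. The bin count and function names below are hypothetical.

```python
import numpy as np


def ramachandran_hist(phi, psi, bins=36):
    """Normalized 2D histogram of (phi, psi) angles in degrees over [-180, 180]^2."""
    h, _, _ = np.histogram2d(phi, psi, bins=bins, range=[[-180, 180], [-180, 180]])
    return h / h.sum()


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # m > 0 wherever a > 0, so the ratio is always defined
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def per_residue_jsd(ens_a, ens_b, bins=36):
    """ens_a, ens_b: arrays of shape (n_conformers, n_res, 2) holding (phi, psi)."""
    return np.array([
        js_divergence(
            ramachandran_hist(ens_a[:, i, 0], ens_a[:, i, 1], bins),
            ramachandran_hist(ens_b[:, i, 0], ens_b[:, i, 1], bins),
        )
        for i in range(ens_a.shape[1])
    ])


rng = np.random.default_rng(0)
generated = rng.uniform(-180, 180, size=(500, 12, 2))  # toy ensemble, 12 residues
reference = rng.uniform(-180, 180, size=(500, 12, 2))
scores = per_residue_jsd(generated, reference)
print(scores.mean())
```

Lower scores mean closer distributions. Because histogram estimates depend on bin width and sample count, such scores are only comparable at matched bins and ensemble sizes.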
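The mirror-image claim in (3) has a simple geometric basis: reflecting a structure through a plane negates every signed dihedral angle, so an enantiomeric backbone maps (ϕ, ψ) to (−ϕ, −ψ), sending, for example, the L-residue right-handed α-helical region near (−60°, −45°) to the D-residue region near (+60°, +45°). The sketch below checks this numerically on a cyclic backbone, assuming head-to-tail cyclization (residue i's ϕ uses the carbonyl C of residue i−1 and its ψ uses the N of residue i+1, with wrap-around); the array layout and function names are hypothetical.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four 3D points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))


def cyclic_phi_psi(bb):
    """bb: (n_res, 3, 3) array of N, CA, C coordinates for a head-to-tail cycle."""
    n = len(bb)
    phi = np.empty(n)
    psi = np.empty(n)
    for i in range(n):
        n_atom, ca, c = bb[i]
        phi[i] = dihedral(bb[(i - 1) % n, 2], n_atom, ca, c)  # C(i-1)-N-CA-C
        psi[i] = dihedral(n_atom, ca, c, bb[(i + 1) % n, 0])  # N-CA-C-N(i+1)
    return phi, psi


rng = np.random.default_rng(1)
backbone = rng.normal(scale=2.0, size=(8, 3, 3))  # toy coordinates, not real geometry
mirrored = backbone * np.array([1.0, 1.0, -1.0])  # reflect through the xy-plane

phi, psi = cyclic_phi_psi(backbone)
phi_m, psi_m = cyclic_phi_psi(mirrored)
print(np.allclose(phi_m, -phi), np.allclose(psi_m, -psi))  # True True
```

This shows only that the symmetry is geometrically exact; whether a model reproduces it from data, as reported for CPgen, is a separate empirical question.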
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 163