PepLand: a large-scale pre-trained peptide representation model for a comprehensive landscape of both canonical and non-canonical amino acids

Ruochi Zhang, Haoran Wu, Chang Liu, Qian Yang, Yuting Xiu, Kewei Li, Ningning Chen, Yu Wang, Yan Wang, Xin Gao, Fengfeng Zhou

Published: 31 Jul 2025, Last Modified: 03 Aug 2025, Briefings in Bioinformatics, CC BY 4.0

Abstract: Interest in peptides incorporating non-canonical amino acids has surged recently within the scientific community, driven by their enhanced stability and resistance to proteolytic degradation. These non-canonical peptides offer significant potential for modifying biological, pharmacological, and physicochemical characteristics in both native and synthetic contexts. Despite these advantages, there remains a notable lack of efficient pre-trained models capable of capturing feature representations from such intricate peptide sequences. This study introduces PepLand, a novel pre-training framework for the representation and analysis of peptides containing both canonical and non-canonical amino acids. PepLand leverages a general-purpose multi-view heterogeneous graph neural network to capture the subtle structural representations of peptides. Our empirical evaluations demonstrate PepLand’s proficiency across a range of peptide property prediction tasks, including cell penetrability, solubility, and protein–peptide binding affinity. These assessments support PepLand’s superior capability in discerning critical representations of peptides with both canonical and non-canonical amino acids, and provide a robust foundation for advances in peptide-focused pharmaceutical research. We have made the entire source code and datasets available at http://www.healthinformaticslab.org/supp/resources.php or https://github.com/zhangruochi/PepLand