Parameter-Efficient Adaptation of a Pretrained Language Model via Soft Prompt Tuning Enables Hit-Enriched Conditional Cell Line-Specific Generation of Cell-Penetrating Peptides

Published: 30 May 2026, Last Modified: 30 May 2026. Venue: ICML2026-AI4Science Poster. Readers: Everyone. License: CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: PEFT, protein language model, active sampling
TL;DR: We propose a conditional generative framework for cell line–specific CPP design based on parameter-efficient fine-tuning of a pretrained protein language model.
Abstract: Cell-penetrating peptides (CPPs) offer a promising route for intracellular delivery, yet their design remains constrained by scarce, heterogeneous, and weakly standardized experimental data. Existing computational work has largely focused on supervised modeling, while the more challenging problem of generating functional, cell line–specific CPPs under limited supervision remains underexplored. Here, we introduce a conditional generative framework for cell line–aware CPP design that fine-tunes a pretrained protein language model (pLM) using parameter-efficient adaptation strategies. To support reliable post-generation selection, we developed independent predictive models for CPP activity, including a calibrated classifier achieving F1 = 0.847 and precision = 0.952. We further enforced strict dataset separation and applied active sampling to mitigate information leakage and improve robustness in low-data regimes. Among the evaluated conditioning strategies, soft-prompt tuning provided the best balance among functional enrichment, diversity, and sequence-level novelty while satisfying similarity constraints. External validation with PMIPred further showed strong enrichment toward membrane-active peptides. Overall, our results demonstrate that conditional generation of CPPs is feasible even under stringent data limitations, and they provide a scalable blueprint for moving beyond predictive peptide modeling toward controllable, function-oriented molecular design.
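The abstract reports F1 and precision for the calibrated classifier but not recall. Because F1 is the harmonic mean of precision and recall, the recall implied by those two numbers can be recovered with a little arithmetic. The sketch below assumes both metrics were computed on the same evaluation set at the same decision threshold, which the abstract does not state; the function name `implied_recall` is illustrative and not taken from the paper.

```python
def implied_recall(f1: float, precision: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    denom = 2.0 * precision - f1
    if denom <= 0:
        # No recall in (0, 1] can produce this F1 at this precision.
        raise ValueError("F1 and precision are mutually inconsistent")
    return f1 * precision / denom


# Values quoted in the abstract; the same-test-set assumption is ours.
recall = implied_recall(f1=0.847, precision=0.952)
print(f"implied recall = {recall:.3f}")
```

Under that assumption the implied recall is roughly 0.76, meaning the classifier trades some sensitivity for high precision. That trade-off suits a post-generation filter, where a false positive wastes experimental effort and a false negative only discards one candidate.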
Submission Number: 327