Keywords: Protein language models, Direct Preference Optimization, scalability, protein engineering, sequence design, in-vitro validation, clustering, masked language models
TL;DR: We introduce g-DPO, a scalable DPO framework for protein language models that prunes redundant sequence pairs and amortizes likelihood computations, matching DPO performance while converging up to 3.7x faster.
Abstract: Direct Preference Optimization (DPO) is an effective approach for aligning protein language models with experimental design goals. However, DPO faces a scalability bottleneck: the number of possible training pairs grows quadratically with the number of labeled sequences, leading to prohibitive training times even for modestly sized datasets. We introduce g-DPO, a framework that (i) uses sequence-space clustering to prune redundant pairs while preserving training signal, and (ii) amortizes likelihood computations with group-based approximations. Across three protein engineering tasks, g-DPO maintains _in-silico_ and _in-vitro_ performance that is statistically indistinguishable from standard DPO, while converging 1.8–3.7x faster, with larger gains expected as dataset size increases.
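
As a rough illustration of the pair-pruning idea described in the abstract (a minimal sketch, not the paper's implementation), the snippet below builds DPO preference pairs only across clusters and caps the total pair count; the clustering rule, the cross-cluster pairing criterion, and all names (`cluster_sequences`, `build_pruned_pairs`, `pair_budget`) are assumptions made for illustration, and the group-based likelihood amortization is omitted.

```python
import itertools
import random

def cluster_sequences(seqs, n_clusters):
    """Toy stand-in for sequence-space clustering (a real pipeline might use
    identity-based clustering or k-means over sequence embeddings)."""
    # Assumption: round-robin assignment, purely for demonstration.
    return {s: i % n_clusters for i, s in enumerate(seqs)}

def build_pruned_pairs(labeled, n_clusters=8, pair_budget=400):
    """Build (preferred, dispreferred) training pairs, pruning redundant ones
    so the pair count does not grow quadratically with the labeled set."""
    seqs = list(labeled)
    cluster_of = cluster_sequences(seqs, n_clusters)
    pairs = []
    for a, b in itertools.combinations(seqs, 2):
        if cluster_of[a] == cluster_of[b]:
            continue  # prune: pairs within a cluster carry largely redundant signal
        if labeled[a] == labeled[b]:
            continue  # no preference between equally scored sequences
        winner, loser = (a, b) if labeled[a] > labeled[b] else (b, a)
        pairs.append((winner, loser))
    random.shuffle(pairs)
    return pairs[:pair_budget]  # cap the total number of retained pairs

# Hypothetical usage with a labeled set mapping sequence -> assay score:
# labeled = {"MKVL...": 0.91, "MKIL...": 0.42, "MQVL...": 0.77}
# pairs = build_pruned_pairs(labeled, n_clusters=2, pair_budget=10)
```

Without any pruning, N labeled sequences yield N(N-1)/2 candidate pairs, which is the quadratic growth the abstract identifies as the bottleneck; the sketch simply shows one way a cluster assignment can be used to discard pairs before training.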
Submission Number: 50