Keywords: Protein language models, Direct Preference Optimization, scalability, protein engineering, sequence design, in-vitro validation, clustering, masked language models
TL;DR: We introduce g-DPO, a scalable DPO framework for protein language models that prunes redundant sequence pairs and amortizes likelihood computations, matching DPO performance while converging up to 3.7x faster.
Abstract: Direct Preference Optimization (DPO) is an effective approach for aligning protein language models with experimental design goals. However, DPO faces a scalability bottleneck: the number of possible training pairs grows quadratically with the number of labeled sequences, leading to prohibitive training times even for modestly sized datasets. We introduce g-DPO, a framework that (i) uses sequence-space clustering to prune redundant pairs while preserving training signal, and (ii) amortizes likelihood computations with group-based approximations. Across three protein engineering tasks, g-DPO maintains _in-silico_ and _in-vitro_ performance that is statistically indistinguishable from standard DPO, while converging 1.8–3.7x faster, with larger gains expected as dataset size increases.
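
As a rough illustration of the pair-pruning idea described in the abstract (a minimal sketch, not the paper's implementation), the snippet below builds DPO preference pairs only across clusters and caps the total pair count; the clustering rule, the cross-cluster pairing criterion, and all names (`cluster_sequences`, `build_pruned_pairs`, `pair_budget`) are assumptions made for illustration, and the group-based likelihood amortization is omitted.

```python
import itertools
import random

def cluster_sequences(seqs, n_clusters):
    """Toy stand-in for sequence-space clustering (a real pipeline might use
    identity-based clustering or k-means over sequence embeddings)."""
    # Assumption: round-robin assignment, purely for demonstration.
    return {s: i % n_clusters for i, s in enumerate(seqs)}

def build_pruned_pairs(labeled, n_clusters=8, pair_budget=400):
    """Build (preferred, dispreferred) training pairs, pruning redundant ones
    so the pair count does not grow quadratically with the labeled set."""
    seqs = list(labeled)
    cluster_of = cluster_sequences(seqs, n_clusters)
    pairs = []
    for a, b in itertools.combinations(seqs, 2):
        if cluster_of[a] == cluster_of[b]:
            continue  # prune: pairs within a cluster carry largely redundant signal
        if labeled[a] == labeled[b]:
            continue  # no preference between equally scored sequences
        winner, loser = (a, b) if labeled[a] > labeled[b] else (b, a)
        pairs.append((winner, loser))
    random.shuffle(pairs)
    return pairs[:pair_budget]  # cap the total number of retained pairs

# Hypothetical usage with a labeled set mapping sequence -> assay score:
# labeled = {"MKVL...": 0.91, "MKIL...": 0.42, "MQVL...": 0.77}
# pairs = build_pruned_pairs(labeled, n_clusters=2, pair_budget=10)
```

Without any pruning, N labeled sequences yield N(N-1)/2 candidate pairs, which is the quadratic growth the abstract identifies as the bottleneck; the sketch simply shows one way a cluster assignment can be used to discard pairs before training.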
Submission Number: 50