Token-Wise Residual Latent Adapters: Steering Seq2Seq Models for Protein Fitness Extrapolation

Published: 28 May 2026, Last Modified: 28 May 2026
Venue: GenBio 2026 Poster
License: CC BY 4.0
Keywords: direct preference optimization, protein language models, protein fitness extrapolation, latent space adaptation, representation learning, seq2seq, parameter-efficient fine-tuning
TL;DR: A residual MLP adapter with 600× fewer trainable parameters than the full model matches or outperforms LoRA, end-to-end fine-tuning, and DPO-based alignment on protein fitness extrapolation.
Abstract: Protein design requires extrapolating beyond the training data to reach higher fitness. State-of-the-art methods typically fine-tune billion-parameter language models end-to-end, often combined with external scorers, data distillation, and multiple rounds of iterative refinement. We introduce the residual latent adapter (RLA), a 5M-parameter MLP inserted between the encoder and decoder of a frozen ProtT5-3B model, which learns a token-wise residual transformation on encoder embeddings via a simple MSE objective. In a single forward pass with no external scorer, RLA achieves fitness comparable or superior to that of methods requiring 600$\times$ more trainable parameters and multi-stage pipelines, particularly on the harder extrapolation benchmarks most relevant to practical protein engineering. Our results demonstrate that a compact residual transformation in latent space provides a simple, data-efficient, and compute-efficient approach to protein fitness extrapolation.
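The adapter described in the abstract can be illustrated with a minimal NumPy sketch of a token-wise residual MLP and an MSE loss. This is not the authors' implementation: the class name `ResidualLatentAdapter`, the two-layer ReLU design, the zero-initialized output layer, and the dimensions `d_model` and `d_hidden` are all assumptions made for illustration. The abstract also does not say what the MSE target is; the sketch simply takes a target embedding of the same shape as input.

```python
import numpy as np


class ResidualLatentAdapter:
    """Hypothetical token-wise residual MLP applied to encoder embeddings.

    Layer count, activation, and initialization are assumptions;
    the source only states "a 5M parameter MLP" with a residual form.
    """

    def __init__(self, d_model, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.02, size=(d_model, d_hidden))
        self.b1 = np.zeros(d_hidden)
        # Assumption: zero-initialized output layer, so the adapter
        # starts as the identity map on the frozen encoder's output.
        self.w2 = np.zeros((d_hidden, d_model))
        self.b2 = np.zeros(d_model)

    def __call__(self, h):
        # h has shape (seq_len, d_model). The matrix products act on
        # the last axis only, so each token is transformed independently.
        hidden = np.maximum(h @ self.w1 + self.b1, 0.0)  # ReLU
        delta = hidden @ self.w2 + self.b2
        return h + delta  # residual connection

    def num_parameters(self):
        return sum(p.size for p in (self.w1, self.b1, self.w2, self.b2))


def mse_loss(pred, target):
    """Mean squared error over all tokens and embedding dimensions."""
    return float(np.mean((pred - target) ** 2))


if __name__ == "__main__":
    # Illustrative dimensions only, not taken from the source.
    adapter = ResidualLatentAdapter(d_model=64, d_hidden=128)
    h = np.random.default_rng(1).normal(size=(10, 64))  # stand-in encoder output
    target = np.zeros_like(h)                           # placeholder MSE target
    print(adapter.num_parameters(), mse_loss(adapter(h), target))
```

In a setup like the one the abstract describes, only the adapter weights would be trained while the encoder and decoder stay frozen, and the adapted embeddings `adapter(h)` would be passed to the decoder in place of `h`.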
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 51