SpecMER: Fast Protein Generation with K-mer Guided Speculative Decoding

Thomas Walton; Darin Tsui; Aryan Musharaf; Amirali Aghazadeh

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, NeurIPS 2025 spotlight, CC BY 4.0

Keywords: speculative decoding, protein design, autoregressive, PLM, sampling, structure

TL;DR: We develop a novel speculative decoding framework for protein generation, using structure-aware guidance from k-mers to generate proteins with higher likelihood and structural confidence.

Abstract: Autoregressive models have transformed protein engineering by enabling the generation of novel protein sequences beyond those found in nature. However, their sequential inference introduces significant latency, limiting their utility in high-throughput protein screening. Speculative decoding accelerates generation by employing a lightweight draft model to sample tokens, which a larger target model then verifies and refines. Yet in protein sequence generation, draft models are typically agnostic to the structural and functional constraints of the target protein, leading to biologically implausible outputs and a shift in the likelihood distribution of generated sequences. We introduce SpecMER (Speculative Decoding via k-mer Guidance), a novel framework that incorporates biological, structural, and functional priors using k-mer motifs extracted from multiple sequence alignments. By scoring candidate sequences in parallel and selecting those most consistent with known biological patterns, SpecMER significantly improves sequence plausibility while retaining the efficiency of speculative decoding. SpecMER achieves 24–32% speedup over standard autoregressive decoding, along with higher acceptance rates and improved sequence likelihoods.
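
The candidate-selection idea in the abstract (score several drafted continuations against k-mer motifs from a multiple sequence alignment, then keep the most consistent one) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of k = 3, the toy alignment, and the mean log-count score are all assumptions made for illustration, and in a full speculative decoding loop the selected draft would still be verified by the target model.

```python
from collections import Counter
import math


def kmer_counts(msa_rows, k=3):
    """Count k-mers in MSA rows after stripping gap characters ('-' and '.')."""
    counts = Counter()
    for row in msa_rows:
        seq = row.replace("-", "").replace(".", "")
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_score(candidate, counts, k=3):
    """Mean log(1 + count) over the candidate's k-mers (hypothetical score).

    Returns 0.0 when the candidate is shorter than k.
    """
    n = len(candidate) - k + 1
    if n <= 0:
        return 0.0
    return sum(math.log1p(counts[candidate[i:i + k]]) for i in range(n)) / n


def select_draft(candidates, counts, k=3):
    """Return the drafted candidate with the highest k-mer score."""
    return max(candidates, key=lambda c: kmer_score(c, counts, k))


# Toy alignment and toy drafts, invented for this example.
msa = ["MKT-AYIAK", "MKTAAYLAK", "MRT-AYIAK"]
counts = kmer_counts(msa)
candidates = ["MKTAYI", "QQQQQQ", "MKTQQQ"]
best = select_draft(candidates, counts)
print(best)
```

Here the draft whose 3-mers all occur in the toy alignment is preferred over drafts containing unseen 3-mers. Because each candidate is scored independently, the scoring can run in parallel across drafts, which is the property the abstract relies on.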

Primary Area: Machine learning for sciences (e.g. climate, health, life sciences, physics, social sciences)

Submission Number: 12859
