Logits are All We Need to Adapt Closed Models

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose a “Plugin” framework that uses token-logit reweighting at inference to adapt closed-source LLMs to new domains without retraining or accessing model weights.
Abstract: Many commercial Large Language Models (LLMs) are closed-source, limiting developers to prompt tuning for aligning content generation with specific applications. While these models currently do not provide access to token logits, we argue that if such access were available, it would enable more powerful adaptation techniques beyond prompt engineering. In this paper, we propose a token-level probability reweighting framework that, given access to logits and a small amount of task-specific data, can effectively steer black-box LLMs toward application-specific content generation. Our approach views next-token prediction through the lens of supervised classification. We show that aligning black-box LLMs with task-specific data can be formulated as a label noise correction problem, leading to the Plugin model, an autoregressive probability reweighting model that operates solely on logits. We provide theoretical justification for why reweighting logits alone is sufficient for task adaptation. Extensive experiments with multiple datasets, LLMs, and reweighting models demonstrate the effectiveness of our method, advocating for broader access to token logits in closed-source models. We provide our code at https://github.com/stair-lab/plugin-llm.
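To make the mechanism concrete, here is a minimal sketch of inference-time logit reweighting. The names (`PluginReweighter`, `decode_step`) and the MLP corrector are illustrative assumptions, not the paper's exact Plugin architecture, which is defined in the linked repository.

```python
import torch
import torch.nn as nn

class PluginReweighter(nn.Module):
    """Maps the closed model's logits to task-adapted logits.

    A small MLP stands in for the reweighting model here; the
    paper's actual Plugin model may differ.
    """
    def __init__(self, vocab_size: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, hidden),
            nn.ReLU(),
            nn.Linear(hidden, vocab_size),
        )

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # An additive correction in logit space is a multiplicative
        # reweighting of the probabilities after the softmax.
        return logits + self.net(logits)

@torch.no_grad()
def decode_step(black_box_logits: torch.Tensor, plugin: PluginReweighter) -> int:
    """One greedy decoding step that touches only the closed model's logits."""
    reweighted = plugin(black_box_logits)      # shape: (vocab_size,)
    probs = torch.softmax(reweighted, dim=-1)  # task-adapted next-token distribution
    return int(probs.argmax())                 # next-token id
```

In this sketch the reweighter would be trained on the small task-specific dataset while the closed model stays frozen; at inference, each decoding step queries the black-box model for logits and passes them through the reweighter before sampling.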
Lay Summary: Today's headline-grabbing language models, like many other commercial AI systems, are usually sealed tight. Developers get a single control knob: the prompt. If the model writes off-brand marketing copy or misses legal jargon, you must keep re-phrasing the prompt and hoping for the best. Because closed-source LLMs do not reveal their inner code or training data, deeper tuning feels out of reach. Our research shows there is a sweet spot between total secrecy and full open source: let developers peek at one number, called the logit, that the model assigns to each possible next word. With just those logits and a handful of example sentences, we introduce Plugin, a lightweight "probability dial" that quietly re-weights the model's word choices as it writes. No retraining and no knowledge of the model's architecture are needed, just a smarter push toward the vocabulary and tone a project really needs. In experiments spanning four very different writing tasks, Plugin made closed-source models noticeably better at using the right terms, boosting domain-specific vocabulary, tone, and style. This means businesses could enjoy bespoke AI content without huge compute bills or risking their secrets, and model owners could offer a new "logit mode" that keeps their core technology safe. By championing this middle-ground interface, our work opens the door to more controllable, less biased, and more widely useful AI.
Link To Code: https://github.com/stair-lab/plugin-llm
Primary Area: Deep Learning->Large Language Models
Keywords: Distribution Shift, Black-box Model, Reweighting, Decoding, Large Language Models
Flagged For Ethics Review: true
Submission Number: 11097