You only need 4 extra tokens: Synergistic Test-time Adaptation for LLMs

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large language models, Entropy minimization, Test-time adaptation
TL;DR: We introduce Synergistic Test-time Adaptation (SyTTA), a label-free framework for adapting autoregressive language models at inference time.
Abstract: Large language models (LLMs) are increasingly deployed in specialized domains such as finance, medicine, and agriculture, where they face significant distribution shifts from their training data. Domain-specific fine-tuning can mitigate this challenge but relies on high-quality labeled data that is expensive and slow to collect in expertise-limited settings. We study label-free test-time adaptation for language models and present SyTTA, an inference-time framework that adapts models on-the-fly without additional supervision. SyTTA couples two complementary uncertainty signals that arise under distribution shift: input-side perplexity, indicating mismatch with domain-specific terminology and patterns, and output-side predictive entropy, indicating diffuse and unstable token probabilities during generation. Unlike prior test-time approaches for LLMs that optimize a single signal, SyTTA integrates both within a unified self-supervised objective that automatically balances their influence, stabilizing generation while improving domain awareness. Across diverse model architectures and domain-specific benchmarks, SyTTA delivers consistent gains. Notably, on agricultural question answering, SyTTA improves ROUGE-Lsum by over 120% on Qwen-2.5-7B with only 4 extra tokens per query. These results show that effective test-time adaptation for language models is achievable without labeled examples, supporting deployment in label-scarce domains. The code will be made available upon acceptance.
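The abstract describes a unified objective that couples input-side perplexity with output-side predictive entropy. As an illustration only, here is a minimal NumPy sketch of such a combined loss; the function name `sytta_style_loss`, the additive weighting `alpha`, and all shapes are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sytta_style_loss(input_logits, input_ids, output_logits, alpha=1.0):
    """Toy combined objective (illustrative, not the paper's exact loss):
    input-side negative log-likelihood (the perplexity signal) plus
    alpha-weighted output-side predictive entropy."""
    # Input-side: NLL of the observed prompt tokens under the model.
    p_in = softmax(input_logits)                       # (T_in, V)
    nll = -np.log(p_in[np.arange(len(input_ids)), input_ids]).mean()
    # Output-side: Shannon entropy of next-token distributions.
    p_out = softmax(output_logits)                     # (T_out, V)
    entropy = -(p_out * np.log(p_out + 1e-12)).sum(axis=-1).mean()
    return nll + alpha * entropy
```

Minimizing a loss of this shape pushes the model toward both higher likelihood on in-domain inputs and sharper (lower-entropy) generations, which is the intuition the abstract attributes to coupling the two uncertainty signals.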
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 18601