Continuous Self-Improvement of Large Language Models by Test-time Training with Verifier-Driven Sample Selection
Keywords: Verifier, Distribution Shift, Test-Time Training
Abstract: Adapting pretrained LLMs to unlabeled, out-of-distribution data remains challenging, especially for structurally novel reasoning tasks. We present VDS-TTT (Verifier-Driven Sample Selection for Test-Time Training), a self-supervised framework that uses a learned verifier to score multiple generated responses and select only high-confidence pseudo-labeled examples for on-the-fly adaptation. For each query, the LLM generates N candidate answers; the verifier selects the highest-scoring answer above a confidence threshold and pairs it with the query to form a fine-tuning example. We update only low-rank LoRA adapters, enabling efficient and fast adaptation. Across three benchmarks and three state-of-the-art LLMs, VDS-TTT achieves up to 32.29% relative improvement over the base model, demonstrating its effectiveness for continuous test-time self-improvement.
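The following is a minimal sketch of the selection-and-adaptation loop described in the abstract, not the authors' implementation: the callables `generate`, `verify`, and `adapt` are placeholder hooks for the LLM sampler, the learned verifier, and a LoRA fine-tuning step, and the default values of `n_samples` and `threshold` are illustrative assumptions.

```python
from typing import Callable, List, Optional, Tuple


def vds_ttt_step(
    query: str,
    generate: Callable[[str, int], List[str]],        # LLM sampler: (query, N) -> N candidate answers
    verify: Callable[[str, str], float],               # learned verifier: (query, answer) -> confidence score
    adapt: Callable[[List[Tuple[str, str]]], None],    # LoRA fine-tuning step on pseudo-labeled pairs
    n_samples: int = 8,                                # assumed value of N; paper may use a different setting
    threshold: float = 0.9,                            # assumed confidence threshold
) -> Optional[str]:
    """One verifier-driven test-time training step (sketch).

    Sample N answers for the query, keep the highest-scoring one if it
    clears the confidence threshold, and use the (query, answer) pair as
    a pseudo-labeled example for on-the-fly adapter updates.
    """
    candidates = generate(query, n_samples)
    scored = [(verify(query, answer), answer) for answer in candidates]
    best_score, best_answer = max(scored, key=lambda pair: pair[0])

    if best_score < threshold:
        # No candidate is confident enough; skip adaptation for this query.
        return None

    # Fine-tune only the low-rank adapter weights on the selected pair.
    adapt([(query, best_answer)])
    return best_answer
```

In this sketch, queries whose best candidate falls below the threshold simply contribute no update, so adaptation is driven only by examples the verifier deems reliable.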
Serve As Reviewer: ~Walid_Ahmed1
Submission Number: 11