Twin Evolution with Meta Preference Optimization for Semi-Supervised Learning of Large Language Models

ICLR 2026 Conference Submission19117 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Semi-Supervised LLM Finetuning, LLM Adaptation, Self-Evolution
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, yet adapting them to specific downstream tasks remains challenging when labeled data are scarce. Post-training methods such as SFT and DPO are effective, but their performance is constrained by this same scarcity. In this paper, we present TwinEvol, a framework that treats downstream task training and evaluation as complementary, co-evolving submodules. TwinEvol introduces an evaluation agent that co-evolves with the main model: rather than acting as a static external module, it is iteratively calibrated and refined through continuous interaction with the generation LLM. This agent enables more nuanced assessment during downstream adaptation and, combined with hard negative mining and meta-preference optimization, provides comprehensive feedback and efficient knowledge transfer. Through an iterative twin-evolution process, the framework establishes a self-reinforcing cycle that propagates knowledge from labeled to unlabeled data while maintaining task alignment. Experiments across a range of downstream tasks show that TwinEvol outperforms existing methods. Our code is available at https://anonymous.4open.science/r/TwinEvol/.
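The abstract describes an alternating loop in which a generation model and an evaluation agent improve each other. The sketch below is a minimal illustration of that loop under our own assumptions; every function name (generate_candidates, score, calibrate, dpo_update) is a hypothetical placeholder standing in for the paper's components, and the runner-up-as-hard-negative heuristic is an assumption, not the authors' mining strategy.

```python
# Hedged sketch of a twin-evolution loop: an evaluator is calibrated on labeled
# data, ranks generator outputs on unlabeled prompts, and the resulting
# preference pairs drive a DPO-style update of the generator. All internals
# are stand-ins, not TwinEvol's actual implementation.
import random

def generate_candidates(model, prompt, k=4):
    # Placeholder: a real system would sample k responses from the LLM.
    return [f"{prompt} -> draft {i} (step={model['step']})" for i in range(k)]

def score(evaluator, prompt, response):
    # Placeholder: the evaluation agent assigns a task-alignment score.
    return random.random() + evaluator["bias"]

def calibrate(evaluator, labeled_data):
    # Placeholder: re-fit the evaluator against labeled examples each round.
    evaluator["bias"] *= 0.9
    return evaluator

def dpo_update(model, preference_pairs):
    # Placeholder: a DPO-style update on (chosen, rejected) pairs.
    model["step"] += len(preference_pairs)
    return model

def twin_evolve(model, evaluator, labeled_data, unlabeled_prompts, rounds=3):
    for _ in range(rounds):
        evaluator = calibrate(evaluator, labeled_data)      # evaluator evolves
        pairs = []
        for prompt in unlabeled_prompts:
            cands = generate_candidates(model, prompt)
            ranked = sorted(cands, key=lambda c: score(evaluator, prompt, c))
            # Assumed hard-negative proxy: pair the top-ranked candidate with
            # the runner-up, i.e. the most confusable rejected response.
            pairs.append((ranked[-1], ranked[-2]))
        model = dpo_update(model, pairs)                     # generator evolves
    return model, evaluator

if __name__ == "__main__":
    model, evaluator = {"step": 0}, {"bias": 0.5}
    model, evaluator = twin_evolve(
        model, evaluator,
        labeled_data=[("question", "answer")],
        unlabeled_prompts=["q1", "q2"],
    )
    print(model, evaluator)
```

The point of the sketch is only the control flow: the evaluator is recalibrated before each round (so it is not static), and its rankings over unlabeled prompts supply the preference pairs that move knowledge from labeled to unlabeled data.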
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 19117