Enhancing LM's Task Adaptability: Powerful Post-training Framework with Reinforcement Learning from Model Feedback

Published: 01 Jan 2024, Last Modified: 12 Jun 2025 · ICANN (7) 2024 · CC BY-SA 4.0
Abstract: Pretrained language models have become the dominant approach in natural language processing (NLP). However, a notable bias between the pretraining and fine-tuning stages remains unresolved when these models are applied to downstream tasks. To address this issue, we propose RLPT, a task-adaptive post-training framework based on reinforcement learning. Building on the observation that essential and task-irrelevant tokens coexist in the input of NLP tasks, we first apply a random token pruning strategy to fine-tune BERT into a game-like environment model. We then use an Actor-Critic architecture to build the post-trained model, which learns to discern the tokens essential to a specific task by observing the environment model's feedback under different token pruning actions. Experimental results demonstrate significant performance improvements on standard NLP benchmark datasets. Specifically, on the GLUE benchmark, RLPT improves the BERT base model by 2.35 points, verifying its effectiveness in reducing the bias between pretraining and fine-tuning and enhancing task adaptability.
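The following is a minimal sketch of the two stages the abstract describes: random token pruning used to fine-tune an environment model, and an Actor-Critic agent that learns keep/prune actions from that model's feedback. It is not the authors' implementation; the class and function names (TinyEncoder, ActorCritic, random_prune, reward_from_environment), the placeholder encoder standing in for BERT, and the confidence-based reward are all illustrative assumptions.

```python
# Illustrative sketch of the RLPT idea under stated assumptions; not the paper's code.
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Placeholder for the BERT-based environment model (assumption)."""

    def __init__(self, vocab_size=30522, hidden=64, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, keep_mask):
        # Zero out pruned tokens, then mean-pool the surviving ones.
        h = self.embed(input_ids) * keep_mask.unsqueeze(-1)
        pooled = h.sum(1) / keep_mask.sum(1, keepdim=True).clamp(min=1)
        return self.classifier(pooled)


def random_prune(input_ids, prune_prob=0.15):
    """Stage 1 (assumed form): random token pruning used while fine-tuning
    the environment model, so it learns to predict labels from pruned input."""
    return (torch.rand_like(input_ids, dtype=torch.float) > prune_prob).float()


class ActorCritic(nn.Module):
    """Stage 2: actor proposes per-token keep/prune actions; critic estimates value."""

    def __init__(self, vocab_size=30522, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.actor = nn.Linear(hidden, 2)   # logits for {prune, keep}
        self.critic = nn.Linear(hidden, 1)  # per-token value estimate

    def forward(self, input_ids):
        h = self.embed(input_ids)
        return self.actor(h), self.critic(h).squeeze(-1)


def reward_from_environment(env_model, input_ids, keep_mask, labels):
    """Assumed reward: environment model's confidence in the gold label after pruning."""
    with torch.no_grad():
        probs = env_model(input_ids, keep_mask).softmax(-1)
        return probs.gather(1, labels.unsqueeze(1)).squeeze(1)


# One illustrative policy-gradient step on toy data.
env_model = TinyEncoder()
agent = ActorCritic()
optimizer = torch.optim.Adam(agent.parameters(), lr=1e-4)

input_ids = torch.randint(0, 30522, (8, 32))
labels = torch.randint(0, 2, (8,))

logits, values = agent(input_ids)
dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()            # 1 = keep token, 0 = prune token
keep_mask = actions.float()

reward = reward_from_environment(env_model, input_ids, keep_mask, labels)
advantage = reward.unsqueeze(1) - values        # per-token baseline from the critic

policy_loss = -(dist.log_prob(actions) * advantage.detach()).mean()
value_loss = advantage.pow(2).mean()

optimizer.zero_grad()
(policy_loss + value_loss).backward()
optimizer.step()
```

In this sketch the agent is rewarded when the environment model still assigns high probability to the gold label after pruning, which pushes it toward keeping task-essential tokens and dropping irrelevant ones; the actual reward shaping and training schedule in RLPT may differ.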