Scaling Effects of Instruction Tuning in Encoder-Based Language Models

ACL ARR 2025 February Submission 4205 Authors

15 Feb 2025 (modified: 09 May 2025), CC BY 4.0
Abstract:

Instruction tuning has significantly improved the task-following capabilities of decoder-based language models, yet its effects on encoder-based architectures remain underexplored. This study investigates instruction tuning in the XLM-R model family for prompted classification tasks, analyzing models ranging from 250M to 10B parameters under three training paradigms: standard fine-tuning, prompting of base models, and prompting of instruction-tuned models. Our experiments, conducted on a subset of SuperGLUE classification datasets, show that instruction tuning significantly benefits larger XLM-R variants, particularly those with at least 500M parameters. However, the performance gains do not scale directly with model size: XLM-R Large achieves competitive improvements, while XLM-R XL underperforms despite its substantially larger parameter count. These findings suggest that pre-training data quality and quantity may play a key role in how well encoder-based models leverage instruction tuning. Additionally, we observe that the alignment between the instruction-tuning data and downstream tasks influences performance, underscoring the importance of data diversity. Our findings contribute to a more nuanced understanding of instruction tuning in encoder models and offer insights into optimizing their task-following capabilities.
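The prompted-classification setup the abstract compares against standard fine-tuning can be sketched as masked-language-model prompting: a cloze template turns each classification example into a fill-in-the-blank query, and a verbalizer maps the label word predicted at the mask position to a class label. A minimal sketch follows; the template, verbalizer words, and injected scoring function are illustrative assumptions, not the paper's actual prompts, and the scorer stands in for an encoder MLM head such as XLM-R.

```python
# Sketch of MLM-style prompted classification for an entailment-style
# SuperGLUE task. Template and verbalizer are hypothetical examples.
from typing import Callable, Dict

MASK = "<mask>"  # XLM-R's mask token


def build_prompt(premise: str, hypothesis: str) -> str:
    """Cloze template turning an entailment pair into a fill-in query."""
    return f"{premise} Question: {hypothesis} Answer: {MASK}."


# Verbalizer: word predicted at the mask position -> class label.
VERBALIZER: Dict[str, str] = {"Yes": "entailment", "No": "not_entailment"}


def classify(premise: str, hypothesis: str,
             mask_word_scores: Callable[[str], Dict[str, float]]) -> str:
    """Score each verbalizer word at the mask and return the argmax class.

    `mask_word_scores` is injected so the sketch stays self-contained;
    in practice it would query an encoder MLM head (e.g. XLM-R via
    Hugging Face transformers' fill-mask pipeline).
    """
    prompt = build_prompt(premise, hypothesis)
    scores = mask_word_scores(prompt)
    best_word = max(VERBALIZER, key=lambda w: scores.get(w, float("-inf")))
    return VERBALIZER[best_word]


# Toy stand-in scorer that always prefers "Yes".
toy_scores = lambda prompt: {"Yes": 0.9, "No": 0.1}
print(classify("Cats are mammals.", "A cat is an animal.", toy_scores))
# -> entailment
```

In this framing, "instruction tuning" corresponds to further training the encoder on instruction-formatted data before applying such prompts, while "standard fine-tuning" replaces the template and verbalizer with a task-specific classification head.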

Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: prompting, scaling, fine-tuning
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 4205