Instruction tuning has significantly improved the task-following capabilities of decoder-based language models, yet its effects on encoder-based architectures remain underexplored. This study investigates instruction tuning in the XLM-R model family for prompted classification tasks, analyzing models ranging from 250M to 10B parameters under three training paradigms: standard fine-tuned models, prompted base models, and instruction-tuned prompted models. Our experiments, conducted on a subset of SuperGLUE classification datasets, show that instruction tuning substantially benefits larger XLM-R variants, particularly those with at least 500M parameters. However, the performance gains do not scale directly with model size. Notably, XLM-R\textsubscript{large} achieves competitive improvements, while XLM-R\textsubscript{XL} underperforms despite its substantially larger parameter count. These findings suggest that pre-training data quality and quantity may play a key role in how well encoder-based models leverage instruction tuning. Additionally, we observe that the alignment between instruction tuning data and downstream tasks influences performance, underscoring the importance of data diversity. Our findings contribute to a more nuanced understanding of instruction tuning in encoder models and offer insights into optimizing their task-following capabilities.
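To make the prompted-classification setting concrete, below is a minimal sketch of one common way to prompt an encoder model: a PET-style cloze template scored with XLM-R's masked-LM head. The prompt template, the verbalizer words, and the checkpoint are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of prompted classification with an encoder model.
# Assumptions (not from the paper): a PET-style cloze prompt scored with
# XLM-R's masked-LM head, and single-token verbalizers mapping to labels.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
model.eval()

def classify(premise: str, hypothesis: str) -> str:
    # Cast the task as a cloze: the model fills the masked slot with a
    # verbalizer word, which maps back to a class label.
    prompt = f"{premise} Question: {hypothesis} Answer: {tokenizer.mask_token}."
    verbalizers = {"yes": "entailment", "no": "not_entailment"}

    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]

    # Score only the verbalizer tokens (taking the first subword if a word
    # tokenizes into several pieces) and pick the best-scoring label.
    scores = {}
    for word, label in verbalizers.items():
        word_id = tokenizer(word, add_special_tokens=False).input_ids[0]
        scores[label] = logits[word_id].item()
    return max(scores, key=scores.get)

print(classify("The cat sat on the mat.", "A cat is on a mat."))
```

In this formulation, instruction tuning amounts to fine-tuning the encoder on a mixture of such verbalized tasks so that unseen prompts of the same form transfer zero-shot, whereas standard fine-tuning trains a task-specific classification head instead.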