URIAL: Tuning-Free Instruction Learning and Alignment for Untuned LLMs

Published: 28 Oct 2023 (Last Modified: 26 Nov 2023), Instruction Workshop @ NeurIPS 2023
Keywords: LLM, alignment, in-context learning, instruction tuning
TL;DR: We first analyze what SFT+RLHF actually teach LLMs, then show that untuned LLMs can be aligned surprisingly well via in-context learning with as few as three well-written examples.
Abstract: Large language models (LLMs) have shown significant improvements due to alignment tuning, that is, supervised fine-tuning (SFT) on instruction data and reinforcement learning from human feedback (RLHF). This raises the question of what is precisely learned during the alignment tuning process. We investigate the effects of alignment tuning through the lens of token distribution shift between untuned LLMs and their aligned counterparts (e.g., Llama-2 versus Llama-2-Chat). Our findings reveal that most distribution changes lie in stylistic tokens (e.g., transitional words, discourse markers), suggesting that LLMs primarily learn the language style of AI assistants during alignment tuning, while most useful knowledge has already been acquired by untuned LLMs. Thus, we pose the question: Is it necessary to update model weights to attain LLM alignment? Based on these insights, we propose URIAL, an alternative tuning-free method for instruction learning and alignment for untuned LLMs, which achieves effective alignment solely through in-context learning (ICL) with as few as three curated, stylistic examples and a system prompt. We also introduce a dataset named just-eval-instruct, which consists of 1,000 examples collected from 9 existing instruction datasets such as those used by AlpacaEval. Our multi-aspect evaluation demonstrates that URIAL can achieve highly satisfactory performance, sometimes equaling or surpassing SFT+RLHF counterparts, especially when the untuned LLM is sufficiently pre-trained. This implies that fine-tuning may not always be as crucial as previously assumed for LLM alignment, and lightweight alignment methods like URIAL hold promise for efficiently tailoring LLM behavior without fine-tuning.
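The token-distribution-shift analysis described in the abstract can be sketched as follows: for each position in an aligned model's output, look up the rank of the generated token under the base (untuned) model's next-token distribution, and classify the position as unshifted, marginal, or shifted. This is a minimal illustration with toy logits and illustrative rank thresholds, not the paper's exact implementation; the function name and cutoffs are assumptions.

```python
# Hedged sketch: classify token positions by distribution shift between a
# base (untuned) LLM and its aligned counterpart. For each position, take
# the token the aligned model generated and find its rank under the base
# model's next-token logits. Rank thresholds here are illustrative.
def classify_shift(aligned_token_id, base_logits):
    """Return a shift label based on the aligned token's rank under base_logits."""
    # rank 0 means the base model's own top choice matches the aligned token
    rank = sum(1 for l in base_logits if l > base_logits[aligned_token_id])
    if rank == 0:
        return "unshifted"   # base model would emit the same token
    elif rank <= 2:
        return "marginal"    # token is among the base model's near-top choices
    return "shifted"         # likely a stylistic, alignment-driven token

# Toy example: vocabulary of 5 tokens; each tuple is
# (token id chosen by the aligned model, base model's logits at that position).
positions = [
    (3, [0.1, 0.2, 0.0, 2.5, -1.0]),   # token 3 is also the base's top pick
    (1, [2.0, 1.5, 0.0, -1.0, 0.3]),   # token 1 ranks 2nd under the base model
    (4, [2.0, 1.5, 1.0, 0.5, -3.0]),   # token 4 ranks last under the base model
]
labels = [classify_shift(tok, logits) for tok, logits in positions]
print(labels)  # ['unshifted', 'marginal', 'shifted']
```

Under this view, a response dominated by "unshifted" positions indicates the base model already carried the knowledge, and the shifted positions concentrate in stylistic tokens.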
Submission Number: 95