Keywords: Social Learning, Natural Language Feedback, Instructive Learning
TL;DR: We explore whether we can finetune language models by letting them generate answers conditioned on prior feedback.
Abstract: In this paper we explore the simple idea of teaching models by allowing them to condition their answers on natural language feedback. Motivated by the observation that natural language interactions provide a targeted, flexible, and level-appropriate reward signal, we study the ability of small instruction-tuned models to leverage feedback from a larger frontier model. We find that while the frontier model generally provides high-quality feedback, smaller models in particular can struggle to use it due to noise in their generative output. After incorporating techniques such as negative sampling, we find that models trained on these feedback-conditioned responses can perform comparably to those trained directly on teacher responses. We explore training with supervised finetuning and preference learning algorithms over a broad set of tasks, including Big-Bench Hard. These findings are broadly applicable, and our methods rely only on the ability of models to give and receive linguistic feedback. As such, they contribute to a growing body of work exploring how best to utilise the linguistic capabilities of language models for human-like instructive learning.
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Resubmission: No
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 14230