Sanitizing LLMs: Retrospective Learning for Self-Correction of Inconsistent Samples via User Preferences
Keywords: Large Language Model, Prompt-Based Task, New Downstream Task with Unsupervised Data, Unsupervised Data Annotation with User Preference
Abstract: With the advent of large language models (LLMs), prompt-based annotation with LLMs has been shown to reduce the high cost and inefficiency of human annotation. Nonetheless, for new unsupervised downstream tasks that require user preferences to align data annotations with expectations, existing evaluation methods for prompt-based tasks become ineffective, especially when ground-truth annotations are scarce or missing. To fill this gap, we propose the novel Consistent and Inconsistent (CAI) Ratio, motivated by our experimental observation that LLMs underperform when the number of inconsistent samples (those whose predictions disagree between the LLMs and the student model) exceeds the number of consistent samples. By estimating the CAI ratio and identifying consistent and inconsistent samples with our proposed CAI identification approach, we aim to minimize inconsistency and improve the accuracy of LLM-generated annotations for unsupervised data. To this end, we introduce Retrospective Learning (RetroL) with user preference, a data-centric approach in which the student model and the LLMs collaborate, using a small number of human annotations as user preferences to resolve the inconsistencies in the identified samples. Applied to eight domain-specific NLP datasets, our Retrospective Learning approach with CAI identification significantly improved the accuracy of LLM-generated responses, with the CAI ratio increasing as accuracy improved.
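Note: the abstract does not give the exact formula for the CAI ratio. The following minimal Python sketch illustrates one plausible reading, in which a sample is "consistent" when the LLM annotation and the student-model prediction agree and the ratio is the consistent count over the inconsistent count; the function name and this definition are assumptions for illustration, not the authors' specification.

    # Hypothetical sketch: split samples into consistent/inconsistent by
    # agreement between LLM annotations and student-model predictions,
    # then compute an assumed CAI ratio = #consistent / #inconsistent.
    from typing import Sequence

    def cai_ratio(llm_labels: Sequence[str], student_labels: Sequence[str]) -> float:
        if len(llm_labels) != len(student_labels):
            raise ValueError("label sequences must be the same length")
        consistent = sum(a == b for a, b in zip(llm_labels, student_labels))
        inconsistent = len(llm_labels) - consistent
        return consistent / inconsistent if inconsistent else float("inf")

    # Example: three of four samples receive matching annotations,
    # giving a ratio of 3.0 (more consistent than inconsistent samples).
    print(cai_ratio(["pos", "neg", "pos", "neg"], ["pos", "neg", "pos", "pos"]))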
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7463