Self-Concordant Preference Learning from Noisy Labels

Published: 10 Jun 2025, Last Modified: 30 Jun 2025 · MoFA Poster · CC BY 4.0
Keywords: Noisy annotation, preference learning
Abstract: Preference learning is an integral part of training a large language model (LLM) to serve user applications. While this alignment is usually performed via offline learning from annotated feedback, obtaining such data introduces inherent noise, and most current methods are sensitive to it. In this work, we propose a novel approach to learning from such noisy labels based on self-concordant losses. Our method learns the optimal model under an adversarial labeller. Experiments show that our proposal outperforms common preference-learning algorithms across a range of noise levels.
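To make the noise-sensitivity issue concrete, the sketch below contrasts the standard Bradley-Terry logistic preference loss with a noise-smoothed variant that assumes each annotation is flipped with probability `eps`. This is an illustrative noise model only; the function names, the flip-rate parameter, and the smoothing construction are assumptions for exposition and are not the paper's adversarial-labeller formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bt_logistic_loss(margin):
    # Standard Bradley-Terry / logistic preference loss: -log sigma(margin),
    # where margin = r(chosen) - r(rejected) under the current model.
    return -np.log(sigmoid(margin))

def noise_smoothed_loss(margin, eps):
    # Illustrative noise-aware loss: assume the annotator flips each
    # preference label independently with probability eps, and average
    # the loss over the two possible true labels.
    return (1 - eps) * bt_logistic_loss(margin) + eps * bt_logistic_loss(-margin)
```

For `eps > 0`, the smoothed loss is minimized at a finite margin rather than driving the margin to infinity, which tempers overconfidence on pairs whose labels may be corrupted; this is the basic effect that noise-robust preference objectives aim for.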
Submission Number: 80