Efficient Evaluation of LLMs via Branching Preference Learning

15 May 2024 (modified: 06 Nov 2024) · Submitted to NeurIPS 2024 · CC BY 4.0
Keywords: LLMs; dialogue evaluation; RLHF
Abstract: Large language models (LLMs) have made significant advances across a wide range of generative tasks, approaching near-human levels of performance. In many scenarios, however, human evaluation is insufficient or unavailable, and LLMs are not yet reliable substitutes as evaluators. In particular, in complex dialogue scenarios involving diverse and intricate user intents, LLM evaluators of AI responses still lag substantially behind humans, and the scarcity of high-quality evaluation data further limits their evaluation capabilities. In this work, we conceptualize the evaluation process as a decision tree, where each node represents an evaluation action and each path from the root to a leaf represents a trajectory of evaluation reasoning. We show that, within a limited search space, there exist better decision-making behaviors that help the model make reasonable and accurate judgments. Specifically, we propose a tree-based data sampling method that generates supervised data and preference pairs from the evaluation tree, and we introduce preference learning based on the DPO algorithm, which enables the fine-grained evaluation model to explore and learn better branching strategies under limited budgets. Our model significantly reduces the dependency on labeled data and performs strongly across three evaluation settings: in-distribution, out-of-distribution, and transfer evaluation. Experiments show that our model reduces inference costs by 90\% compared to searching the entire evaluation tree, significantly improving efficiency.
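The sketch below is a minimal, hedged illustration of the two ideas named in the abstract, not the authors' released code: treating evaluation as root-to-leaf trajectories over a tree of evaluation actions, pairing higher- and lower-quality trajectories, and training with the standard DPO objective. The tree contents, trajectory scores, and helper names (EvalNode, make_preference_pairs) are hypothetical placeholders.

```python
# Sketch only: tree-structured evaluation trajectories, preference-pair
# construction, and the standard DPO loss. Details are assumptions, not the
# paper's implementation.

from dataclasses import dataclass, field
from typing import List, Tuple
import torch
import torch.nn.functional as F


@dataclass
class EvalNode:
    """One node of the evaluation tree: a single evaluation action."""
    action: str
    children: List["EvalNode"] = field(default_factory=list)


def enumerate_paths(root: EvalNode) -> List[List[str]]:
    """Each root-to-leaf path is one evaluation-reasoning trajectory."""
    if not root.children:
        return [[root.action]]
    paths = []
    for child in root.children:
        for sub in enumerate_paths(child):
            paths.append([root.action] + sub)
    return paths


def make_preference_pairs(paths: List[List[str]],
                          scores: List[float]) -> List[Tuple[List[str], List[str]]]:
    """Pair a higher-scoring trajectory (chosen) with a lower-scoring one (rejected)."""
    pairs = []
    for i in range(len(paths)):
        for j in range(len(paths)):
            if scores[i] > scores[j]:
                pairs.append((paths[i], paths[j]))
    return pairs


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective on sequence log-probabilities of chosen/rejected trajectories."""
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    return -F.logsigmoid(logits).mean()


if __name__ == "__main__":
    # Toy tree: two evaluation trajectories with made-up quality scores.
    root = EvalNode("read dialogue", [EvalNode("check intent coverage"),
                                      EvalNode("check factuality")])
    paths = enumerate_paths(root)
    pairs = make_preference_pairs(paths, scores=[0.9, 0.4])
    print(pairs)

    # Dummy log-probabilities standing in for policy / reference model outputs.
    loss = dpo_loss(torch.tensor([-2.0]), torch.tensor([-3.5]),
                    torch.tensor([-2.2]), torch.tensor([-3.3]))
    print(loss.item())
```

In this reading, the budget-limited setting would correspond to sampling only a subset of branches rather than enumerating the full tree; the preference pairs then teach the model which branching choices lead to better judgments.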
Supplementary Material: zip
Primary Area: Natural language processing
Submission Number: 14917