Abstract: Recent efforts in LLM alignment have focused on constructing large-scale preference datasets via human or artificial intelligence (AI) annotators. However, such approaches rely on *instance-wise* supervision, incurring substantial annotation cost and offering limited interpretability. In this paper, we propose **ZEBRA**, a *model behavior-wise zero-annotation* framework that constructs preference data by leveraging model behavior knowledge derived from benchmark performance.
ZEBRA binarizes response pairs by evaluating the quality and similarity of their origin models, entirely bypassing instance-level annotation. This enables scalable, controllable, and cost-effective generation of alignment data. Empirical results show that ZEBRA achieves alignment performance comparable to that of instance-supervised methods, despite requiring no manual or model-based labeling.
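The behavior-wise pairing idea can be illustrated with a minimal sketch. The abstract does not specify the exact procedure, so the function name, score inputs, and similarity threshold below are all assumptions for illustration: each response is labeled using only its origin model's benchmark score, with no per-instance judgment.

```python
# Hypothetical sketch of behavior-wise preference binarization (assumed API).
# Given two responses from different models, the response whose origin model
# scores higher on benchmarks is labeled "chosen" -- no instance-level
# annotation is performed.

def zebra_pair(resp_a, score_a, resp_b, score_b, min_gap=0.05):
    """Binarize a response pair using origin-model benchmark scores.

    Returns (chosen, rejected), or None when the two models are too
    similar in quality to yield a reliable preference signal.
    The min_gap threshold is an illustrative assumption.
    """
    if abs(score_a - score_b) < min_gap:
        return None  # origin models too close in quality; skip this pair
    if score_a > score_b:
        return resp_a, resp_b
    return resp_b, resp_a

pair = zebra_pair("answer from stronger model", 0.82,
                  "answer from weaker model", 0.55)
# pair == ("answer from stronger model", "answer from weaker model")
```

Because the label depends only on which models produced the responses, the same decision applies to every pair drawn from that model pair, which is what makes the construction scalable and annotation-free.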
Paper Type: Long
Research Area: Generation
Research Area Keywords: data influence, hardness of samples
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 4904