Like Father, Like Son: Kinship-Aware Preference Mapping (KARMA) for Automatic Alignment in Large Language Models

Published: 01 Jan 2025 · Last Modified: 30 Jul 2025 · CoRR 2025 · CC BY-SA 4.0
Abstract: Recent efforts in LLM alignment have focused on constructing large-scale preference datasets via human or Artificial Intelligence (AI) annotators. However, such approaches rely on instance-wise supervision, which incurs substantial annotation cost and offers limited interpretability. In this paper, we propose KARMA, a model behavior-wise, zero-annotation framework that constructs preference data by leveraging model-behavior knowledge derived from benchmark performance. KARMA binarizes response pairs by evaluating the quality and similarity of their origin models, bypassing instance-level annotation entirely. This enables scalable, controllable, and cost-effective generation of alignment data. Empirical results show that KARMA achieves alignment performance comparable to that of instance-supervised methods, despite requiring no manual or model-based labeling.
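To make the behavior-wise binarization rule concrete, here is a minimal Python sketch of the idea the abstract describes: a response pair is labeled "chosen"/"rejected" by comparing aggregate benchmark scores of the two origin models, with no per-instance annotation. All names below (`Response`, `BENCHMARK_SCORE`, `make_preference_pair`) and the score values are hypothetical illustrations, not the paper's actual API; the abstract also conditions on the *similarity* of the origin models, which this sketch omits.

```python
from dataclasses import dataclass

@dataclass
class Response:
    model: str   # name of the origin model that produced the response
    text: str    # the model's response to the prompt

# Assumed lookup table of aggregate benchmark scores per model (e.g., an
# averaged leaderboard metric). This stands in for the "model behavior
# knowledge derived from benchmark performance" in the abstract.
BENCHMARK_SCORE = {"model-a": 0.82, "model-b": 0.64}

def make_preference_pair(prompt: str, r1: Response, r2: Response) -> dict:
    """Binarize a response pair by comparing origin-model quality only."""
    s1 = BENCHMARK_SCORE[r1.model]
    s2 = BENCHMARK_SCORE[r2.model]
    # The response from the stronger model (by benchmark score) is "chosen";
    # no human or AI annotator ever inspects the individual responses.
    chosen, rejected = (r1, r2) if s1 >= s2 else (r2, r1)
    return {"prompt": prompt, "chosen": chosen.text, "rejected": rejected.text}

# Example usage with two hypothetical responses to the same prompt:
pair = make_preference_pair(
    "Explain overfitting.",
    Response(model="model-a", text="Overfitting is ..."),
    Response(model="model-b", text="It means ..."),
)
print(pair["chosen"][:20], "|", pair["rejected"][:20])
```

Because labels depend only on a fixed per-model score table, the same rule can binarize arbitrarily many response pairs at constant annotation cost, which is the source of the scalability claim.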