Abstract: The super-alignment problem of how humans can effectively supervise super-human AI has garnered increasing attention. Recent research has investigated the weak-to-strong generalization (W2SG) scenario as an analogy for super-alignment, in which a pre-trained strong model, supervised by an aligned weak model, can outperform its weak supervisor. Despite notable progress, current W2SG methods face two main issues: 1) annotation quality is limited by the knowledge scope of the weak model; 2) it is risky to position the strong model as the final corrector. To tackle these issues, we propose a ``Strong Empowered and Aligned Weak Mastered'' (SEAM) framework for weak annotations in W2SG. This framework leverages the vast intrinsic knowledge of the pre-trained strong model to empower annotation while positioning the aligned weak model as the annotation master. Specifically, the pre-trained strong model first generates principle fast-and-frugal trees for the samples to be annotated, encapsulating rich sample-related knowledge. The aligned weak model then selects informative nodes based on the tree's information distribution to produce the final annotations. Experiments on six datasets for preference tasks in W2SG scenarios validate the effectiveness of our proposed method.
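To make the described pipeline concrete, below is a minimal sketch of the annotation step, assuming a simple node structure and scoring scheme; the names (`FFTNode`, `seam_annotate`, `info_gain`, `weak_confidence`, `top_k`) and the confidence threshold are illustrative assumptions, not the paper's actual interface.

```python
# Illustrative sketch only: the strong model is assumed to have produced a
# fast-and-frugal tree (a sequence of binary cues, each with an exit label),
# and the aligned weak model answers only the cues it is confident about.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FFTNode:
    """One cue in a fast-and-frugal tree generated by the strong model."""
    question: str        # binary cue, e.g. "Does response A cite evidence?"
    answer_if_exit: int  # label assigned if this cue triggers an exit (0 or 1)
    info_gain: float     # assumed informativeness score attached by the strong model


def seam_annotate(tree: List[FFTNode],
                  weak_confidence: Callable[[str], float],
                  top_k: int = 3) -> Optional[int]:
    """Weak model 'masters' the annotation: it only answers the tree nodes
    it understands well, ranked by the tree's information distribution."""
    # 1. Rank nodes by informativeness (assumed to follow the tree's info distribution).
    candidates = sorted(tree, key=lambda n: n.info_gain, reverse=True)[:top_k]
    # 2. The weak model answers only the cues it is confident about.
    for node in candidates:
        if weak_confidence(node.question) >= 0.5:  # assumed confidence threshold
            return node.answer_if_exit             # exit with this node's label
    return None                                     # abstain if no confident cue
```

In this reading, the strong model contributes knowledge (the tree and its cues) without being the final corrector, while the aligned weak model retains the final say over which cues become the annotation.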