Opal: An Operator-Algebra View of RLHF Objectives

20 Sept 2025 (modified: 11 Feb 2026), submitted to ICLR 2026, CC BY 4.0
Keywords: RLHF, canonicalization, objective equivalence, property testing, reproducibility
TL;DR: Opal provides decidable equivalence for RLHF objectives via canonicalization, certificates, and finite witnesses.
Abstract: We present Opal, an operator-algebra view of RLHF objectives as ladders acting on pairwise margins. For a broad reducible subclass, we give a rewrite system that we prove to be terminating and confluent, so every objective has a unique normal form, together with an $O(m)$ canonicalization algorithm. On the learning side, we establish calibration and regret transfer, and give an oracle reduction that collapses all reducible ladders to a single canonical learner. We also show gap-preserving separations for violations of reducibility (score-dependent weights, gating, pair-dependent references) with an $\Omega(1/\gamma^{2})$ testing lower bound. Finally, we provide a one-pass tester that outputs either a canonical hash and certificate or a finite witness, yielding a minimal GKPO semantics for decidable equivalence and proof-carrying objectives.
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 23615