Reward Calibration Beyond the Convex Hull: Depth-Based Feasibility and Regularized Exponential Tilting for Generative Models
Keywords: reward calibration, generative model calibration, distributional constraints, KL projection, information projection, maximum entropy, exponential tilting, convex hull feasibility, halfspace (Tukey) depth, Wendel phase transition, ridge regularization, finite-sample generalization bounds
TL;DR: Reward calibration can fail when targets lie outside the convex hull of sampled statistics; depth/phase-transition theory explains when, and ridge-regularized exponential tilting is always feasible with a residual certificate.
Abstract: We study constraint calibration of a base generative distribution $P_0$ via KL-projection onto expectation constraints.
Recent work proposes a reward-style surrogate that approximates the maximum-entropy (exponential-tilting) solution by replacing expectations under $P_0$ with Monte Carlo averages.
However, the resulting empirical maximum-entropy problem is only well-defined when the target moment vector lies in the interior of the convex hull of sampled statistics, an event that can fail with high probability in high-dimensional or rare-event regimes.
We quantify this phenomenon by reducing feasibility of the reward surrogate to a convex-hull membership probability, and we leverage sharp depth-based inequalities together with Wendel-type phase transitions.
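The Wendel-type transition can be checked numerically. The sketch below is our own illustration, not from the paper: in 2-D, with $n$ i.i.d. standard Gaussian statistics (symmetric about the origin), the origin lies inside their convex hull exactly when no open half-plane through the origin contains all points, and a Monte Carlo estimate of that probability should match Wendel's closed form $1 - 2^{-(n-1)}\sum_{k=0}^{d-1}\binom{n-1}{k}$. The helper names are hypothetical.

```python
import math
import random

def origin_in_hull_2d(pts):
    # In 2-D, the origin is inside the convex hull of points in general
    # position iff the directions of the points do not all fit in an open
    # half-plane, i.e. iff the largest angular gap between consecutive
    # point directions is < pi.
    angles = sorted(math.atan2(y, x) for x, y in pts)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    return max(gaps) < math.pi

def wendel_miss_probability(n, d):
    # Wendel (1962): for n i.i.d. points symmetric about the origin in
    # general position in R^d,
    #   P(origin not in conv hull) = 2^{-(n-1)} * sum_{k<d} C(n-1, k).
    return sum(math.comb(n - 1, k) for k in range(d)) / 2 ** (n - 1)

random.seed(0)
n, trials = 5, 20000
hits = sum(
    origin_in_hull_2d([(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)])
    for _ in range(trials)
)
mc_feasible = hits / trials                         # Monte Carlo estimate
exact_feasible = 1 - wendel_miss_probability(n, 2)  # 1 - 5/16 = 0.6875
```

For fixed $d$ the feasible probability rises sharply in $n$, which is the phase transition the abstract invokes.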
Motivated by these limits, we propose a ridge-regularized exponential-tilting estimator that is always defined and satisfies an exact residual identity controlling constraint mismatch.
We prove finite-sample bounds on parameter error, moment violation, and KL deviation, and validate the predicted feasibility transitions and bias--variance tradeoffs in synthetic experiments.
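A minimal sketch of the ridge-regularized empirical tilting estimator, under our own assumptions (standard Gaussian statistics, a deliberately extreme target, plain gradient descent; the names `tilted_weights`, `gamma`, and the step size are illustrative, not the paper's implementation). The point is that the ridge dual is strongly convex, hence always solvable, and first-order optimality yields the exact residual identity $\mathbb{E}_w[T] - \mu = -\gamma\lambda^\star$, so the fitted $\lambda^\star$ certifies the moment mismatch even when $\mu$ falls outside the empirical convex hull.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, gamma = 200, 2, 0.1
T = rng.standard_normal((n, d))   # sampled statistics T(x_i), x_i ~ P_0
mu = np.array([3.0, 0.0])         # target moments; may lie outside the hull

def tilted_weights(lam):
    # Self-normalized exponential tilting of the empirical distribution.
    s = T @ lam
    w = np.exp(s - s.max())       # shift for numerical stability
    return w / w.sum()

# Minimize the ridge-regularized dual
#   f(lam) = log((1/n) sum_i exp(lam . T_i)) - lam . mu + (gamma/2)||lam||^2,
# whose gradient is E_w[T] - mu + gamma * lam; strong convexity (modulus
# gamma) guarantees a unique minimizer regardless of hull membership.
lam = np.zeros(d)
for _ in range(10000):
    grad = tilted_weights(lam) @ T - mu + gamma * lam
    lam -= 0.05 * grad

residual = tilted_weights(lam) @ T - mu
# At the optimum, residual == -gamma * lam exactly: the residual certificate.
```

Shrinking `gamma` trades a smaller certified residual against larger, higher-variance tilting parameters, the bias-variance tradeoff the experiments probe.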
Submission Number: 7