Calibration Is Grouping: VR-SAG with Intra-Group Variance Control and Logit-Cluster Evaluation

ICLR 2026 Conference Submission12969 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0

Keywords: recommender systems, calibration

Abstract: Accurate click-through and conversion-rate estimates are pivotal for bid optimization in large-scale advertising, yet modern deep CTR/CVR models are often miscalibrated. Classical global calibrators (Platt scaling, isotonic regression) and feature-based binning struggle to capture latent user–item heterogeneity. We approach calibration through the lens of learned semantic groupings and propose Variance-Reduced Semantic-Aware Grouping (VR-SAG), a lightweight post-hoc layer over a frozen backbone that (i) forms semantically coherent partitions in embedding space, (ii) fits per-group temperature-and-bias calibrators, and (iii) explicitly penalizes intra-group variance to tighten probability spreads. Our design is grounded in a group-wise decomposition of proper scoring rules (e.g., Brier), which isolates intra-group variance as a key driver of residual miscalibration and motivates variance control for genuine loss reduction. To decouple evaluation from training, we introduce Logit-Cluster Calibration Error (LCCE), an unsupervised fixed-partition metric obtained via $K$-means in logit space; LCCE aligns with the reliability term of proper scores while avoiding pitfalls of trainable grouping heads used as metrics. Across large-scale offline logs and AuctionSys, a realistic ad-auction simulator with oracle CTR, VR-SAG consistently improves calibration (ECE/LCCE and Brier variants) over strong baselines, with negligible latency and memory overhead. Together, VR-SAG and LCCE provide a principled, production-friendly toolkit for group-aware calibration in recommender systems.
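To make the idea of a fixed-partition metric built from $K$-means in logit space more concrete, here is a minimal Python sketch. The abstract does not give the exact LCCE formula, so everything below is an assumption for illustration only: the function names (`logit`, `kmeans_1d`, `lcce_sketch`), the quantile initialisation, and in particular the ECE-style aggregation (size-weighted absolute gap between mean predicted probability and observed positive rate per cluster). It should not be read as the authors' definition.

```python
import math

def logit(p, eps=1e-6):
    # Clip to avoid infinities at p = 0 or p = 1 (eps is an illustrative choice).
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def kmeans_1d(values, k, iters=50):
    # Plain Lloyd's algorithm on scalars with deterministic quantile
    # initialisation, so the partition does not depend on a random seed.
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[min(n - 1, (2 * j + 1) * n // (2 * k))] for j in range(k)]

    def nearest(v):
        return min(range(k), key=lambda j: abs(v - centers[j]))

    for _ in range(iters):
        assign = [nearest(v) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, a in zip(values, assign) if a == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment against the final centers.
    return [nearest(v) for v in values]

def lcce_sketch(probs, labels, k=2):
    # Hypothetical aggregation: size-weighted |mean prediction - positive rate|
    # over clusters found by K-means on the logits of the predictions.
    assign = kmeans_1d([logit(p) for p in probs], k)
    n = len(probs)
    total = 0.0
    for j in range(k):
        idx = [i for i, a in enumerate(assign) if a == j]
        if not idx:
            continue
        mean_p = sum(probs[i] for i in idx) / len(idx)
        mean_y = sum(labels[i] for i in idx) / len(idx)
        total += (len(idx) / n) * abs(mean_p - mean_y)
    return total

# Two well-separated groups, each off by 0.1 in opposite directions.
probs = [0.1] * 4 + [0.9] * 4
labels = [0, 0, 0, 0, 1, 1, 1, 1]
score = lcce_sketch(probs, labels, k=2)
```

Because the partition is computed from the frozen model's logits alone, without labels or trainable parameters, the same clusters can be reused to compare different calibrators, which is the property the abstract contrasts with trainable grouping heads used as metrics.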

Supplementary Material: zip

Primary Area: other topics in machine learning (i.e., none of the above)

Submission Number: 12969
