Correcting Optimizer Selection Bias via Large Deviation Hazards

Published: 25 May 2026, Last Modified: 25 May 2026, Venue: CTB@ICML 2026, Readers: Everyone, License: CC BY 4.0
Keywords: Generalization, Large deviation theory, PAC-Bayes-Chernoff, Overfitting, Optimiser Selection Bias
Abstract: Empirical risk minimisation systematically exploits finite-sample fluctuations of the training loss, producing an optimiser selection bias that is responsible for miscalibration and generalisation failure in the interpolation regime. We introduce SGDR, a drop-in modification to SGD that corrects this bias by gating mini-batches through a two-sided rejection rule derived from the hazard transform, with the population hazards estimated via rate functions from large deviation theory. Across nine architectures spanning image and graph classification, SGDR matches or improves on baseline task performance while sharply reducing expected calibration error and overfitting, using a fraction of the training time and gradient updates required by standard SGD.
Paper Type: Short (4 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 104