Generalization and Robustness of the Tilted Empirical Risk

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 Poster · CC0 1.0
TL;DR: We provide a generalization error upper bound for the tilted empirical risk with convergence rate $O(n^{-\epsilon/(1+\epsilon)})$ for unbounded loss functions with a bounded $(1+\epsilon)$-th moment, for some $\epsilon\in(0,1]$.
Abstract: The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, Li et al. (2021) proposed the {\it tilted empirical risk} (TER) as a non-linear risk metric for machine learning applications such as classification and regression problems. In this work, we examine the generalization error of the tilted empirical risk in the robustness regime under \textit{negative tilt}. Our first contribution is to provide uniform and information-theoretic bounds on the {\it tilted generalization error}, defined as the difference between the population risk and the tilted empirical risk, under negative tilt for unbounded loss functions whose $(1+\epsilon)$-th moment is bounded for some $\epsilon\in(0,1]$. These bounds converge at rate $O(n^{-\epsilon/(1+\epsilon)})$, where $n$ is the number of training samples, revealing a novel application of TER in the absence of distribution shift. Second, we study the robustness of the tilted empirical risk with respect to noisy outliers at training time and provide theoretical guarantees for the tilted empirical risk under distribution shift. We empirically corroborate our findings in simple experimental setups, where we evaluate our bounds to select the value of the tilt in a data-driven manner.
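For context, a brief sketch of the quantity involved (following the definition in Li et al. (2021); the paper's own notation may differ): with loss $\ell$, tilt $t$, and training samples $z_1,\dots,z_n$, the tilted empirical risk is
$$\hat{R}_t(\theta) \;=\; \frac{1}{t}\,\log\!\left(\frac{1}{n}\sum_{i=1}^{n} e^{\,t\,\ell(\theta; z_i)}\right),$$
and the tilted generalization error studied above is the gap between the population risk $\mathbb{E}_{Z}[\ell(\theta; Z)]$ and $\hat{R}_t(\theta)$. Negative tilt ($t<0$) shrinks the contribution of large losses, which is the regime analyzed in this work.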
Lay Summary: When we train a machine-learning model, we want to know how well it will perform on new, unseen data—this is called its generalization error. Researchers usually estimate that error by averaging how wrong the model is on the training set. Recently, a technique called the tilted empirical risk (TER) was proposed: instead of giving every training example equal weight, TER lets you tilt the calculation so that unusually large errors are emphasized less (negative tilt). In this paper, we answer the following question: *what happens if we use negative tilt in the presence of "outliers"?* We show two main results.

1. **Reliable performance guarantees without data shift:** We prove mathematical limits on how far TER can differ from the unknown population error, even when individual errors (losses) can become very large. As the training set grows, that gap shrinks at a predictable pace, so TER remains a reliable estimate.
2. **Robustness when the data are noisy:** We analyze scenarios where the training data are contaminated with noisy outliers or come from a slightly different distribution than the data the model will face later (a "distribution shift"). The theory shows that, under negative tilt, TER still gives dependable guidance.

Finally, we run simple experiments that confirm the theory and demonstrate a practical way to pick the amount of tilt directly from the data. In short, the study explains why—and by how much—down-weighting extreme training errors (using negative tilt) makes risk estimates both stable and resilient to bad (noisy) data, giving practitioners a solid, data-driven method for tuning the tilt when their datasets are noisy.
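As an illustration only (not code from the paper), the following minimal sketch shows how a negative tilt down-weights an outlier loss relative to the plain average; the function name `tilted_empirical_risk` and the toy loss values are our own choices:

```python
import numpy as np

def tilted_empirical_risk(losses, t):
    """(1/t) * log(mean(exp(t * losses))), computed with a log-sum-exp shift for stability."""
    shifted = t * np.asarray(losses, dtype=float)
    m = shifted.max()
    return (m + np.log(np.mean(np.exp(shifted - m)))) / t

losses = [0.2, 0.3, 0.25, 5.0]             # last entry mimics a noisy outlier
print(np.mean(losses))                      # plain empirical risk: ~1.44, inflated by the outlier
print(tilted_empirical_risk(losses, -2.0))  # negative tilt: ~0.39, the outlier contributes little
```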
Primary Area: Theory->Learning Theory
Keywords: Tilted Empirical Risk, Generalization error, uniform bound, information-theoretic bounds, unbounded loss function, heavy-tailed loss function
Submission Number: 6557