Average Top-k Aggregate Loss for Supervised Learning

Siwei Lyu, Yanbo Fan, Yiming Ying, Bao-Gang Hu

Published: 2022, Last Modified: 05 May 2023 | IEEE Trans. Pattern Anal. Mach. Intell. 2022 | Readers: Everyone

Abstract: In this work, we introduce the average top-$k$ ($\mathrm{AT}_k$) loss, which is the average of the $k$ largest individual losses over a training dataset, as a new aggregate loss for supervised learning. We show that the $\mathrm{AT}_k$ loss is a natural generalization of two widely used aggregate losses, namely the average loss and the maximum loss. Yet the $\mathrm{AT}_k$ loss can better adapt to different data distributions because of the extra flexibility provided by the choice of $k$. Furthermore, it remains a convex function of all individual losses and can be combined with different types of individual loss without a significant increase in computation. We then provide interpretations of the $\mathrm{AT}_k$ loss from the perspectives of modifying the individual loss and robustness to training data distributions. We further study the classification calibration of the $\mathrm{AT}_k$ loss and the error bounds of the $\mathrm{AT}_k$-SVM model. We demonstrate the applicability of minimum average top-$k$ learning to supervised learning problems, including binary/multi-class classification and regression, using experiments on both synthetic and real datasets.
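
As an illustrative sketch only (not the authors' implementation), the aggregate described in the abstract, the mean of the k largest individual losses, could be computed as below. The function name `average_top_k_loss` and the sample loss values are assumptions introduced for this example; the individual losses are taken as already computed by some per-sample loss.

```python
import numpy as np


def average_top_k_loss(individual_losses, k):
    """Return the mean of the k largest individual losses (1 <= k <= n).

    Hypothetical helper for illustration; the name and signature are not
    taken from the paper.
    """
    losses = np.asarray(individual_losses, dtype=float)
    n = losses.size
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # np.partition puts the k largest values into the last k slots
    # (in no particular order), avoiding a full sort.
    top_k = np.partition(losses, n - k)[n - k:]
    return float(top_k.mean())


# Made-up per-sample losses, for illustration only.
sample_losses = [0.1, 2.0, 0.5, 1.5]

max_loss = average_top_k_loss(sample_losses, 1)   # k = 1 gives the maximum loss
mean_loss = average_top_k_loss(sample_losses, 4)  # k = n gives the average loss
at2_loss = average_top_k_loss(sample_losses, 2)   # an intermediate choice of k
print(max_loss, mean_loss, at2_loss)
```

Setting k = 1 recovers the maximum loss and k = n recovers the average loss, which is the sense in which the abstract calls the average top-k loss a generalization of both.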
