TL;DR: We introduce *EnsLoss*, a novel ensemble method that combines loss functions within the empirical risk minimization framework while preserving the calibration, and hence the consistency, of the combined losses.
Abstract: Empirical risk minimization (ERM) with a computationally feasible surrogate loss is a widely accepted approach for classification. Notably, the convexity and calibration (CC) properties of a loss function ensure consistency of ERM in maximizing accuracy, thereby offering a wide range of options for surrogate losses. In this article, we propose a novel ensemble method, namely *EnsLoss*, which extends the ensemble learning concept to combine loss functions within the ERM framework. A key feature of our method is that it preserves the "legitimacy" of the combined losses, i.e., ensures the CC properties. Specifically, we first transform the CC conditions on losses into conditions on their derivatives, thereby bypassing the need for explicit loss functions and directly generating calibrated loss-derivatives. Accordingly, inspired by Dropout, *EnsLoss* enables loss ensembles through a single training process with doubly stochastic gradient descent (i.e., random batch samples and random calibrated loss-derivatives). We theoretically establish the statistical consistency of our approach and provide insights into its benefits. The numerical effectiveness of *EnsLoss* compared to fixed-loss methods is demonstrated through experiments on 45 pairs of CIFAR10 datasets, the PCam image dataset, and 14 OpenML tabular datasets, with various deep learning architectures. The Python repository and source code are available on our GitHub (https://github.com/statmlben/ensLoss).
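To make the doubly stochastic idea concrete, below is a minimal PyTorch sketch of one training step: instead of differentiating a fixed surrogate loss, each mini-batch samples a calibrated loss-derivative and applies it to the margins. The function names `random_calibrated_derivative` and `ensloss_step`, and the particular derivative family (logistic, exponential, squared hinge), are illustrative assumptions for this sketch, not the authors' exact generation scheme; see the repository for the actual implementation.

```python
# Minimal sketch (not the authors' implementation): doubly stochastic SGD that,
# at each mini-batch, samples a random *calibrated* loss-derivative instead of
# differentiating one fixed surrogate loss. Labels are assumed to be +/-1 and
# `model` returns a real-valued score f(x).
import torch

def random_calibrated_derivative(z):
    """Return phi'(z) for a randomly drawn calibrated convex surrogate.

    Hypothetical example family: each call picks the derivative of the
    logistic, exponential, or squared-hinge loss, all of which are convex
    and classification-calibrated (phi'(0) < 0).
    """
    choice = torch.randint(0, 3, (1,)).item()
    if choice == 0:    # logistic: phi(z) = log(1 + exp(-z))
        return -torch.sigmoid(-z)
    elif choice == 1:  # exponential: phi(z) = exp(-z)
        return -torch.exp(-z)
    else:              # squared hinge: phi(z) = max(0, 1 - z)^2
        return -2.0 * torch.clamp(1.0 - z, min=0.0)

def ensloss_step(model, optimizer, x, y):
    """One step: random batch samples + random calibrated loss-derivative."""
    z = y * model(x).squeeze(-1)                  # margins y * f(x)
    d = random_calibrated_derivative(z).detach()  # phi'(z), no grad through phi'
    loss = (d * z).mean()   # surrogate whose gradient w.r.t. z equals phi'(z)/n
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because `d` is detached, back-propagating `(d * z).mean()` injects exactly the sampled per-sample derivatives into the gradient, which is how a loss ensemble can be realized within one training run without ever writing down the ensembled loss explicitly.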
Lay Summary: When teaching computers to classify data—like identifying animals in photos—researchers use mathematical tools called "loss functions" to measure and guide how well the computer is learning. These functions act like scorecards, penalizing the model when it makes mistakes. Traditionally, researchers select just one loss function and use it throughout the training process.
Our research introduces a new approach called *EnsLoss* that combines multiple loss functions during training, similar to how consulting several experts often leads to better decisions than relying on just one. The challenge was ensuring these combinations maintain the mathematical properties needed for reliable learning. We solved this by transforming the problem to work with the derivatives of loss functions. This allows us to randomly switch between different valid loss functions during training.
Our mathematical analysis proves this approach is theoretically sound, and our experiments on 45 image classification tasks, medical images, and 14 tabular datasets show that *EnsLoss* consistently outperforms traditional single-loss methods across various neural network architectures.
We've made our code freely available on GitHub for other researchers to use and build upon.
Link To Code: https://github.com/statmlben/ensLoss
Primary Area: General Machine Learning->Supervised Learning
Keywords: classification-calibration, ensemble learning, statistical consistency, surrogate loss, stochastic regularization
Submission Number: 1217