Trading off Consistency and Dimensionality of Convex Surrogates for Multiclass Classification

Enrique Nueve; Dhamma Kimpara; Bo Waggoner; Jessica Finocchiaro

Published: 25 Sept 2024, Last Modified: 06 Jan 2025, Venue: NeurIPS 2024 poster, License: CC BY-NC 4.0

Keywords: Loss Functions, Consistency, Property Elicitation

TL;DR: This work explores theoretical guarantees for partial consistency when violating known prediction dimension bounds.

Abstract: In multiclass classification over $n$ outcomes, we typically optimize some surrogate loss $L: \mathbb{R}^d \times\mathcal{Y} \to \mathbb{R}$ assigning real-valued error to predictions in $\mathbb{R}^d$. In this paradigm, outcomes must be embedded into the reals with dimension $d \approx n$ in order to design a consistent surrogate loss. Consistent losses are well-motivated theoretically, yet for large $n$, such as in information retrieval and structured prediction tasks, their optimization may be computationally infeasible. In practice, outcomes are typically embedded into some $\mathbb{R}^d$ for $d \ll n$, with little known about their suitability for multiclass classification. We investigate two approaches for trading off consistency and dimensionality in multiclass classification while using a convex surrogate loss. We first formalize partial consistency when the optimized surrogate has dimension $d \ll n$. We then check if partial consistency holds under a given embedding and low-noise assumption, providing insight into when to use a particular embedding into $\mathbb{R}^d$. Finally, we present a new method to construct (fully) consistent losses with $d \ll n$ out of multiple problem instances. Our practical approach leverages parallelism to sidestep lower bounds on $d$.
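To make the low-dimensional embedding idea concrete, here is a minimal sketch rather than the authors' construction. It assumes a hypothetical hypercube embedding `phi` of $n$ outcomes into $\mathbb{R}^d$ with $d = \lceil \log_2 n \rceil$, a squared-error surrogate, and nearest-point decoding. The helper names (`hypercube_embedding`, `expected_minimizer`, `decode`) are illustrative and do not come from the paper.

```python
def hypercube_embedding(n):
    """Assumed embedding: outcome y -> vertex of {-1,+1}^d given by its binary digits."""
    d = max(1, (n - 1).bit_length())  # d = ceil(log2 n), at least 1
    return [[1.0 if (y >> k) & 1 else -1.0 for k in range(d)] for y in range(n)]


def squared_loss(u, y, phi):
    """Convex surrogate L(u, y) = ||u - phi(y)||^2 for a prediction u in R^d."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, phi[y]))


def expected_minimizer(p, phi):
    """The expected squared loss under p is minimized at the mean embedding E_p[phi(Y)]."""
    d = len(phi[0])
    return [sum(p[y] * phi[y][k] for y in range(len(p))) for k in range(d)]


def decode(u, phi):
    """Map a surrogate prediction back to the outcome with the nearest embedded point."""
    return min(range(len(phi)), key=lambda y: squared_loss(u, y, phi))


def mode(p):
    """The target of 0-1 classification: the most likely outcome."""
    return max(range(len(p)), key=lambda y: p[y])


# Low-noise distribution over n = 8 outcomes, embedded in d = 3 dimensions.
phi8 = hypercube_embedding(8)
p_low_noise = [0.1 / 7] * 8
p_low_noise[5] = 0.9
print(decode(expected_minimizer(p_low_noise, phi8), phi8), mode(p_low_noise))

# Noisier distribution over n = 4 outcomes, embedded in d = 2 dimensions.
phi4 = hypercube_embedding(4)
p_noisy = [0.0, 0.36, 0.34, 0.30]
print(decode(expected_minimizer(p_noisy, phi4), phi4), mode(p_noisy))
```

In this toy setting the low-noise distribution decodes to its mode (outcome 5), while the noisier one decodes to outcome 3 even though its mode is outcome 1. This is the kind of failure that a low-noise assumption is meant to rule out when $d \ll n$.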

Supplementary Material: zip

Primary Area: Learning theory

Submission Number: 5762
