The Geometry of Mixability

Published: 11 Sept 2023, Last Modified: 11 Sept 2023Accepted by TMLREveryoneRevisionsBibTeX
Abstract: Mixable loss functions are of fundamental importance in the context of prediction with expert advice in the online setting since they characterize fast learning rates. By re-interpreting properness from the point of view of differential geometry, we provide a simple geometric characterization of mixability for the binary and multi-class cases: a proper loss function $\ell$ is $\eta$-mixable if and only if the superprediction set $\textrm{spr}(\eta \ell)$ of the scaled loss function $\eta \ell$ slides freely inside the superprediction set $\textrm{spr}(\ell_{\log})$ of the log loss $\ell_{\log}$, under fairly general assumptions on the differentiability of $\ell$. Our approach provides a way to treat some concepts concerning loss functions (like properness) in a ''coordinate-free'' manner and reconciles previous results obtained for mixable loss functions for the binary and the multi-class cases.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Corrected typos. Incorporation of some additional content/comments suggested by the reviewers.
Assigned Action Editor: ~Daniel_M_Roy1
Submission Number: 893