Adversarial Robustness in One-Stage Learning-to-Defer

Published: 03 Feb 2026, Last Modified: 03 Feb 2026 | AISTATS 2026 Poster | CC BY 4.0
TL;DR: We introduce attacks and corresponding defense mechanisms for both classification and regression tasks in Learning-to-Defer.
Abstract: Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also manipulate deferral decisions. Prior robustness analyses focus solely on two-stage settings, leaving open the end-to-end (one-stage) case where the predictor and the allocation function are trained jointly. We introduce the first framework for adversarial robustness in one-stage L2D, covering both classification and regression. Our approach formalizes attacks, proposes cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees, including $\mathcal{H}$-consistency, $(\mathcal{R}, \mathcal{F})$-consistency, and Bayes consistency. Experiments on benchmark datasets confirm that our methods improve robustness against untargeted and targeted attacks while preserving clean performance.
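To make the threat model concrete, here is a minimal sketch (not the paper's implementation) of an untargeted PGD attack on a one-stage L2D model. It assumes a hypothetical `model` that outputs K class logits plus one deferral logit, so a single perturbation budget can flip either the prediction or the deferral decision; `eps`, `alpha`, and `steps` are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Untargeted L-infinity PGD against a joint predictor/deferral head.

    Assumes `model(x)` returns logits of shape (B, K+1): K class logits
    followed by one deferral logit, with `y` indexing the K+1 joint targets.
    """
    # Random start inside the eps-ball, clipped to the valid input range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)              # (B, K+1): classes + deferral
        loss = F.cross_entropy(logits, y)  # joint loss over both decisions
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```

A targeted variant of this sketch would instead descend on the loss toward a chosen target index, e.g. forcing deferral by setting the target to the deferral class K.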
Submission Number: 1207