A Reality Check on Robust Bandit Algorithms for Buffer-Aware Early Exits

ICLR 2026 Conference Submission 21534 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: early-exit deep neural networks, multi-armed bandits, edge computing
TL;DR: We treat early exits in neural networks as a queue-aware scheduling problem and solve it with robust bandit algorithms.
Abstract: Early-exit neural networks (EENNs) reduce inference cost by allowing inputs to terminate at intermediate layers when classification confidence exceeds a threshold. However, practical deployments must operate under stochastic arrivals, limited device resources, and finite buffers, where queue backlog directly affects performance. This paper provides a systems-oriented study of buffer-aware EENNs and introduces new learning algorithms for threshold selection. First, we report results from real testbed experiments on heterogeneous devices, showing that incorporating buffer state into early-exit decisions substantially improves throughput and accuracy under load. Second, we extend policy gradient methods by integrating the Tsallis-softmax parameterization, which yields tunable exploration and robustness to high-variance rewards, and connects recent advances in the $q$-exponential family for policy optimization to practical scheduling in EENNs. Third, we propose contextual bandit algorithms that exploit the natural monotonic relationship between backlog and urgency via parameterized thresholds, reducing sample complexity and enabling generalization across system loads. Together, these contributions highlight that early exits are not only a model-design mechanism but also a systems scheduling problem, bridging theory and practice for robust and efficient inference in resource-constrained environments.
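To make the Tsallis-softmax idea concrete, the sketch below shows a minimal $q$-exponential policy over a discrete set of exit thresholds, with a monotone backlog-to-urgency feature as the abstract describes. All specifics here (the threshold grid, the `urgency` scale, `max_backlog`, and the value of `q`) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a Tsallis-softmax ("q-exponential") policy over
# candidate early-exit confidence thresholds, biased by buffer backlog.
import numpy as np

def q_exp(x, q):
    """q-exponential: exp_q(x) = [1 + (1 - q) x]_+^{1/(1 - q)}; recovers exp(x) as q -> 1."""
    if abs(q - 1.0) < 1e-8:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - q) * x, 0.0) ** (1.0 / (1.0 - q))

def tsallis_softmax(scores, q=1.5):
    """Normalize q-exponentials of the scores into a probability distribution.

    q > 1 gives heavier-tailed exploration than the ordinary softmax;
    q < 1 gives sparser policies (some arms can receive exactly zero mass).
    """
    weights = q_exp(scores - scores.max(), q)  # shift by max for numerical stability
    return weights / weights.sum()

# Candidate exit thresholds (the bandit arms) and learned per-arm preferences.
thresholds = np.array([0.5, 0.6, 0.7, 0.8, 0.9])
theta = np.zeros(len(thresholds))

def policy(backlog, max_backlog=32, q=1.5, urgency=2.0):
    # Monotone context: a fuller buffer lowers the scores of high thresholds,
    # biasing the policy toward exiting earlier and shedding load.
    pressure = backlog / max_backlog
    scores = theta - urgency * pressure * thresholds
    return tsallis_softmax(scores, q)

rng = np.random.default_rng(0)
probs = policy(backlog=24)
arm = rng.choice(len(thresholds), p=probs)
print(f"backlog=24 -> P(arms)={np.round(probs, 3)}, chose threshold {thresholds[arm]}")
```

In this sketch the backlog enters only through a single monotone pressure term, which is one simple way to realize the parameterized-threshold idea; a learned policy would update `theta` from observed accuracy/latency rewards via the Tsallis-softmax policy gradient.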
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 21534