Gradient Flow Provably Learns Robust Classifiers for Orthonormal GMMs

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We show gradient flow on two-layer networks converges to robust classifiers for data from a GMM with orthonormal cluster centers.
Abstract: Deep learning-based classifiers are known to be vulnerable to adversarial attacks. Existing defenses require adding a defense mechanism or modifying the learning procedure (e.g., by training on adversarial examples). This paper shows that for certain data distributions one can learn a provably robust classifier using standard learning methods, without any added defense mechanism. More specifically, this paper addresses binary classification when the data comes from an isotropic mixture of Gaussians with orthonormal cluster centers. First, we characterize the largest $\ell_2$-attack any classifier can defend against while maintaining high accuracy, and show the existence of optimal robust classifiers achieving this maximum $\ell_2$-robustness. Next, we show that given data from the orthonormal Gaussian mixture model, gradient flow on a two-layer network with a polynomial ReLU activation, trained without adversarial examples, provably finds an optimal robust classifier.
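Below is a minimal, illustrative sketch (not the authors' code) of the setting the abstract describes: data sampled from an isotropic Gaussian mixture with orthonormal cluster centers, a two-layer network with a polynomial ReLU activation, and plain small-step gradient descent as a discrete stand-in for gradient flow. The specific choices of cluster centers (standard basis vectors), label assignment, noise level, exponent q, step size, and the one-step $\ell_2$ robustness check are all assumptions made for illustration only.

```python
# Hedged sketch of the orthonormal-GMM + two-layer polynomial-ReLU setup.
# All hyperparameters below are illustrative assumptions, not values from the paper.
import torch

torch.manual_seed(0)
d, k, n, sigma = 20, 4, 2000, 0.1            # dimension, clusters, samples, noise std (assumed)
mus = torch.eye(d)[:k]                        # orthonormal cluster centers: standard basis vectors
signs = torch.tensor([1., -1., 1., -1.])      # assumed +/-1 label of each cluster

# Sample from the isotropic Gaussian mixture: pick a cluster, add isotropic noise.
idx = torch.randint(k, (n,))
X = mus[idx] + sigma * torch.randn(n, d)
y = signs[idx]

# Two-layer network with polynomial ReLU: f(x) = sum_j a_j * relu(w_j^T x)^q.
m, q = 50, 3
W = (0.1 * torch.randn(m, d)).requires_grad_()
a = (torch.randn(m) / m).requires_grad_()

def f(x):
    return torch.relu(x @ W.T).pow(q) @ a

# Logistic loss; small-step gradient descent as a discrete proxy for gradient flow.
lr = 0.05
for step in range(2000):
    loss = torch.nn.functional.softplus(-y * f(X)).mean()
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad
        a -= lr * a.grad
        W.grad.zero_()
        a.grad.zero_()

# Crude robustness check: accuracy under an l2 perturbation of radius eps,
# approximated by a single gradient step against the margin (a one-step heuristic,
# not the paper's robustness certificate).
eps = 0.3
Xadv = X.clone().requires_grad_()
(y * f(Xadv)).sum().backward()
with torch.no_grad():
    delta = -Xadv.grad
    delta = eps * delta / delta.norm(dim=1, keepdim=True).clamp_min(1e-12)
    robust_acc = ((y * f(X + delta)) > 0).float().mean()
print(f"heuristic robust accuracy at eps={eps}: {robust_acc:.3f}")
```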
Lay Summary: The standard neural network training paradigm often produces networks that are susceptible to malicious attacks, which try to manipulate the network's outputs by injecting human-imperceptible perturbations into its inputs. We use an idealized mathematical model to explain this vulnerability of neural networks and provide insight into how to address it.
Primary Area: Theory->Optimization
Keywords: Orthonormal Gaussian Mixture, Robust classifier, Two-layer Network, Gradient Flow
Submission Number: 7940