Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning

ICLR 2026 Conference Submission 20840 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Certified Robustness, Provable Robustness, Certificates, Neural Tangent Kernel, Partition Aggregation, Label-flipping, Label Poisoning, Kernel SVM, Kernel Regression, Integer Program, Multiple Choice Knapsack Problem
TL;DR: We develop the first white-box-informed robustness certificate for partition-based ensembles and the first polynomial-time exact certificate for sufficiently wide neural networks against label-flipping poisoning attacks.
Abstract: Label-flipping attacks, which corrupt training labels to induce misclassifications at inference, remain a major threat to supervised learning models. This drives the need for robustness certificates that provide formal guarantees about a model's robustness under adversarially corrupted labels. Existing certification frameworks rely on ensemble techniques such as smoothing or partition aggregation, but treat the corresponding base classifiers as black boxes, yielding overly conservative guarantees. We introduce EnsembleCert, the first certification framework for partition aggregation ensembles that utilizes white-box knowledge of the base classifiers. Concretely, EnsembleCert yields tighter guarantees than black-box approaches by aggregating per-partition white-box certificates to compute ensemble-level guarantees in polynomial time. To extract white-box knowledge from the base classifiers efficiently, we develop ScaLabelCert, a method that leverages the equivalence between sufficiently wide neural networks and kernel methods via the Neural Tangent Kernel. ScaLabelCert yields the first exact, polynomial-time computable certificate for neural networks against label-flipping attacks. EnsembleCert is either on par with or significantly outperforms the existing partition-based black-box certificate. For example, on CIFAR-10, our method certifies up to $\mathbf{+26.5\%}$ more label flips in the median over the test set compared to the existing black-box approach while requiring $\mathbf{100\times}$ fewer partitions, thus challenging the prevailing notion that heavy partitioning is a necessity for strong certified robustness.
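For context, the black-box partition-aggregation baseline mentioned in the abstract reduces to a majority-vote margin argument: each flipped training label lies in exactly one partition, so it can change at most one base classifier's vote. The sketch below illustrates that standard reasoning only; it is not the authors' white-box method, and the function name and NumPy-based setup are illustrative assumptions.

```python
import numpy as np

def blackbox_partition_certificate(votes: np.ndarray) -> int:
    """Hypothetical sketch of a black-box partition-aggregation certificate
    for a single test point under label-flipping poisoning.

    votes[c] is the number of partitions whose base classifier predicts
    class c. Each flipped label affects one partition and thus at most one
    vote, so the ensemble prediction is stable as long as the runner-up
    cannot overtake the leader (ties broken toward the smaller class index).
    """
    order = np.argsort(votes)[::-1]           # classes sorted by vote count
    top, runner_up = order[0], order[1]
    gap = votes[top] - votes[runner_up]
    tie_break = 1 if runner_up < top else 0   # runner-up wins ties if its index is smaller
    return max((gap - tie_break) // 2, 0)

# Example: 7 partitions voting over 3 classes.
votes = np.array([5, 1, 1])                   # class 0 wins with a margin of 4
print(blackbox_partition_certificate(votes))  # certified against 2 label flips
```

EnsembleCert replaces the "one flip costs one full vote" assumption in this baseline with per-partition white-box certificates, which is where the tighter guarantees come from.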
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 20840