CTBench: A Library and Benchmark for Certified Training

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We develop a library unifying certified training algorithms, establish new state-of-the-art results across the board by correcting existing implementation mistakes, gain new insights into many questions of interest to the area, and point out future research directions.
Abstract: Training certifiably robust neural networks is an important but challenging task. While many algorithms for (deterministic) certified training have been proposed, they are often evaluated with different training schedules and certification methods, and with systematically under-tuned hyperparameters, making it difficult to compare their performance. To address this challenge, we introduce CTBench, a unified library and a high-quality benchmark for certified training that evaluates all algorithms under fair settings and systematically tuned hyperparameters. We show that (1) almost all algorithms in CTBench surpass their performance reported in the literature by margins on the order of the claimed algorithmic improvements, thus establishing a new state-of-the-art, and (2) the claimed advantage of recent algorithms drops significantly when we enhance the outdated baselines with a fair training schedule, a fair certification method, and well-tuned hyperparameters. Based on CTBench, we provide new insights into the current state of certified training, including that (1) certified models have a less fragmented loss surface, (2) certified models share many mistakes, (3) certified models have sparser activations, (4) carefully reducing regularization is crucial for certified training, especially at large radii, and (5) certified training has the potential to improve out-of-distribution generalization. We are confident that CTBench will serve as a benchmark and testbed for future research in certified training.
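For readers unfamiliar with the area, the sketch below illustrates the core idea behind many of the certified training algorithms the benchmark covers: interval bound propagation (IBP), which trains on worst-case logits derived from sound interval bounds. This is a minimal conceptual example, not CTBench's API; the function names, the toy model, and the epsilon value are all hypothetical choices for illustration.

```python
# Minimal IBP certified-training sketch (illustrative only, not CTBench code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def ibp_bounds(model, x, eps):
    """Propagate the input box [x - eps, x + eps] through Flatten/Linear/ReLU layers."""
    lb, ub = x - eps, x + eps
    for layer in model:
        if isinstance(layer, nn.Linear):
            mid, rad = (lb + ub) / 2, (ub - lb) / 2
            mid = layer(mid)                      # center passes through the affine map
            rad = rad @ layer.weight.abs().t()    # radius is scaled by |W| (no bias term)
            lb, ub = mid - rad, mid + rad
        elif isinstance(layer, nn.ReLU):
            lb, ub = F.relu(lb), F.relu(ub)       # ReLU is monotone, so bounds map directly
        elif isinstance(layer, nn.Flatten):
            lb, ub = layer(lb), layer(ub)
    return lb, ub

def robust_loss(model, x, y, eps):
    """Cross-entropy on worst-case logits: the lower bound for the true class,
    upper bounds for all other classes."""
    lb, ub = ibp_bounds(model, x, eps)
    true_class = F.one_hot(y, lb.shape[-1]).bool()
    worst_logits = torch.where(true_class, lb, ub)
    return F.cross_entropy(worst_logits, y)

# Toy usage on random MNIST-shaped data; real training would also clamp inputs to [0, 1]
# and anneal eps from 0 to the target radius over a warm-up schedule.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
x, y = torch.rand(8, 1, 28, 28), torch.randint(0, 10, (8,))
loss = robust_loss(model, x, y, eps=0.1)
loss.backward()
```

Minimizing this loss drives the worst-case logits toward correct classification over the entire perturbation box, which is what makes the trained network certifiable; the algorithms unified in CTBench differ mainly in how they tighten or relax this bound during training.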
Lay Summary: Neural networks are powerful tools used in everything from medical diagnostics to self-driving cars, but they can be easily fooled by small changes to their inputs—like altering just a few pixels in an image—raising concerns about their reliability. To make these systems more trustworthy, researchers have developed techniques that train models to be provably robust against such changes. However, because different methods were developed independently, past comparisons have often been inconsistent—especially when newer approaches were evaluated more favorably than older ones due to differences in testing conditions or tuning. In this paper, we introduce CTBench, the first unified framework that enables fair and thorough comparisons of leading robust training methods. By evaluating all methods under the same conditions and carefully tuning each one, we find that many older techniques perform much better than previously reported, narrowing the apparent advantage of newer state-of-the-art methods. Beyond comparison, CTBench also helps us better understand how robust models behave—for example, they tend to make similar mistakes and use their internal components more selectively. Finally, CTBench provides a simple, modular codebase that makes it easier to design and evaluate future robust training methods. We hope CTBench will serve as a strong foundation for future research, helping to build AI systems that are not only powerful, but also reliably safe and robust.
Link To Code: https://github.com/eth-sri/CTBench
Primary Area: Deep Learning->Robustness
Keywords: certified training, certification, library, benchmark
Submission Number: 7082