An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: A method to construct scalable orthogonal convolutions that support striding, grouping, transposition and dilation.
Abstract: Orthogonal convolutional layers are valuable components in multiple areas of machine learning, such as adversarial robustness, normalizing flows, GANs, and Lipschitz-constrained models. Their ability to preserve norms and ensure stable gradient propagation makes them useful for a large range of problems. Despite their promise, deploying orthogonal convolutions in large-scale applications remains a significant challenge due to their computational overhead and limited support for modern features like strides, dilations, group convolutions, and transposed convolutions. In this paper, we introduce **AOC** (Adaptive Orthogonal Convolution), a scalable method that extends a previous method (BCOP), effectively overcoming existing limitations in the construction of orthogonal convolutions. This advancement unlocks the construction of architectures that were previously considered impractical. We demonstrate through our experiments that our method produces expressive models that become increasingly efficient as they scale. To foster further advancement, we provide an open-source Python package implementing this method, called **Orthogonium**.
Lay Summary: Orthogonal layers have become essential in machine learning because they stabilize training, enhance robustness against adversarial attacks, and improve performance in generative models like normalizing flows and GANs. Specifically, orthogonal convolutional layers help models maintain consistent gradient norms, ensuring more reliable learning and greater stability. Despite these advantages, traditional orthogonal convolution methods are computationally demanding and lack flexibility for modern features, limiting their use in large-scale, practical scenarios. We created Adaptive Orthogonal Convolution (AOC), an approach that solves these limitations, making orthogonal convolutions practical for advanced use cases. AOC supports modern features like strides, group convolutions, and dilations, which are critical for some tasks. Our method allows training larger and more powerful models without the usual cost of orthogonal constraints. We share our solution through an easy-to-use, open-source library, allowing researchers and developers to easily build expressive, stable models and significantly expanding the potential applications of orthogonal convolutional layers in fields like generative models, robust prediction, and secure AI systems.
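The norm-preservation property that the abstract and lay summary describe can be illustrated with a minimal sketch. The example below is not the Orthogonium API; it builds an orthogonal 1×1 convolution by hand (a per-pixel channel mixing by an orthogonal matrix obtained from a QR factorization) and checks that it leaves the input norm unchanged, which is the property that stabilizes gradient propagation:

```python
import numpy as np

# Illustration only (not the Orthogonium API): a 1x1 convolution is a
# per-pixel mixing of channels by a matrix. If that matrix is orthogonal,
# the layer preserves the Euclidean norm of its input.
rng = np.random.default_rng(0)
channels, h, w = 8, 5, 5

# Orthogonal mixing matrix from the QR factorization of a random Gaussian matrix.
q, _ = np.linalg.qr(rng.standard_normal((channels, channels)))

x = rng.standard_normal((channels, h, w))

# Apply the 1x1 convolution: mix channels independently at every pixel.
y = np.einsum("oc,chw->ohw", q, x)

# Orthogonality implies norm preservation, the property the paper relies on.
print(np.allclose(np.linalg.norm(y), np.linalg.norm(x)))  # True
```

Constructing orthogonal convolutions with larger kernels, strides, or groups is precisely the harder problem that AOC addresses; this toy case only shows what "orthogonal" buys at a single layer.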
Link To Code: https://github.com/deel-ai/orthogonium
Primary Area: Deep Learning->Robustness
Keywords: Machine Learning, Orthogonal convolutions, certifiable adversarial robustness, Lipschitz neural networks
Submission Number: 2502