Enforcing Idempotency in Neural Networks

Published: 01 May 2025 · Last Modified: 23 Jul 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose and evaluate an alternative to gradient-based optimisation for training idempotent neural networks.
Abstract: In this work, we propose a new architecture-agnostic method for training idempotent neural networks. An idempotent operator satisfies $f(x) = f(f(x))$, meaning it can be applied iteratively with no effect beyond the first application. Some neural networks used in data transformation tasks, such as image generation and augmentation, can represent non-linear idempotent projections. Using methods from perturbation theory, we derive the recurrence relation ${\mathbf{K}' \leftarrow 3\mathbf{K}^2 - 2\mathbf{K}^3}$ for iteratively projecting a real-valued matrix $\mathbf{K}$ onto the manifold of idempotent matrices. Our analysis shows that, for linear single-layer MLP networks, this projection 1) has idempotent fixed points, and 2) is attracting only in the neighbourhood of idempotent points. We extend the method to non-linear networks by treating our update as a substitute for the gradient of the canonical idempotency loss, yielding an architecture-agnostic training scheme. We provide experimental results for MLP- and CNN-based architectures showing significant improvements in idempotent error over the canonical gradient-based approach. Finally, we demonstrate a practical application of the method by successfully training a generative network using only a simple reconstruction loss paired with our update.
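To make the recurrence concrete, the sketch below (not the authors' code; the function names and the perturbed-projector setup are illustrative assumptions) applies the update $\mathbf{K}' \leftarrow 3\mathbf{K}^2 - 2\mathbf{K}^3$ to a matrix initialised near an idempotent projector and tracks the idempotency error $\|\mathbf{K}^2 - \mathbf{K}\|_F$, which should shrink rapidly since the iteration is attracting around idempotent points.

```python
# Minimal sketch (assumed setup, not the authors' implementation) of the
# recurrence K' <- 3K^2 - 2K^3 for a linear single-layer map f(x) = Kx.
import numpy as np

def idempotent_projection_step(K: np.ndarray) -> np.ndarray:
    """One step of the recurrence K' = 3K^2 - 2K^3."""
    K2 = K @ K
    return 3.0 * K2 - 2.0 * K2 @ K

def idempotency_error(K: np.ndarray) -> float:
    """Frobenius norm of K^2 - K; zero exactly when K is idempotent."""
    return float(np.linalg.norm(K @ K - K))

rng = np.random.default_rng(0)

# Start near an idempotent matrix (a rank-2 projector) and perturb it slightly,
# since the iteration is only attracting around idempotent points.
A = rng.standard_normal((5, 2))
P = A @ np.linalg.pinv(A)                      # projector onto col(A), idempotent
K = P + 0.05 * rng.standard_normal((5, 5))     # small perturbation off the manifold

for it in range(6):
    print(f"iter {it}: ||K^2 - K||_F = {idempotency_error(K):.3e}")
    K = idempotent_projection_step(K)
```

For the non-linear, architecture-agnostic setting, the paper describes substituting an analogous update for the gradient of the canonical idempotency loss; a hypothetical realisation would write such an update into each parameter's gradient buffer before an optimiser step, but the precise construction is the one given in the paper, not this sketch.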
Lay Summary: Applying an idempotent operation multiple times has the same effect as applying it once. Idempotency is a feature of many data transformation tasks commonly tackled with machine learning, and it has recently been shown to promote generative behaviour in neural networks. Gradient-descent-based approaches to optimising for idempotency, however, are often inefficient. We propose an alternative way to optimise for idempotency, using ideas from perturbation theory to derive a training scheme that is significantly more effective and incurs no computational overhead. Our work suggests that alternatives to gradient-based optimisation of neural networks are practically viable, opening the door to new approaches in neural network training more generally.
Primary Area: Optimization
Keywords: idempotent neural networks, non-convex optimisation, gradient-free optimisation
Submission Number: 13202