TL;DR: We provide optimization guarantees for two popular neural operators—Deep Operator Networks (DONs) and Fourier Neural Operators (FNOs)—under a common framework.
Abstract: Neural operators that directly learn mappings between function spaces, such as Deep Operator Networks (DONs) and Fourier Neural Operators (FNOs), have received considerable attention. Despite universal approximation guarantees for DONs and FNOs, there is currently no convergence guarantee for training such networks with gradient descent (GD). In this paper, we address this open problem by presenting a unified framework for GD-based optimization and applying it to establish convergence guarantees for both DONs and FNOs. In particular, we show that the losses associated with both neural operators satisfy two conditions, restricted strong convexity (RSC) and smoothness, which together guarantee a decrease in the loss value at each GD step. Remarkably, the two conditions hold for each neural operator for different reasons tied to the architectural differences between the two models. One takeaway from the theory is that wider networks yield stronger optimization convergence guarantees for both DONs and FNOs. We present empirical results on canonical operator learning problems that support our theoretical findings and confirm that larger widths benefit training.
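As a rough illustration of how these two conditions yield a descent guarantee (the notation below is a generic sketch with assumed symbols $\mathcal{L}$, $\alpha$, $\beta$, $\eta$, $\theta_t$, $\mathcal{L}^*$, and is not necessarily the exact formulation used in the paper): if the loss $\mathcal{L}$ is $\beta$-smooth along the GD iterates $\theta_{t+1} = \theta_t - \eta \nabla \mathcal{L}(\theta_t)$, then
\[
\mathcal{L}(\theta_{t+1}) \le \mathcal{L}(\theta_t) - \eta\Big(1 - \tfrac{\eta\beta}{2}\Big)\,\|\nabla \mathcal{L}(\theta_t)\|^2,
\]
and if restricted strong convexity with parameter $\alpha$ holds over the relevant set, it supplies a gradient lower bound of the form $\|\nabla \mathcal{L}(\theta_t)\|^2 \ge 2\alpha\,\big(\mathcal{L}(\theta_t) - \mathcal{L}^*\big)$, so that with step size $\eta \le 1/\beta$,
\[
\mathcal{L}(\theta_{t+1}) - \mathcal{L}^* \le (1 - \eta\alpha)\,\big(\mathcal{L}(\theta_t) - \mathcal{L}^*\big),
\]
i.e., the loss decreases geometrically for as long as both conditions hold along the trajectory.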
Lay Summary: Neural operators are widely used for numerical applications in scientific computing; two of the most popular are Deep Operator Networks (DONs) and Fourier Neural Operators (FNOs). Although these models are routinely trained in practice, there has been no formal understanding of why their training works. To fill this gap, we first propose a general mathematical framework that gives conditions under which the training of any model can be explained. We then instantiate these conditions for DONs and FNOs, a non-trivial task in itself, to guarantee the training success of these technologies. Importantly, we find that the wider the neural operator, the more favorable our training conditions become. Experimentally, we find that larger widths lead to faster convergence and lower training error. These results matter not only for explaining neural operator training, but also for potentially understanding the training of other deep learning technologies.
Primary Area: Deep Learning->Theory
Keywords: neural operators, optimization, training
Submission Number: 7866