Keywords: Multi-Grade Deep Learning, theory, practice
TL;DR: This paper provides theoretical analysis and experiments explaining why multi-grade deep learning (MGDL) outperforms single-grade deep learning (SGDL).
Abstract: Multi-grade deep learning (MGDL) has recently emerged as an alternative to standard end-to-end training, referred to here as single-grade deep learning (SGDL), showing strong empirical promise. This work provides both theoretical and experimental evidence of MGDL’s computational advantages. We establish convergence guarantees for gradient descent (GD) applied to MGDL, demonstrating greater robustness to learning-rate choices compared to SGDL. In the case of ReLU activations with single-layer grades, we further show that MGDL reduces to a sequence of convex optimization subproblems. For more general settings, we analyze the eigenvalue distributions of Jacobian matrices from GD iterations, revealing structural properties underlying MGDL’s enhanced stability. Practically, we benchmark MGDL against SGDL on image regression, denoising, and deblurring tasks, as well as on CIFAR-10 and CIFAR-100, covering fully connected networks, CNNs, and transformers. These results establish MGDL as a scalable framework that unites rigorous theoretical guarantees with broad empirical improvements.
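For readers unfamiliar with the training scheme the abstract compares against standard end-to-end (single-grade) training, the following is a minimal, hypothetical sketch of multi-grade training on a toy 1-D regression task. It assumes, as in the MGDL literature, that each grade is a shallow ReLU network trained by gradient descent on the residual left by the frozen earlier grades, with the frozen grade's hidden features feeding the next grade; the network sizes, data, and hyperparameters are illustrative and not the authors' configuration.

```python
# Hypothetical MGDL sketch (not the paper's exact setup): grades are trained
# sequentially; grade g fits the residual left by the frozen grades 0..g-1.
import torch
import torch.nn as nn

def train_grade(model, x, target, lr=1e-2, steps=500):
    # Plain gradient descent (SGD on the full batch) for one grade.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), target)
        loss.backward()
        opt.step()
    return model

torch.manual_seed(0)
x = torch.linspace(-1.0, 1.0, 256).unsqueeze(1)
y = torch.sin(torch.pi * x)              # toy regression target

grades, features, residual = [], x, y
for g in range(3):                        # three grades, each a shallow ReLU block
    grade = nn.Sequential(
        nn.Linear(features.shape[1], 32), nn.ReLU(), nn.Linear(32, 1)
    )
    train_grade(grade, features, residual)
    with torch.no_grad():
        pred = grade(features)
        residual = residual - pred        # the next grade fits what remains
        # Hidden features of the frozen grade feed the next grade
        # (one common way the grades are composed in MGDL).
        features = torch.relu(grade[0](features))
    grades.append(grade)

with torch.no_grad():
    approx = y - residual                 # sum of all grade predictions
    print("final MSE:", nn.functional.mse_loss(approx, y).item())
```

In this sketch each grade solves a small, easier optimization problem against a frozen backbone, which is the structural property the abstract's convergence and eigenvalue analyses exploit; in the special case of single-layer ReLU grades, the paper shows each such subproblem is convex.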
Supplementary Material: pdf
Primary Area: interpretability and explainable AI
Submission Number: 14653