Abstract: As an emerging machine learning task, high-dimensional hyperparameter optimization (HO) aims to enhance traditional deep learning models by simultaneously optimizing the neural network's weights and hyperparameters in a joint bilevel configuration. However, such nested objectives impose nontrivial difficulties in computing the gradient of the validation risk with respect to the hyperparameters (a.k.a. the hypergradient). To tackle this challenge, we revisit the bilevel objective from the novel perspective of continuous dynamics and then solve the whole HO problem with adjoint state theory. The proposed HO framework, termed Adjoint Diff, naturally scales to very deep neural networks with high-dimensional hyperparameters because it requires only constant memory cost during training. Adjoint Diff is, in fact, a general framework: several existing gradient-based HO algorithms can be interpreted within it through simple algebra. In addition, we offer the Adjoint Diff+ framework, which incorporates the prevalent momentum learning concept into the basic Adjoint Diff for enhanced convergence. Experimental results show that our Adjoint Diff frameworks outperform several state-of-the-art approaches on three high-dimensional HO instances: designing a loss function for imbalanced data, selecting samples with noisy labels, and learning auxiliary tasks for fine-grained classification.
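To make the continuous-dynamics view concrete, the sketch below gives a standard adjoint-state formulation of the hypergradient for a bilevel HO problem; the notation (weights w, hyperparameters \lambda, training and validation risks \mathcal{L}_{\mathrm{tr}}, \mathcal{L}_{\mathrm{val}}, adjoint state a(t)) is assumed here for illustration and is not quoted from the paper itself.

```latex
% Bilevel HO: outer validation risk over hyperparameters \lambda,
% with the inner weight training viewed as a continuous-time flow.
\begin{aligned}
&\min_{\lambda}\; \mathcal{L}_{\mathrm{val}}\bigl(w(T;\lambda),\lambda\bigr)
\quad \text{s.t.}\quad
\dot{w}(t) = -\nabla_{w}\mathcal{L}_{\mathrm{tr}}\bigl(w(t),\lambda\bigr),
\qquad w(0)=w_{0},\\[4pt]
% Adjoint state integrated backward in time; only the current a(t) and
% w(t) need be stored, which is where the constant memory cost arises.
&\dot{a}(t) = a(t)^{\top}\,\nabla^{2}_{ww}\mathcal{L}_{\mathrm{tr}}\bigl(w(t),\lambda\bigr),
\qquad a(T) = \nabla_{w}\mathcal{L}_{\mathrm{val}}\bigl(w(T;\lambda),\lambda\bigr),\\[4pt]
% Hypergradient assembled from the adjoint trajectory.
&\frac{d\mathcal{L}_{\mathrm{val}}}{d\lambda}
 = \nabla_{\lambda}\mathcal{L}_{\mathrm{val}}
 - \int_{0}^{T} a(t)^{\top}\,\nabla_{\lambda}\nabla_{w}\mathcal{L}_{\mathrm{tr}}\bigl(w(t),\lambda\bigr)\,dt.
\end{aligned}
```

In this generic formulation, the adjoint ODE is solved backward alongside the weight trajectory, so the memory footprint does not grow with network depth or trajectory length, matching the constant-memory property claimed for the Adjoint Diff framework.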