Abstract: Recent machine learning breakthroughs in computer vision and natural language processing have been made possible by the learning capabilities of Deep Neural Networks (DNNs). Even so, applying DNNs remains challenging, as they usually have more hyperparameters than shallow models. This larger number of hyperparameters means more time must be allocated to model optimization and training to achieve optimal results. However, with a better understanding of each hyperparameter's impact on model performance, one can decide which hyperparameters to optimize according to the available optimization budget or the desired performance. This work analyzes the impact of different hyperparameters when applying dense DNNs to tabular datasets. This is achieved by optimizing each hyperparameter individually and comparing its influence on model performance. The results show that batch size usually affects only training time, reducing it by up to 80% or increasing it by as much as 200%. In contrast, the hidden layer size does not consistently affect the considered performance metrics. The choice of optimizer can significantly affect the model's overall performance as well as the training time, with Adam generally being the better optimizer. Overall, we show that the hyperparameters do not affect the DNN equally and that some can be discarded from the search when the optimization budget is constrained.
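The one-at-a-time comparison described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' experimental code: the baseline values, search grids, network depth, and synthetic data are assumptions introduced only to show the sweep structure (vary one hyperparameter while holding the rest at baseline, record accuracy and training time).

```python
# Illustrative sketch of a one-at-a-time hyperparameter sweep on a dense DNN.
# All concrete values (baseline, grids, epochs, data) are hypothetical.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow import keras

# Hypothetical baseline configuration and per-hyperparameter grids.
BASELINE = {"batch_size": 64, "hidden_size": 128, "optimizer": "adam"}
GRIDS = {
    "batch_size": [16, 64, 256, 1024],
    "hidden_size": [32, 128, 512],
    "optimizer": ["sgd", "rmsprop", "adam"],
}

# Synthetic tabular data stands in for the benchmark datasets.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def run(config):
    """Train a small dense DNN with the given config; return accuracy and training time."""
    model = keras.Sequential([
        keras.layers.Input(shape=(X.shape[1],)),
        keras.layers.Dense(config["hidden_size"], activation="relu"),
        keras.layers.Dense(config["hidden_size"], activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=config["optimizer"],
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    start = time.time()
    model.fit(X_tr, y_tr, epochs=10, batch_size=config["batch_size"], verbose=0)
    elapsed = time.time() - start
    _, acc = model.evaluate(X_te, y_te, verbose=0)
    return acc, elapsed

# Vary one hyperparameter at a time, holding the others at their baseline values.
for name, values in GRIDS.items():
    for value in values:
        config = dict(BASELINE, **{name: value})
        acc, elapsed = run(config)
        print(f"{name}={value}: accuracy={acc:.3f}, train_time={elapsed:.1f}s")
```

Comparing the recorded accuracy and training time across each grid is what lets one rank hyperparameters by influence and drop the least impactful ones under a constrained search budget.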