Exploiting Benford's Law for Weight Regularization of Deep Neural Networks

Published: 21 Feb 2025, Last Modified: 21 Feb 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Stochastic learning of Deep Neural Network (DNN) parameters is highly sensitive to the training strategy, hyperparameters, and available training data. Many state-of-the-art solutions use weight regularization to adjust parameter distributions, prevent overfitting, and support generalization of DNNs. None of the existing regularization techniques exploits Benford's Law (BL), the distribution that the first non-zero (or significant) digit typically follows in numerical datasets. In this paper, we show that the deviation of the significant-digit distribution of the DNN weights from BL is closely related to the generalization of the DNN, in particular when the DNN is presented with limited training data. To take advantage of this finding, we use BL to target the weight regularization of DNNs. Extensive experiments are performed on image, tabular, and speech data, considering convolutional (CNN) and Transformer-based neural network architectures with varying numbers of parameters. We show that the performance of DNNs is improved by minimizing the distance between the significant-digit distribution of the DNN weights and the BL distribution, in combination with L2 regularization. The improvements depend on the network architecture and how it deals with limited data; however, the proposed penalty term yields consistent improvements, and some CNN-based architectures gain up to $15\%$ test accuracy over the default training scheme with L2 regularization on subsets of CIFAR-100.
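
The abstract measures how far the first-significant-digit distribution of the weights deviates from Benford's Law. The sketch below illustrates one way such a deviation could be computed in PyTorch; the function names, the total-variation-style distance, and the handling of near-zero weights are illustrative assumptions, since the abstract does not specify the exact formulation used as the penalty term.

```python
import torch

# Benford's Law: P(d) = log10(1 + 1/d) for first significant digits d = 1, ..., 9.
BENFORD = torch.log10(1.0 + 1.0 / torch.arange(1, 10, dtype=torch.float64))

def first_digit_histogram(weights: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Empirical distribution of the first significant digit of |weights|."""
    w = weights.detach().abs().flatten().double()
    w = w[w > eps]  # drop (near-)zero weights, which have no significant digit
    exponent = torch.floor(torch.log10(w))
    digits = torch.floor(w / 10.0 ** exponent).clamp(1, 9).long()
    counts = torch.bincount(digits, minlength=10)[1:].double()
    return counts / counts.sum()

def benford_deviation(model: torch.nn.Module) -> torch.Tensor:
    """Distance (here, an L1 / total-variation-style measure, chosen for
    illustration) between the weights' digit distribution and BL."""
    all_weights = torch.cat([p.flatten() for p in model.parameters()])
    return (first_digit_histogram(all_weights) - BENFORD).abs().sum()
```

In the scheme described by the abstract, such a deviation would be added to the training loss alongside the L2 term (e.g., weighted by a coefficient). Note that the hard digit extraction above is not differentiable, so the paper presumably uses a formulation compatible with gradient-based optimization; this sketch is intended only as a diagnostic of the deviation, not as the authors' penalty.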
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
  • revision of citation style
Assigned Action Editor: Simone Scardapane
Submission Number: 3778