Abstract: This paper addresses non-Gaussian regression with neural networks via the use of the Tukey $g$-and-$h$ distribution. The Tukey $g$-and-$h$ transform is a flexible parametric transform with two parameters $g$ and $h$ which, when applied to a standard normal random
variable, introduces both skewness and kurtosis, resulting in a distribution commonly called the Tukey $g$-and-$h$ distribution. Specific values of $g$ and $h$ produce good approximations to other families of distributions, such as the Cauchy and Student's $t$ distributions. The flexibility of the Tukey $g$-and-$h$ distribution has driven its popularity in the statistical community, particularly in the geosciences and finance. In this work, we consider training a neural network to predict the parameters of a Tukey $g$-and-$h$ distribution in a regression framework via minimization of the corresponding negative log-likelihood, despite the latter having no closed-form expression. We demonstrate the efficiency of our procedure in simulated settings and present an application to a real-world dataset of global crop yield for several types of crops. Finally, we assess our probabilistic predictions via the logarithmic score and the Probability Integral Transform. A PyTorch implementation is made available on GitHub and as a PyPI package.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We thank the reviewers for their comments. Overall, the reviewers agreed that both the simulation and real-data experiments lacked quantitative metrics to evaluate the method, and that the neural network architecture had been selected rather arbitrarily. We have addressed these points with the following key changes since the last submission.
1. Hyperparameter selection using Optuna framework
For both the simulation study and the real-data application, we now select the neural network architecture and other hyperparameters using the Optuna framework (reference provided in the manuscript), as sketched below.
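For illustration only, such a search might look like the following minimal sketch; `build_model` and `validation_nll` are hypothetical placeholders, and the search ranges are illustrative rather than those used in the manuscript.

```python
import optuna

def objective(trial):
    # Illustrative search space; the ranges used in the paper may differ.
    n_layers = trial.suggest_int("n_layers", 1, 4)
    n_hidden = trial.suggest_int("n_hidden", 16, 256, log=True)
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    model = build_model(n_layers, n_hidden)  # hypothetical model builder
    return validation_nll(model, lr)         # hypothetical: trains, returns validation NLL

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```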
2. Multiple independent datasets
For the simulation study, we now use 10 independent datasets instead of a single dataset, and report metrics aggregated across those datasets.
3. Goodness-of-fit section revisited - additional metrics
We have revised this section (renamed "Quantitative and qualitative metrics for probabilistic prediction"), both because the reviewers noted a lack of clarity and because quantitative metrics needed to be introduced and reported in the experiments.
Following the literature on probabilistic prediction, we justify the use of the logarithmic score as the reported quantitative metric. We also now use Probability Integral Transform (PIT) residuals, again following the existing literature, to qualitatively assess the predictive distributions; both computations are sketched below.
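For concreteness, the sketch below computes the logarithmic score (negative log-likelihood) and the PIT value for one observation $y$ under a Tukey $g$-and-$h$ predictive distribution with location $a$, scale $b$, and shape parameters $g \neq 0$, $h \geq 0$. By the change of variables $z = \tau^{-1}((y-a)/b)$, the density is $f(y) = \varphi(z) / (b\,\tau'(z))$ and the PIT value is $\Phi(z)$. The helper `tgh_inverse` (numerical inversion of the transform, sketched under item 4 below) and all names are illustrative; the actual code in our package may differ.

```python
import numpy as np
from scipy.stats import norm

def log_score_and_pit(y, a, b, g, h):
    """Logarithmic score -log f(y) and PIT value F(y) for one observation
    under a Tukey g-and-h predictive distribution (g != 0, h >= 0)."""
    z = tgh_inverse((y - a) / b, g, h)  # illustrative helper, see item 4
    tau = np.expm1(g * z) / g * np.exp(h * z**2 / 2)   # tau(z)
    dtau = np.exp(g * z + h * z**2 / 2) + h * z * tau  # tau'(z)
    nll = -norm.logpdf(z) + np.log(b * dtau)  # -log f(y), with f(y) = phi(z) / (b tau'(z))
    pit = norm.cdf(z)                         # F(y) = Phi(tau^{-1}((y - a) / b))
    return nll, pit
```

Well-calibrated predictive distributions yield approximately uniform PIT values, which we inspect via histograms of the PIT residuals.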
4. Study of computational cost
The reviewers noted a lack of information concerning the computational cost of the method. One reviewer suggested replacing the binary search with Newton's method. We implemented a first version of Newton's method; however, it fails to converge in some cases, which we suspect is because the derivative of the Tukey $g$-and-$h$ transform is very small in some regions. A computational study of the binary search, now included in the appendix, shows that its cost becomes negligible compared to that of backpropagation for large neural networks. A sketch of the inversion is given below.
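The sketch below illustrates the bisection (binary search) inversion of the transform and, in the comments, why a naive Newton step can diverge. Function names and bracketing bounds are illustrative; for $h \geq 0$ the transform is strictly increasing, so bisection is guaranteed to converge once the root is bracketed.

```python
import numpy as np

def tgh(z, g, h):
    """Tukey g-and-h transform (g != 0 branch); strictly increasing for h >= 0."""
    return np.expm1(g * z) / g * np.exp(h * z**2 / 2)

def tgh_inverse(t, g, h, lo=-20.0, hi=20.0, iters=60):
    """Invert tgh by bisection. A Newton update
        z <- z - (tgh(z) - t) / tgh'(z)
    converges faster when it works, but tgh'(z) can be very small in some
    regions, producing huge steps and divergence; bisection is robust."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tgh(mid, g, h) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```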
We will additionally provide responses to each reviewer separately.
Assigned Action Editor: ~Mauricio_A_Álvarez1
Submission Number: 5440