Abstract: We present a computationally efficient strategy to initialise the hyperparameters of a Gaussian process (GP) that avoids computing the likelihood function. Our strategy can be used as a pretraining stage to find initial conditions for maximum-likelihood (ML) training, or as a standalone method to compute hyperparameter values to be plugged directly into the GP model. Motivated by the fact that training a GP via ML is equivalent (on average) to minimising the KL divergence between the true and learnt models, we set out to explore different metrics/divergences between GPs that are computationally inexpensive and provide hyperparameter values close to those found via ML. In practice, we identify the GP hyperparameters by projecting the empirical covariance or (Fourier) power spectrum onto a parametric family, thus proposing and studying various measures of discrepancy operating in the temporal and frequency domains. Our contribution extends the variogram method developed in the geostatistics literature and is accordingly referred to as the generalised variogram method (GVM). In addition to the theoretical presentation of GVM, we provide experimental validation in terms of accuracy, consistency with ML and computational complexity for different kernels using synthetic and real-world data.
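To make the temporal-domain idea concrete (projecting the empirical covariance onto a parametric family without evaluating the likelihood), here is a minimal sketch. It is not the authors' implementation: it assumes evenly sampled scalar-input data, a squared-exponential (SE) kernel, the biased empirical autocovariance estimator and a plain least-squares (L2) discrepancy, which is only one of the possible divergences. The helper names `empirical_autocov`, `se_kernel` and `fit_se_to_autocov` are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def empirical_autocov(y, max_lag):
    """Biased empirical autocovariance of an evenly sampled series at lags 0..max_lag."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = len(y)
    return np.array([np.dot(y[: n - k], y[k:]) / n for k in range(max_lag + 1)])


def se_kernel(tau, sigma2, ell):
    """Squared-exponential covariance k(tau) = sigma2 * exp(-tau^2 / (2 ell^2))."""
    return sigma2 * np.exp(-0.5 * (tau / ell) ** 2)


def fit_se_to_autocov(lags, k_hat):
    """Project an empirical covariance onto the SE family by least squares (L2 discrepancy).

    Optimises in log-space so that sigma2 and ell stay positive; no likelihood is evaluated.
    """
    lags = np.asarray(lags, dtype=float)
    k_hat = np.asarray(k_hat, dtype=float)
    # Initial guess: variance from lag 0; lengthscale from the first lag at which the
    # covariance drops below exp(-1/2) of its value at zero (exact for an SE kernel).
    below = np.nonzero(k_hat < k_hat[0] * np.exp(-0.5))[0]
    ell0 = lags[below[0]] if below.size else lags[-1]
    x0 = np.log([k_hat[0], ell0])

    def residuals(log_params):
        sigma2, ell = np.exp(log_params)
        return se_kernel(lags, sigma2, ell) - k_hat

    sol = least_squares(residuals, x0)
    sigma2, ell = np.exp(sol.x)
    return sigma2, ell
```

Given an evenly spaced grid with step `dt` and observations `y`, one would call `fit_se_to_autocov(dt * np.arange(max_lag + 1), empirical_autocov(y, max_lag))` and use the returned values either as initial conditions for ML training or directly as the kernel hyperparameters. Each residual evaluation costs O(number of lags), in contrast to the cubic cost of the exact GP likelihood.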
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:

**First revision (colour-coded in red)**

- A brief introduction to the concepts of Fourier analysis, with a table of parametric covariance/PSD pairs (Appendix A)
- A definition of standard divergences (Appendix B)
- Different sets of ground-truth parameters for the synthetic datasets used in E6 and E2 (E1 and E3, respectively, in the original version)
- Two pseudo-code algorithms for the temporal and spectral variants of the proposed GVM (Sec. 4)
- More emphasis on the fact that our focus is on the scalar-input case
- A clarification of our contributions in the Introduction and at the beginning of the experiments section
- A brief discussion of the relationship between our method and the concept of kernel alignment, together with the relevant references
- A discussion of the method's limitations in the Conclusions
- A revised set of experiments following the Reviewers' recommendations (supplementary material)

**Second revision (colour-coded in blue)**

- Updated notation in Sec. 2.1
- Additional references from the GP literature (Sec. 2.3)
- Complete reorganisation of Secs. 3 and 4
- Improved explanation of the cost, the computation of quantile functions and the optimisation methods (Sec. 4)
- Clearer algorithms (Sec. 4)
- An additional experiment comparing the spectral and temporal implementations of GVM (Experiment 4)
- A comment in the Acknowledgments thanking the Reviewers for their valuable feedback

**Third revision**

- Incorporation of additional suggestions from Reviewer Z3Qg

**Last revision (camera ready)**

- Authors, affiliations and acknowledgements included
- Colour coding removed; all text is in black
Supplementary Material: zip
Assigned Action Editor: ~Cedric_Archambeau1
Submission Number: 483