Fast variable selection makes scalable Gaussian process BSS-ANOVA a speedy and accurate choice for tabular and time series regression
Keywords: scalable Gaussian process, time series, tabular data
TL;DR: Fast variable selection makes scalable Gaussian process BSS-ANOVA a speedy and accurate choice for tabular and time series regression
Abstract: Many approaches to scalable Gaussian processes (GPs) focus on using a subset of the data as inducing points. Another promising approach is the Karhunen-Loève (KL) decomposition, in which the GP kernel is represented by a set of basis functions that are the eigenfunctions of the kernel operator. Such kernels have the potential to be very fast and do not depend on the selection of a reduced set of inducing points. However, KL decompositions lead to high dimensionality, so variable selection becomes paramount. This paper reports a new method of forward variable selection, enabled by the ordered nature of the basis functions in the KL expansion of the Bayesian Smoothing Spline ANOVA (BSS-ANOVA) kernel, coupled with fast Gibbs sampling in a fully Bayesian approach. The method quickly and effectively limits the number of terms, yielding competitive accuracy and training and inference times on tabular datasets with low feature dimensionality. The new algorithm determines how high the orders of included terms should reach, balancing model fidelity against model complexity using the $L^0$ penalties inherent in the Bayesian and Akaike information criteria. The inference speed and accuracy make the method especially useful for modeling dynamic systems: the derivative in a dynamic system is modeled as a static problem, and the learned dynamics are then integrated using a high-order scheme. The methods are demonstrated on two dynamic datasets: a 'Susceptible, Infected, Recovered' (SIR) toy problem, with the transmissibility used as a forcing function, and the experimental 'Cascaded Tanks' benchmark dataset. Comparisons on the static prediction of derivatives are made with a random forest (RF), a residual neural network (ResNet), and the Orthogonal Additive Kernel (OAK) inducing-points scalable GP, while for time series prediction comparisons are made with LSTM and GRU recurrent neural networks (RNNs). The GP outperforms the RF and ResNet on static estimation and is comparable to OAK. In dynamic systems modeling it outperforms both RNNs while performing many orders of magnitude fewer calculations. For the SIR test, which involved prediction under a set of forcing functions qualitatively different from those in the training set, BSS-ANOVA captured the correct dynamics while the neural networks failed to do so.
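A minimal sketch of the two ideas the abstract describes, in Python. The first snippet illustrates forward selection over an ordered KL basis with a BIC-style $L^0$ penalty; it uses ordinary least squares in place of the paper's fully Bayesian Gibbs sampler, and the basis matrix `Phi` and the cap `max_terms` are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def bic(y, y_hat, k):
    """BIC under a Gaussian likelihood: an L0-style penalty on the term count k."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

def forward_select(Phi, y, max_terms=50):
    """Greedy forward selection over ordered KL basis columns.

    Phi : (n, m) matrix of basis-function evaluations, columns ordered
          by decreasing eigenvalue of the kernel operator.
    Terms are added in order; selection stops when BIC stops improving.
    """
    n, m = Phi.shape
    best_bic, selected = np.inf, 0
    for k in range(1, min(m, max_terms) + 1):
        X = Phi[:, :k]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        score = bic(y, X @ beta, k)
        if score < best_bic:
            best_bic, selected = score, k
        else:
            break  # ordered basis: later, lower-eigenvalue terms rarely help
    return selected
```

The second snippet sketches the dynamic-systems recipe: treat the learned derivative model as a static map from state and forcing to dy/dt, then integrate it with a high-order adaptive scheme (here SciPy's RK45). `model.predict` and `forcing` are assumed stand-ins for the trained BSS-ANOVA GP and the forcing function, not the paper's actual integrator.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, model, forcing):
    """Right-hand side: static regressor evaluated at the current state and forcing."""
    x = np.concatenate([y, [forcing(t)]])
    return model.predict(x[None, :])[0]

def rollout(model, y0, forcing, t_span, t_eval):
    """Integrate the learned derivative model with a high-order adaptive scheme."""
    sol = solve_ivp(rhs, t_span, y0, t_eval=t_eval,
                    args=(model, forcing), method="RK45")
    return sol.t, sol.y
```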