Abstract: Bayesian Optimization (BO) has shown great promise for the global optimization of functions that are expensive to evaluate, but despite many successes, standard approaches can struggle in high dimensions. To improve the performance of BO, prior work suggested incorporating gradient information into a Gaussian process surrogate of the objective, giving rise to kernel matrices of size nd × nd for n observations in d dimensions. Naïvely multiplying with (resp. inverting) these matrices requires O(n^2 d^2) (resp. O(n^3 d^3)) operations, which becomes infeasible for moderate dimensions and sample sizes. Here, we observe that a wide range of kernels gives rise to structured matrices, enabling an exact O(n^2 d) matrix-vector multiply for gradient observations and O(n^2 d^2) for Hessian observations. Beyond canonical kernel classes, we derive a programmatic approach to leveraging this type of structure for transformations and combinations of the discussed kernel classes, which constitutes a structure-aware automatic differentiation algorithm. Our methods apply to virtually all canonical kernels and automatically extend to complex kernels, like the neural network, radial basis function network, and spectral mixture kernels, without any additional derivations, enabling flexible, problem-dependent modeling while scaling first-order BO to high d.
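
As a concrete illustration of the kind of structure the abstract refers to (a minimal sketch, not the authors' implementation; the function names such as `grad_block_mvm` are hypothetical): for an isotropic kernel k(x, y) = f(s) with s = ||x − y||^2 / 2, such as the RBF kernel, each d × d gradient-covariance block is a scaled identity plus a rank-one matrix, so multiplying one block with a vector costs O(d) rather than O(d^2), which over all n^2 blocks gives the O(n^2 d) matrix-vector multiply mentioned above.

```python
import numpy as np

# Sketch for the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * l^2)), written as
# f(s) = exp(-s / l^2) with s = ||x - y||^2 / 2. The gradient-gradient block is
#   cov(grad f(x), grad f(y)) = -f'(s) * I - f''(s) * (x - y)(x - y)^T,
# i.e. a scaled identity plus a rank-one term, enabling an O(d) block multiply.

def rbf_f1_f2(s, lengthscale=1.0):
    """First and second derivatives of f(s) = exp(-s / l^2) with respect to s."""
    l2 = lengthscale ** 2
    f = np.exp(-s / l2)
    return -f / l2, f / l2 ** 2  # f'(s), f''(s)

def grad_block_mvm(x, y, v, lengthscale=1.0):
    """O(d) product of the d x d gradient-kernel block with a vector v."""
    r = x - y
    s = 0.5 * (r @ r)
    f1, f2 = rbf_f1_f2(s, lengthscale)
    return -f1 * v - f2 * (r @ v) * r  # (-f' I - f'' r r^T) v without forming the block

def dense_grad_block(x, y, lengthscale=1.0):
    """Explicit d x d block, only used to check the structured multiply."""
    r = x - y
    s = 0.5 * (r @ r)
    f1, f2 = rbf_f1_f2(s, lengthscale)
    return -f1 * np.eye(len(x)) - f2 * np.outer(r, r)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 5
    x, y, v = rng.standard_normal((3, d))
    assert np.allclose(grad_block_mvm(x, y, v), dense_grad_block(x, y) @ v)
```

The same identity-plus-low-rank pattern is what a structure-aware routine can exploit block by block; the sketch only spells out the per-block cost for one canonical kernel.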