Pick-to-Learn and Self-Certified Gaussian Process Approximations
Abstract: Generalisation bounds are crucial for providing data-driven models with performance and safety guarantees. In this respect, bounds that do not require a held-out test set are particularly valuable, as they allow all data to be used for training. While many such bounds do not improve upon the train-test approach, which remains the gold standard, the Pick-to-Learn (P2L) algorithm (Paccagnan et al., 2023) has shown great potential. However, P2L has limitations, including computational overhead, reliance on consistent data, and restriction to non-Bayesian settings. In this work, we overcome these challenges in general settings and use the corresponding results to show that classical Gaussian process (GP) training procedures can be interpreted as instantiations of P2L, thus inheriting tight, self-certified bounds. Three contributions underpin these conclusions. First, we introduce early stopping in P2L and equip it with a tight generalisation bound, which reduces training costs and addresses the non-consistent case. Second, we adapt P2L to the Bayesian setting and demonstrate its equivalence to posterior updating in a hierarchical model. Third, we show that greedy subset-of-data GPs are special instantiations of P2L. Numerical evidence on various real-world datasets shows that the resulting P2L bounds compare favourably with those of the train-test and PAC-Bayes approaches.
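The greedy subset-of-data GP idea, combined with early stopping, can be illustrated with a minimal sketch. It is a sketch under stated assumptions and not the authors' implementation. It assumes a 1-D regression problem, a zero-mean GP with a squared-exponential kernel and fixed hyperparameters, absolute prediction error as the score for picking the "worst" point, and a hypothetical tolerance `tol` and budget `max_size`. All function names are illustrative. The loop repeatedly adds the remaining point the current GP predicts worst, and it stops either when every remaining point is within tolerance or when the budget is exhausted. The budget plays the role of early stopping for data that cannot be fitted exactly. The generalisation certificate itself, which in P2L is driven by how many points end up selected, is not computed here.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between 1-D input arrays a and b (assumed kernel choice).
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-2):
    # Exact GP posterior mean conditioned only on the currently selected subset.
    if len(x_train) == 0:
        return np.zeros_like(x_test)  # zero prior mean before any point is picked
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(k, y_train)
    return rbf_kernel(x_test, x_train) @ alpha

def greedy_subset_p2l(x, y, tol=0.1, max_size=20):
    # Hypothetical greedy subset-of-data loop in the spirit of P2L:
    # at each step, pick the remaining point with the largest prediction error.
    selected = []
    remaining = list(range(len(x)))
    while remaining and len(selected) < max_size:  # max_size acts as an early-stopping budget
        sel = np.array(selected, dtype=int)
        rem = np.array(remaining, dtype=int)
        mean = gp_posterior_mean(x[sel], y[sel], x[rem])
        errors = np.abs(mean - y[rem])
        worst = int(np.argmax(errors))
        if errors[worst] <= tol:
            break  # every remaining point is already predicted within tolerance
        selected.append(remaining.pop(worst))
    return selected

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 200)
y = np.sin(x) + 0.01 * rng.standard_normal(x.size)  # synthetic toy data, not from the paper
subset = greedy_subset_p2l(x, y, tol=0.1, max_size=20)
print(len(subset), "of", x.size, "points selected")
```

Setting `max_size` to `len(x)` recovers a loop that runs until the tolerance test passes. A smaller budget trades fit quality for fewer GP solves.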
Submission Number: 897