Scaling Gaussian Process Regression with Full Derivative Observations

Published: 31 Jan 2026, Last Modified: 31 Jan 2026. Accepted by TMLR. License: CC BY 4.0
Abstract: We present a scalable Gaussian Process (GP) method called DSoftKI that can fit and predict full derivative observations. It extends SoftKI, a method that approximates a kernel via softmax interpolation, to the setting with derivatives. DSoftKI enhances SoftKI's interpolation scheme by replacing its global temperature vector with local temperature vectors associated with each interpolation point. This modification allows the model to encode local directional sensitivity, enabling the construction of a scalable approximate kernel, including its first- and second-order derivatives, through interpolation. Moreover, the interpolation scheme eliminates the need for kernel derivatives, facilitating extensions such as Deep Kernel Learning (DKL). We evaluate DSoftKI on synthetic benchmarks, a toy n-body physics simulation, standard regression datasets with synthetic gradients, and high-dimensional molecular force field prediction (100-1000 dimensions). Our results demonstrate that DSoftKI is accurate and scales to larger datasets with full derivative observations than was previously possible.
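To make the abstract's key modification concrete, the following is a minimal NumPy sketch of softmax interpolation weights where each interpolation point carries its own temperature vector applied component-wise (Hadamard division), rather than a single global temperature. The function name, the use of a scaled squared Euclidean distance, and all variable names are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def softmax_interp_weights(x, z, temps):
    """Softmax interpolation weights with per-point temperature vectors.

    x:     (d,)   query point
    z:     (m, d) interpolation points
    temps: (m, d) local temperature vectors, one per interpolation point

    Returns a (m,) vector of nonnegative weights summing to 1.
    """
    # Hadamard division: each coordinate of (x - z_j) is scaled by that
    # point's own temperature vector, encoding local directional sensitivity.
    scaled = (x[None, :] - z) / temps            # (m, d)
    logits = -np.sum(scaled ** 2, axis=1)        # negative scaled squared distance
    logits -= logits.max()                       # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()
```

With a global temperature, `temps` would be a single `(d,)` vector broadcast over all rows; making it `(m, d)` adds the `d × m` learnable parameters noted in the revision summary.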
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. Added timing experiments to the main text (Tables 4 and 6) with discussion.
2. Added experiments with additional datasets (n-body and UCI with synthetic gradients) to Appendix C, along with a discussion of noisy gradients in the results.
3. Added a demonstration of Deep Kernel Learning with DSoftKI to Appendix D.
4. Updated notation involving the temperature vector to Hadamard division to more clearly illustrate the component-wise use of temperature (Equations 10, 20, 21, and 22), and clarified the surrounding text. Extended the discussion of temperature parameter intuition in Appendix B.
5. Fixed heteroskedastic noise to separate value and gradient noise.
6. Added a pointer to Appendix A.1 in Table 1 to highlight the differences in kernel approximation between DSVGP/DDSVGP and DSoftKI.
7. Fixed a missing minus sign in Equation 21.
8. Updated the caption of Algorithm 1 to emphasize that temperature vectors introduce $d \times m$ extra learnable parameters.
9. Added a pointer to Appendix B.1 on hyperparameter initialization.
10. Minor typographical adjustments for consistency (e.g., ensuring all dataset names use \texttt).
11. Updated the abstract and introduction to reflect the changes made: we mention the additional datasets and the deep kernel learning demonstration, and more clearly indicate the differences between the SoftKI and DSoftKI temperature schemes.
Code: https://github.com/base26labs/dsoftki_gp
Assigned Action Editor: ~Shinichi_Nakajima2
Submission Number: 5720