Unnormalized Density Estimation with Root Sobolev Norm Regularization

Published: 28 Oct 2023, Last Modified: 20 Nov 2023, TRL @ NeurIPS 2023 Poster
Keywords: Tabular data representation, Density estimation, Sobolev norm regularization, Score-based methods, Fisher divergence for hyperparameter tuning, Anomaly detection, High-dimensional data, Kernel Density Estimation (KDE)
TL;DR: A new approach to non-parametric density estimation based on regularizing a Sobolev norm of the density.
Abstract: Density estimation is one of the central problems in non-parametric statistical learning. Parametric, neural network-based methods have achieved notable success in domains such as images and text, but their non-parametric counterparts lag behind, particularly in higher dimensions. Non-parametric methods are known for their conceptual simplicity and explicit model bias. They can offer greater interpretability and more effective control over regularization in small-data regimes or for other data modalities. We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density. The method is statistically consistent, differs from Kernel Density Estimation (KDE), and makes the inductive bias of the model clear and interpretable. We assess our method on the comprehensive ADBench suite for tabular anomaly detection. It ranks second among more than 15 algorithms, all of which are specifically tailored for anomaly detection in tabular data. The contributions of this paper are as follows: 1. Although the associated kernel has no closed analytic form, we show that it can be approximated by sampling. 2. The optimization problem that determines the density is non-convex, and standard gradient methods do not perform well on it. However, we show that an appropriate initialization combined with natural gradients yields well-performing solutions. 3. Because the approach yields unnormalized densities, log-likelihood cannot be used for cross-validation. We show that Fisher divergence-based score matching methods can be adapted for this task instead.
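The abstract notes that the associated kernel has no closed analytic form but can be approximated by sampling. For a shift-invariant kernel defined through its Fourier transform, one standard way to do this is Monte Carlo sampling of frequencies, known as random Fourier features.

The sketch below is a minimal illustration under stated assumptions, not the paper's kernel. It assumes a multivariate Student-t spectral density, proportional to (1 + ||w||^2 / (nu * sigma^2))^(-(nu + d)/2), as a hypothetical stand-in for a polynomially decaying, Sobolev-type weight. The function names are illustrative. The case nu = 1 is chosen only because the exact kernel is then known, exp(-sigma * ||x - y||), which provides a check on the sampler.

```python
import numpy as np

def sample_student_t_frequencies(n_features, dim, nu, sigma, rng):
    """Draw frequencies w ~ multivariate Student-t(nu) with scale sigma * I.

    Hypothetical spectral density (an assumption, not the paper's):
    proportional to (1 + ||w||^2 / (nu * sigma^2)) ** (-(nu + dim) / 2).
    """
    z = rng.standard_normal((n_features, dim))
    g = rng.chisquare(nu, size=(n_features, 1))
    return sigma * z / np.sqrt(g / nu)

def random_fourier_features(x, w, b):
    """Map points x (n, dim) to features (n, m) with phi(x) . phi(y) ~= k(x, y)."""
    m = w.shape[0]
    return np.sqrt(2.0 / m) * np.cos(x @ w.T + b)

rng = np.random.default_rng(0)
dim, m, sigma = 3, 200_000, 0.5
w = sample_student_t_frequencies(m, dim, nu=1.0, sigma=sigma, rng=rng)
b = rng.uniform(0.0, 2.0 * np.pi, size=m)   # random phases make each feature unbiased

x = rng.standard_normal((4, dim))
phi = random_fourier_features(x, w, b)
K_approx = phi @ phi.T                      # Monte Carlo Gram matrix

# For nu = 1 (multivariate Cauchy) the exact kernel is exp(-sigma * ||x - y||).
dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
K_exact = np.exp(-sigma * dists)
max_err = np.abs(K_approx - K_exact).max()
```

Each feature pair 2 cos(w.x + b) cos(w.y + b) has expectation exactly E[cos(w.(x - y))] when b is uniform on [0, 2*pi]. The Monte Carlo error therefore shrinks roughly as 1/sqrt(m) in the number of sampled frequencies m.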
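An unnormalized density rules out held-out log-likelihood, but score matching can still compare hyperparameters. Hyvarinen's objective is J(f) = E_x[ (1/2) ||grad log f(x)||^2 + Laplacian log f(x) ]. It depends on f only through derivatives of log f, so the normalizing constant drops out. Under standard regularity conditions, J also equals the Fisher divergence to the data distribution up to an additive constant that does not depend on f.

The sketch below is not the paper's estimator. It uses a hypothetical unnormalized Gaussian mixture centered on the training points, f(x) = sum_i exp(-||x - c_i||^2 / (2 h^2)), whose gradient and Laplacian have closed forms. It then picks the bandwidth h by minimizing J on held-out data.

```python
import numpy as np

def score_matching_loss(x, centers, h):
    """Hyvarinen score-matching loss for f(x) = sum_i exp(-||x - c_i||^2 / (2 h^2)).

    f is left unnormalized (hypothetical stand-in model): the loss only uses
    grad log f and the Laplacian of log f, which ignore the normalizing constant.
    """
    d = x.shape[1]
    diff = x[:, None, :] - centers[None, :, :]            # (n, m, d)
    sq = np.sum(diff ** 2, axis=-1)                        # (n, m)
    logits = -sq / (2.0 * h ** 2)
    logits -= logits.max(axis=1, keepdims=True)            # stabilize the softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                      # weights over centers
    score = -np.einsum("nm,nmd->nd", w, diff) / h ** 2     # grad log f
    score_sq = np.sum(score ** 2, axis=1)
    # Laplacian of log f = (Laplacian f) / f - ||grad log f||^2
    lap = np.sum(w * sq, axis=1) / h ** 4 - d / h ** 2 - score_sq
    # J = mean of 0.5 * ||grad log f||^2 + Laplacian log f
    return np.mean(0.5 * score_sq + lap)

rng = np.random.default_rng(0)
train = rng.standard_normal((500, 1))       # used only as mixture centers
held_out = rng.standard_normal((500, 1))    # used only to evaluate J
grid = np.geomspace(0.02, 5.0, 30)
losses = np.array([score_matching_loss(held_out, train, h) for h in grid])
best_h = grid[np.argmin(losses)]
```

Evaluating J on points that are not centers matters here. On the training points themselves, the score vanishes as h shrinks while the -d / h^2 term diverges, so J would reward ever smaller bandwidths.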
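The abstract reports that the density-fitting problem is non-convex, that standard gradient methods perform poorly on it, and that an appropriate initialization combined with natural gradients works well. The sketch below only illustrates what a natural-gradient step can look like for a kernel model. The model, loss, kernel, initialization and step size are all assumptions made for illustration, not the paper's formulation:

- a hypothetical unnormalized density f(x)^2 with f = sum_i a_i k(x_i, .);
- the loss -(1/n) sum_i log f(x_i)^2 + alpha ||f||_H^2, where ||f||_H^2 = a^T K a;
- a Gaussian kernel standing in for the sampled kernel.

Preconditioning the gradient by K^{-1}, the RKHS metric on the coefficients, cancels a factor of K, so no step needs a matrix inverse.

```python
import numpy as np

def gaussian_gram(x, y, width):
    """Gram matrix of a Gaussian kernel (hypothetical stand-in kernel)."""
    sq = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * width ** 2))

def objective(a, K, alpha):
    """Hypothetical loss: -(1/n) sum log f(x_i)^2 + alpha * ||f||_H^2, with f(x_i) = (K a)_i."""
    f = K @ a
    return -2.0 * np.mean(np.log(np.abs(f))) + alpha * a @ K @ a

def natural_gradient_step(a, K, alpha, lr):
    """One gradient step preconditioned by K^{-1}.

    Euclidean gradient: g = -(2/n) K (1 / (K a)) + 2 alpha K a, hence
    K^{-1} g = -(2/n) / (K a) + 2 alpha a, which needs no matrix inverse.
    """
    n = a.shape[0]
    nat_grad = -(2.0 / n) / (K @ a) + 2.0 * alpha * a
    return a - lr * nat_grad

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 1))
K = gaussian_gram(x, x, width=0.5)
alpha, lr = 1.0, 0.05
# Positive init, a positive kernel and lr * alpha < 0.5 keep every f(x_i) > 0.
a = np.full(x.shape[0], 1.0 / x.shape[0])
start = objective(a, K, alpha)
for _ in range(2000):
    a = natural_gradient_step(a, K, alpha, lr)
end = objective(a, K, alpha)
# Fixed points of the iteration satisfy a_i (K a)_i = 1 / (n alpha).
residual = np.max(np.abs(a * (K @ a) * x.shape[0] * alpha - 1.0))
```

On the region where K a > 0 this toy loss is convex. It therefore shows only the mechanics of the update, not the non-convexity that the paper's initialization is meant to handle.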
Submission Number: 14