Abstract: Nonparametric estimation of information divergence functionals between two probability densities is an important problem in machine learning. Several estimators exist that guarantee the parametric mean squared error (MSE) rate of O(1/N), where N is the number of samples, under various assumptions on the smoothness and boundary of the underlying densities. In particular, previous work on ensemble estimation theory derived ensemble estimators of divergence functionals that achieve the parametric rate without requiring knowledge of the densities' support set and are simple to implement. However, these and most other estimators assume some level of differentiability of the divergence functional. This excludes important divergence functionals such as the total variation distance and the Bayes error rate. Here, we show empirically that the ensemble estimation approach developed for smooth functionals can be applied to these less smooth functionals and still achieves good convergence rates, suggesting a gap in the current theory.
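For concreteness, the sketch below illustrates the general ensemble-estimation idea referenced in the abstract: a weighted combination of simple plug-in divergence estimators computed at several k-NN neighborhood sizes. The choice of KL divergence as the functional, the function names, and the uniform weights are illustrative assumptions only; the ensemble estimators in the literature instead choose the weights by solving an optimization that cancels lower-order bias terms.

```python
import numpy as np
from math import gamma, pi
from scipy.spatial import cKDTree


def knn_radius(query, data, k):
    """Distance from each query point to its k-th nearest neighbour in data."""
    dists, _ = cKDTree(data).query(query, k=k)
    return dists[:, -1] if k > 1 else dists


def plugin_kl(x, y, k):
    """Plug-in kNN estimate of KL(p || q) from samples x ~ p, y ~ q
    (a simple base estimator, not the exact construction from the paper)."""
    n, d = x.shape
    m = y.shape[0]
    c_d = pi ** (d / 2) / gamma(d / 2 + 1)   # volume of the d-dimensional unit ball
    r_p = knn_radius(x, x, k + 1)            # k+1 neighbours: skip the point itself
    r_q = knn_radius(x, y, k)
    p_hat = k / ((n - 1) * c_d * r_p ** d)
    q_hat = k / (m * c_d * r_q ** d)
    return float(np.mean(np.log(p_hat / q_hat)))


def ensemble_kl(x, y, ks=(5, 10, 20, 40)):
    """Ensemble estimate: a weighted average of base estimators over several
    neighbourhood sizes. Uniform weights are used purely for illustration."""
    estimates = np.array([plugin_kl(x, y, k) for k in ks])
    weights = np.full(len(ks), 1.0 / len(ks))
    return float(weights @ estimates)


# Toy usage: two 2-D Gaussians whose true KL divergence is 0.25
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(2000, 2))
y = rng.normal(0.5, 1.0, size=(2000, 2))
print(ensemble_kl(x, y))
```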