Keywords: statistical similarity, algorithms, computational hardness
Abstract: We introduce and study the computational problem of determining statistical similarity between probability distributions.
For distributions $P$ and $Q$ over a finite sample space, their statistical similarity is defined as $S_{\mathrm{stat}}(P, Q) := \sum_{x} \min(P(x), Q(x))$.
Although statistical similarity is a fundamental measure of closeness between distributions, capturing essential concepts such as Bayes error in prediction and hypothesis testing, the problem of computing it has not been previously explored.
Recent work on computing statistical distance has established that, somewhat surprisingly, exactly computing statistical similarity is $\#\mathsf{P}$-hard even for the simple class of product distributions.
This motivates the question of designing approximation algorithms for statistical similarity.
Our first contribution is a Fully Polynomial-Time deterministic Approximation Scheme (FPTAS) for estimating statistical similarity between two product distributions.
We also establish a complementary hardness result: it is $\mathsf{NP}$-hard to estimate statistical similarity when $P$ and $Q$ are Bayes net distributions of in-degree $2$.
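To make the definition of $S_{\mathrm{stat}}$ concrete, the following is a minimal illustrative sketch in Python, not the algorithm of the paper. It evaluates the defining sum directly, and for product distributions over $\{0,1\}^n$ it enumerates all $2^n$ outcomes, so it is only usable for very small $n$; that exponential cost is the naive baseline an approximation scheme avoids. The function names and the encoding of a product distribution as a list of per-coordinate probabilities are assumptions made for this example.

```python
from itertools import product
from math import prod


def statistical_similarity(p, q):
    """S_stat(P, Q) = sum_x min(P(x), Q(x)), for distributions given as
    dicts mapping outcomes to probabilities (hypothetical encoding)."""
    support = set(p) | set(q)
    return sum(min(p.get(x, 0.0), q.get(x, 0.0)) for x in support)


def product_similarity_bruteforce(p_bits, q_bits):
    """Brute-force S_stat for two product distributions over {0,1}^n.

    p_bits[i] is Pr[x_i = 1] under P, and likewise q_bits[i] under Q.
    Runs in time exponential in n, so it is only a reference check.
    """
    if len(p_bits) != len(q_bits):
        raise ValueError("both distributions must have the same dimension")
    total = 0.0
    for x in product((0, 1), repeat=len(p_bits)):
        # Probability of the outcome x under each product distribution.
        px = prod(pi if b else 1.0 - pi for pi, b in zip(p_bits, x))
        qx = prod(qi if b else 1.0 - qi for qi, b in zip(q_bits, x))
        total += min(px, qx)
    return total


if __name__ == "__main__":
    print(statistical_similarity({"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75}))
    print(product_similarity_bruteforce([0.5, 0.9], [0.25, 0.9]))
```

Because $\min(a, b) = \tfrac{1}{2}(a + b - |a - b|)$, summing over all outcomes shows that statistical similarity equals one minus the statistical (total variation) distance, which is why hardness of exactly computing one transfers to the other.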
Primary Area: learning theory
Submission Number: 13537