Abstract: Persistence diagrams (PDs) are the most common descriptors used to encode the topology of structured data appearing in challenging learning tasks, such as graphs, time series, or point clouds sampled close to a manifold. Given random objects and the corresponding distribution of PDs, one may want to build a statistical summary of these random PDs, such as a mean. This is not a trivial task, however, because the natural geometry of the space of PDs is not linear. In this article, we study two such summaries: the Expected Persistence Diagram (EPD) and its quantization. The EPD is a measure supported on $\mathbb{R}^2$, which may be approximated by its empirical counterpart. We prove that this estimator is optimal from a minimax standpoint on a large class of models, with a parametric rate of convergence. The empirical EPD is simple and efficient to compute, but its support may be very large, which hinders its use in practice. To overcome this issue, we propose an algorithm to compute a quantization of the empirical EPD, a measure with small support that is shown to approximate a quantization of the theoretical EPD at near-optimal rates.
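
As a rough illustration of the two summaries, the sketch below builds an empirical EPD by giving each point of each of the n observed diagrams mass 1/n, then compresses it to k weighted points. The abstract does not specify the quantization algorithm, so the second step uses plain weighted Lloyd (k-means) iterations as a stand-in; it is not the authors' method and ignores any special role of the diagonal. The function names `empirical_epd` and `quantize`, the parameters `k`, `iters`, and `seed`, and the toy diagrams are all assumptions made for this example.

```python
import random
from collections import Counter


def empirical_epd(diagrams):
    """Average n diagrams viewed as point measures.

    Each diagram is a list of (birth, death) pairs; every point of every
    diagram receives mass 1/n. Returns a dict {point: mass}.
    """
    n = len(diagrams)
    masses = Counter()
    for diagram in diagrams:
        for birth, death in diagram:
            masses[(birth, death)] += 1.0 / n
    return dict(masses)


def _nearest(point, centers):
    """Index of the center closest to `point` in squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda j: (point[0] - centers[j][0]) ** 2
        + (point[1] - centers[j][1]) ** 2,
    )


def quantize(measure, k, iters=50, seed=0):
    """Compress a discrete measure to k weighted points.

    Stand-in only: weighted Lloyd (k-means) iterations, NOT the algorithm
    proposed in the paper. Returns a list of (center, mass) pairs.
    """
    rng = random.Random(seed)
    points = list(measure)
    centers = rng.sample(points, k)  # requires k <= number of support points
    for _ in range(iters):
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]
        for p in points:
            j = _nearest(p, centers)
            w = measure[p]
            sums[j][0] += w * p[0]
            sums[j][1] += w * p[1]
            sums[j][2] += w
        # Move each center to the weighted mean of its cell; keep empty cells.
        centers = [
            (s[0] / s[2], s[1] / s[2]) if s[2] > 0 else centers[j]
            for j, s in enumerate(sums)
        ]
    # Final pass: mass carried by each center.
    cell_mass = [0.0] * k
    for p in points:
        cell_mass[_nearest(p, centers)] += measure[p]
    return list(zip(centers, cell_mass))


# Toy data (hypothetical): two small diagrams sharing one point.
diagrams = [[(0.0, 1.0), (0.2, 0.5)], [(0.0, 1.0)]]
epd = empirical_epd(diagrams)
quantized = quantize(epd, k=2)
```

Note that the total mass of the empirical EPD is the average number of points per diagram, not 1, and the quantized measure preserves that total.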