Keywords: diffusion, curvature, manifold learning
TL;DR: We develop a diffusion-based notion of curvature for manifold data.
Abstract: High-throughput, high-dimensional data is now being generated in massive quantities in many fields, including biology, medicine, chemistry, finance, and physics. Researchers have successfully used manifold learning to gain insight from such data, particularly biomedical and single-cell data. One such technique, data diffusion geometry, has been useful for understanding manifold-intrinsic distances, density, and major non-linear axes or paths through the data. However, curvature remains a relatively unstudied feature of high-dimensional data. While curvature is well-defined and easy to compute in low dimensions, it poses computational and conceptual difficulties in high dimensions. Here, we present two techniques for estimating curvature from high-dimensional data, both starting from data diffusion probabilities. The first technique, diffusion curvature, uses the spread, or conversely the laziness, of a random walk to estimate curvature pointwise in the data. The second technique, deep diffusion curvature, trains a neural network to estimate pointwise curvature. Since these techniques are scalable, we anticipate that they can be used to describe and compare datasets, as well as to find points in the data that represent transitional entities.
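
To make the idea of random-walk laziness concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian kernel with a fixed bandwidth `sigma`, a row-normalized diffusion matrix, and a laziness score defined as the probability mass that a `t`-step walk keeps within each point's `k` nearest neighbors. The function name `diffusion_laziness` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def diffusion_laziness(X, sigma=0.5, t=8, k=10):
    """Illustrative laziness score (an assumption, not the paper's definition):
    the t-step random-walk mass that remains among each point's k nearest neighbors."""
    # Pairwise squared Euclidean distances, clipped at zero against round-off.
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian affinities, row-normalized into a Markov (diffusion) matrix.
    K = np.exp(-D2 / (2.0 * sigma**2))
    P = K / K.sum(axis=1, keepdims=True)
    # t-step transition probabilities.
    Pt = np.linalg.matrix_power(P, t)
    # Indices of the k nearest neighbors of each point (the point itself included).
    nbrs = np.argsort(D2, axis=1)[:, :k]
    # Mass of the t-step walk that stays inside that neighborhood.
    return np.take_along_axis(Pt, nbrs, axis=1).sum(axis=1)

# Usage on synthetic data: 200 points sampled on the unit sphere in 3D.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
scores = diffusion_laziness(X)
```

Under these assumptions, a higher score means the walk stays closer to its starting point (a lazier walk) and a lower score means it spreads out more. How such a score is calibrated into a curvature estimate depends on choices such as the kernel, the bandwidth, and the number of steps, which this sketch fixes arbitrarily.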