Keywords: VAE, representation learning, particle physics
TL;DR: The paper describes a VAE for particle physics that uses the EMD between jets as its reconstruction error, along with some novel probes of the scale-dependent structure of the learnt representations.
Abstract: I present a Variational Autoencoder (VAE) trained on collider physics data (specifically boosted $W$ jets), with reconstruction error given by an approximation to the Earth Mover's Distance (EMD) between input and output jets. This VAE learns a concrete representation of the data manifold, with semantically meaningful and interpretable latent space directions that are hierarchically organized according to the physical EMD scales to which they relate in the underlying generative process. The variation of the latent space structure with a resolution hyperparameter provides insight into the scale-dependent structure of the dataset and its information complexity. I introduce two measures of the dimensionality of the learnt representation that are calculated from this scaling.
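As a rough illustration of what an approximate EMD between two jets can look like, the sketch below uses entropy-regularised optimal transport (Sinkhorn iterations). The abstract does not say which approximation the paper uses, so this is a swapped-in technique rather than the author's method. The function name `sinkhorn_emd`, the parameters `eps` and `n_iter`, and the treatment of each jet as a set of transverse momenta with 2D angular coordinates are all assumptions made for the example. For simplicity it normalises the momenta of each jet and ignores the periodicity of the azimuthal angle.

```python
import numpy as np

def sinkhorn_emd(pt_a, xy_a, pt_b, xy_b, eps=0.05, n_iter=500):
    """Entropy-regularised approximation to the EMD between two jets.

    pt_*: particle transverse momenta (normalised here to sum to 1).
    xy_*: particle angular coordinates, shape (n_particles, 2).
    eps and n_iter are illustrative choices, not values from the paper.
    """
    a = np.asarray(pt_a, dtype=float)
    b = np.asarray(pt_b, dtype=float)
    a = a / a.sum()  # assumption: compare jet shapes only, not total pT
    b = b / b.sum()
    xy_a = np.asarray(xy_a, dtype=float)
    xy_b = np.asarray(xy_b, dtype=float)

    # Ground cost: Euclidean angular distance between every particle pair.
    cost = np.linalg.norm(xy_a[:, None, :] - xy_b[None, :, :], axis=-1)
    kernel = np.exp(-cost / eps)

    # Sinkhorn iterations: rescale rows and columns to match the marginals.
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)

    # Transport plan and the transport cost it implies.
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum())

# Two single-particle jets separated by an angular distance of 0.5.
d = sinkhorn_emd([1.0], [[0.0, 0.0]], [2.0], [[0.3, 0.4]])
```

Smaller values of `eps` track the exact EMD more closely but can underflow in the exponential, so a log-domain variant would be a safer choice for training.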