Abstract: Normalising flows are a flexible class of generative models that provide exact likelihoods and are often trained through maximum likelihood estimation. Recent work suggests that these models can assign undesirably high likelihoods to out-of-distribution image data, which calls into question their reliability for applications where likelihoods matter (e.g. outlier detection). We show that continuous-time normalising flows trained with the conditional flow matching objective (CFM models) also provide unreliable likelihoods. Motivated by the hypothesis that unreliable likelihoods may stem from image-specific structure in the data, we investigate whether CFM models trained on various feature representations can lead to more reliable likelihoods. We evaluate CFM models trained on (1) the original data; (2) features from a pretrained classifier; (3) features from a pretrained perceptual autoencoder; and (4) features from an autoencoder trained with a simple pixel-based reconstruction loss. We show empirically that representations containing image-specific structure still lead to unreliable likelihoods from CFM models. Our proposed pixel autoencoder representations lead to reliable likelihoods from CFM models on out-of-distribution data, but can yield samples of lower quality, suggesting opportunities for future work.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Krzysztof_Jerzy_Geras1
Submission Number: 3309