Abstract: Normalising flows are a flexible class of generative models that provide exact likelihoods, and are often trained through maximum likelihood estimation. Recent work suggests that discrete-step flow models trained in this way can assign undesirably high likelihood to out-of-distribution image data, bringing their reliability for applications where likelihoods are important (e.g. outlier detection) into question. We show that continuous-time normalising flows trained with the conditional flow matching objective (CFM models) also provide unreliable likelihoods, and then investigate whether CFM models trained on various feature representations can lead to more reliable likelihoods. We consider (1) the original data; (2) features from a pretrained classifier; (3) features from a pretrained perceptual autoencoder; and (4) features from an autoencoder trained with a simple pixel-based reconstruction loss. Our proposed pixel autoencoder representations lead to reliable likelihoods from CFM models on out-of-distribution data but can yield samples of lower quality, suggesting opportunities for future work.
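For readers unfamiliar with the conditional flow matching objective the abstract refers to, the following is a minimal sketch of its most common form (the linear, optimal-transport probability path between base noise and data). The function and parameter names (`cfm_loss`, `v_theta`) are illustrative placeholders, not names from the paper, and a real model would replace the callable with a trained neural velocity field:

```python
import numpy as np

def cfm_loss(v_theta, x1, rng):
    """Monte-Carlo estimate of the conditional flow matching loss
    along the straight-line path x_t = (1 - t) * x0 + t * x1.

    v_theta: callable (x_t, t) -> predicted velocity, same shape as x_t
             (a stand-in for the model being trained).
    x1:      batch of data samples, shape (B, D).
    rng:     a numpy random Generator.
    """
    b, d = x1.shape
    x0 = rng.standard_normal((b, d))   # samples from the Gaussian base
    t = rng.uniform(size=(b, 1))       # per-sample times in [0, 1]
    xt = (1.0 - t) * x0 + t * x1       # point on the interpolating path
    u = x1 - x0                        # target (conditional) velocity
    pred = v_theta(xt, t)
    # squared error between predicted and target velocity, averaged over batch
    return np.mean(np.sum((pred - u) ** 2, axis=1))
```

Training a CFM model amounts to minimising this expectation over data batches; likelihoods are then obtained by integrating the learned velocity field as a continuous-time normalising flow.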
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Pavel_Izmailov1
Submission Number: 3309