Abstract: In this work, we explore generative models based on temporally coherent representations. To this end, we incorporate Slow Feature Analysis (SFA) into the encoder of a typical autoencoder architecture. We show that the latent factors extracted by SFA, while allowing for meaningful reconstruction, also yield a well-structured, continuous, and complete latent space – favorable properties for generative tasks. To complete the generative model for single samples, we demonstrate the construction of suitable prior distributions based on inherent characteristics of slow features. The efficacy of this method is illustrated on a variant of the Moving MNIST dataset with an increased number of generation parameters. Using a forecasting model in latent space, we find that the learned representations are also suitable for generating image sequences.