Abstract: Spatio-temporal Gaussian processes (GPs) are important probabilistic tools for inference and learning in climate science, epidemiology, and, more generally, any GP modelling problem in which time plays a central role. The current gold-standard methods for scaling GPs to large data sets are various flavours of pseudo-point methods. These methods do not cope well with long or unbounded temporal observation horizons, which undermines their efficiency and effectively returns the computational scaling to cubic in the number of temporal observations. On the other hand, if the temporal part of the GP prior admits a Markov form, the cost of inference can be reduced to linear in the number of temporal observations by using state space models. In this work we show how to combine the most widely used pseudo-point method, Titsias' variational approach, with the state space approximation framework. Our approach hinges on a surprising conditional independence property that applies to space--time separable GPs. By utilising pseudo-point approximations over space and state space approximations through time, we are able to construct an approximation that is more scalable and more widely applicable to spatio-temporal problems than either method on its own.
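
The linear-time claim for Markov temporal priors can be illustrated with a minimal sketch. It is not the method proposed in this work: it is purely temporal, has no pseudo-points, and assumes the simplest Markov kernel, the exponential (Ornstein-Uhlenbeck) kernel k(t, t') = variance * exp(-|t - t'| / lengthscale) with Gaussian observation noise. The function names `ou_kalman_loglik` and `ou_dense_loglik` and all parameter names are hypothetical, chosen for illustration only. Under these assumptions, a scalar Kalman filter computes the same log marginal likelihood as the dense Cholesky-based computation, in one pass over the observations.

```python
import numpy as np


def ou_kalman_loglik(t, y, variance, lengthscale, noise_var):
    """Log marginal likelihood via a scalar Kalman filter, O(T) in len(t).

    Assumes t is sorted in increasing order and the prior is stationary.
    """
    m, P = 0.0, variance          # stationary prior mean and variance
    loglik = 0.0
    prev_t = t[0]
    for tk, yk in zip(t, y):
        # Predict: exact discretisation of the OU process over the gap.
        a = np.exp(-(tk - prev_t) / lengthscale)
        m = a * m
        P = a * a * P + variance * (1.0 - a * a)
        # Update: condition on the noisy observation y_k = x_k + eps.
        s = P + noise_var         # innovation variance
        v = yk - m                # innovation
        loglik += -0.5 * (np.log(2.0 * np.pi * s) + v * v / s)
        gain = P / s
        m = m + gain * v
        P = (1.0 - gain) * P
        prev_t = tk
    return loglik


def ou_dense_loglik(t, y, variance, lengthscale, noise_var):
    """Same quantity from the full covariance matrix, O(T^3) in len(t)."""
    K = variance * np.exp(-np.abs(t[:, None] - t[None, :]) / lengthscale)
    L = np.linalg.cholesky(K + noise_var * np.eye(len(t)))
    alpha = np.linalg.solve(L, y)
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(t) * np.log(2.0 * np.pi))
```

The two functions should agree up to floating-point error for any sorted `t`, which gives a simple check that the state space recursion represents the same GP prior rather than an approximation to it.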