Keywords: Diffusion models, score-based generative models, human pose prediction
Abstract: 3D human pose forecasting, i.e., predicting a sequence of future 3D human poses given a sequence of past observed ones, is a challenging spatio-temporal task. It becomes even more challenging in real-world applications, where occlusions inevitably occur and the estimated 3D joint coordinates contain noise. We provide a unified formulation in which incomplete elements (whether in the prediction or in the observation) are treated as noise, and propose a conditional diffusion model that denoises them and forecasts plausible poses. Instead of naively predicting all future frames at once, our model consists of two cascaded sub-models, specialized for modeling short- and long-horizon distributions, respectively. We also propose a repairing step that improves the performance of any 3D pose forecasting model in the wild by leveraging our diffusion model to repair its inputs. We evaluate our approach on several datasets and obtain significant improvements over the state of the art.
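To illustrate what "treating incomplete elements as noise" can look like, the sketch below places past and future frames in one (frames, joints, 3) array and uses a binary mask, where 1 marks observed entries and 0 marks entries to generate (future frames or occluded joints). It is a generic DDPM-style imputation loop in the spirit of diffusion inpainting, not the authors' model. The noise schedule, the names `impute_poses` and `q_sample`, and the `eps_model` signature are hypothetical assumptions, and the cascade of short- and long-horizon sub-models is not shown.

```python
import numpy as np

def make_linear_schedule(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Hypothetical linear DDPM noise schedule (illustrative values only)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def impute_poses(x_obs, mask, eps_model, num_steps=50, seed=0):
    """Generate entries where mask == 0 by reverse diffusion; keep mask == 1 entries.

    x_obs: (frames, joints, 3) array; values where mask == 0 are ignored (may be NaN).
    mask:  same shape, 1 = observed, 0 = missing (future frame or occluded joint).
    eps_model: hypothetical callable (x_t, t, cond, mask) -> predicted noise,
               conditioned on the observed values `cond`.
    """
    rng = np.random.default_rng(seed)
    known = mask.astype(bool)
    betas, alphas, alpha_bars = make_linear_schedule(num_steps)
    cond = np.where(known, x_obs, 0.0)      # observed values, zeros elsewhere
    x = rng.standard_normal(x_obs.shape)    # every entry starts as pure noise
    for t in reversed(range(num_steps)):
        eps_hat = eps_model(x, t, cond, mask)
        # DDPM posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
            # Overwrite observed entries with the observation noised to level t-1.
            x = np.where(known, q_sample(x_obs, t - 1, alpha_bars, rng), x)
        else:
            x = mean
    # Observed entries are copied exactly; missing ones come from the sampler.
    return np.where(known, x_obs, x)
```

With a trained noise predictor in place of `eps_model`, masking only the occluded joints of the observed frames (and leaving future frames out) would give a repaired input sequence that could then be passed to any forecaster, which is one plausible reading of the repairing step described above.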
Student Paper: Yes