FOREcasting human activities via latent SCENE graphs diffusion

Published: 01 Sept 2025, Last Modified: 15 Sept 2025
Venue: HRSIC 2025 Poster
License: CC BY 4.0
Keywords: Scene graphs; Human action understanding; Diffusion models; Anticipation
Abstract: Forecasting human-object interactions in daily activities is challenging because of the high variability of human behavior. Although training models to solve this task from plain videos is feasible, operating directly on raw frames is often limited by visual noise and confounding factors unrelated to the task. Scene graphs offer a promising alternative by providing structured representations of the actor and the objects actively participating in the action, together with their relationships, which may evolve over time. However, existing approaches to Scene Graph Anticipation (SGA) often rely on unrealistic assumptions, such as objects that remain fixed over time, which limit their applicability to dynamic, real-world scenarios. In this paper, we propose FORESCENE, a novel framework for SGA that jointly predicts the temporal evolution of both objects and their interactions, based on a graph auto-encoder and a conditional latent diffusion model. We evaluate FORESCENE on the Action Genome dataset, showing that predicting the full graph improves the model's capabilities in human activity forecasting and outperforms prior SGA methods.
Submission Number: 4