Abstract: Humans engage in a multitude of actions, some of which are rare and therefore difficult to capture in real-world data collection. Synthetic generation techniques are particularly effective in these scenarios, enriching the data available for such uncommon actions. To address this need, we introduce a novel framework developed within Unreal Engine 5, designed to generate human action video data in hyper-realistic virtual environments. Our framework mitigates the scarcity and limited diversity of existing datasets for infrequent actions and routine tasks by combining synthetic motion generation through text-guided generative motion models, Gaussian splatting 3D reconstruction, and MetaHuman avatars. We demonstrate the utility of the framework by producing a synthetic video dataset depicting various human actions in diverse settings. To validate the effectiveness of the generated data, we trained VideoMAE, a state-of-the-art action recognition model, on an extended UCF101 dataset incorporating both synthetic and real fall data, obtaining F1-scores of 0.95 and 0.97 when evaluated on the URFall and MCF datasets, respectively. The quality of the generated RGB-D videos represents a significant advance in the field. Additionally, a graph is generated from the rendered scene, detecting objects and their relationships and thus adding valuable contextual information to the video data. This capability to generate data across a wide range of actions and environments positions our framework as a valuable tool for broader applications, including digital twin creation and dataset augmentation.
External IDs: dblp:journals/vr/MuleroPerezBRV25