Inference-Based Deep Reinforcement Learning for Physics-Based Character Control

Published: 01 Jan 2022, Last Modified: 22 May 2025, HPCC/DSS/SmartCity/DependSys 2022, CC BY-SA 4.0
Abstract: Character motion synthesis and control are of great significance in character animation, and synthesizing human-like behaviors and motion trajectories is widely acknowledged to be a major challenge. Although many studies have addressed this challenge, we find that they fail to generate natural trajectories when the agent must change its current motion orientation to reach a target point. Moreover, owing to intrinsic properties of the training process of deep reinforcement learning (DRL), such as poor reproducibility, low stability, and high costs in computational resources and training time, their performance is often disappointing. To this end, we introduce a new concept, the Redirection Factor, which revises the turning angle and guides the agent toward the proper orientation to the target point. Unlike previous works, in which the agent reaches the target implicitly by following a policy trained with a DRL reward function, our method decouples the turning computation from the DRL network and treats it as an inference process that explicitly computes the concrete trajectories with a mathematical approach. Intuitively, producing trajectories this way is closer to how humans actually move, so the generated motions achieve higher fidelity and naturalness. Furthermore, we extend the DRL reward functions of previous works to take the turning angle into account, yielding more effective task-related rewards. We also establish, during training, a new representation of the relationship between joint rotations and joint displacements; under this representation we retain only the essential skeletal features, avoiding redundancy and significantly reducing the complexity of the networks. Experiments show that our method outperforms previous works on a variety of common and uncommon motion control tasks while substantially reducing hardware resource usage and training time.
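The abstract does not give the paper's actual formulas, so the following is only a minimal sketch of the two ideas it names: a redirection factor that scales the signed turning angle toward the target, and a task reward term that accounts for the remaining turning angle. All names here (redirected_heading, turning_angle_reward, redirection_factor, w_turn) are hypothetical illustrations, not the paper's definitions.

```python
import numpy as np

def redirected_heading(agent_pos, agent_heading, target_pos, redirection_factor=0.5):
    """Hypothetical sketch: blend the agent's current heading toward the
    target direction, with the redirection factor controlling how much of
    the signed turning angle is applied per step."""
    to_target = np.asarray(target_pos, dtype=float) - np.asarray(agent_pos, dtype=float)
    target_angle = np.arctan2(to_target[1], to_target[0])
    # Signed turning angle in (-pi, pi] between current heading and target direction.
    turn = np.arctan2(np.sin(target_angle - agent_heading),
                      np.cos(target_angle - agent_heading))
    return agent_heading + redirection_factor * turn, turn

def turning_angle_reward(turn, base_reward, w_turn=0.2):
    """Hypothetical turning-angle-aware reward: the bonus decays as the
    remaining turning angle grows, rewarding the agent for facing the target."""
    return base_reward + w_turn * np.exp(-abs(turn))

# Usage: one inference step toward a target northeast of the agent.
heading, turn = redirected_heading((0.0, 0.0), 0.0, (1.0, 1.0))
reward = turning_angle_reward(turn, base_reward=1.0)
```

Under these assumptions, the heading update runs outside the learned policy as a plain geometric calculation, which matches the paper's framing of turning as an explicit inference step rather than behavior the reward function must shape implicitly.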