Learning Energy-Based Models for 3D Human Pose Estimation

Published: 2024, Last Modified: 21 Jan 2026, IJCNN 2024, CC BY-SA 4.0
Abstract: 3D human pose estimation has attracted growing attention because of its promising applications. Most existing methods use a deep neural network (DNN) to predict the target 3D pose directly from the input and train the DNN by minimizing the mean squared error (MSE) loss. Despite their impressive performance, from a probabilistic perspective these methods model the conditional target density (the distribution of the target 3D pose given the input) as a fixed-variance Gaussian, which significantly restricts the expressiveness of the learned density and prevents the predictive potential of the DNN from being fully exploited. We tackle this problem by drawing on recent developments in conditional energy-based models (EBMs) for probabilistic regression. We design a simple yet effective network that learns an energy function from pairs of 2D and 3D joints. A gradient-based refinement procedure then minimizes the energy function to find the corresponding target 3D pose. In this way, the energy-based model can refine the initial 3D joints estimated by a state-of-the-art 3D human pose estimator. Extensive experiments on two popular human pose estimation benchmarks demonstrate the superiority of our method over existing state-of-the-art approaches.
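The refinement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_energy` is a hypothetical hand-written stand-in for the learned energy network (it only penalizes disagreement between an orthographic projection of the 3D joints and the 2D joints), and `numerical_grad`, `refine_pose`, the step count, and the learning rate are illustrative assumptions. In practice the gradient would more likely come from automatic differentiation through the trained network.

```python
from typing import Callable, List

Vector = List[float]
EnergyFn = Callable[[Vector, Vector], float]


def toy_energy(x2d: Vector, y3d: Vector) -> float:
    """Hypothetical stand-in for a learned energy E(x, y); NOT the paper's network.

    x2d holds flattened 2D joints (u, v per joint) and y3d holds flattened
    3D joints (X, Y, Z per joint). The energy is low when the orthographic
    projection of y3d (dropping Z) matches x2d.
    """
    energy = 0.0
    for j in range(len(x2d) // 2):
        du = y3d[3 * j] - x2d[2 * j]
        dv = y3d[3 * j + 1] - x2d[2 * j + 1]
        energy += du * du + dv * dv
    return energy


def numerical_grad(energy: EnergyFn, x2d: Vector, y3d: Vector,
                   eps: float = 1e-5) -> Vector:
    """Central-difference gradient of the energy with respect to y3d."""
    grad = []
    for i in range(len(y3d)):
        plus = list(y3d)
        minus = list(y3d)
        plus[i] += eps
        minus[i] -= eps
        grad.append((energy(x2d, plus) - energy(x2d, minus)) / (2.0 * eps))
    return grad


def refine_pose(energy: EnergyFn, x2d: Vector, y_init: Vector,
                steps: int = 50, lr: float = 0.1) -> Vector:
    """Refine an initial 3D pose by plain gradient descent on the energy."""
    y = list(y_init)
    for _ in range(steps):
        g = numerical_grad(energy, x2d, y)
        y = [yi - lr * gi for yi, gi in zip(y, g)]
    return y


# Usage: two joints, with the initial 3D estimate slightly off in X and Y.
x2d = [0.0, 0.0, 1.0, 1.0]
y_init = [0.3, -0.2, 2.0, 1.4, 0.7, 2.5]
refined = refine_pose(toy_energy, x2d, y_init)
print(toy_energy(x2d, y_init), toy_energy(x2d, refined))
```

In this toy setting the depth coordinates never change, because the stand-in energy ignores Z; a learned energy would be expected to constrain depth as well.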