MMF-Net: A novel multi-feature and multi-level fusion network for 3D human pose estimation

Qianxing Li, Dehui Kong, Jinghua Li, Baocai Yin

Published: 2025, Last Modified: 12 Nov 2025, IET Comput. Vis. 2025, CC BY-SA 4.0

Abstract: Monocular video-based human pose estimation has long been a focus of research in the human-computer interaction community, and the task suffers mainly from depth ambiguity and self-occlusion. Although recently proposed learning-based approaches have demonstrated promising performance, they do not fully exploit the complementarity among different features. In this paper, the authors propose a novel multi-feature and multi-level fusion network (MMF-Net), which extracts joint features, bone features and trajectory features at multiple levels and combines them to estimate 3D human pose. In MMF-Net, the bone length estimation module and the trajectory multi-level fusion module first extract the geometric size information of the human body and the multi-level trajectory information of human motion, respectively. Then, the fusion attention-based combination (FABC) module extracts multi-level topological structure information of the human body and effectively fuses it with the geometric size information and the trajectory information. Extensive experiments show that MMF-Net achieves competitive results on the Human3.6M, HumanEva-I and MPI-INF-3DHP datasets.
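The abstract does not describe the internals of these modules, so the following is only a minimal sketch under stated assumptions of the kinds of quantities the three feature streams could start from. It derives bone vectors and lengths from joint positions via a parent array, computes root-joint displacements at several temporal strides as a stand-in for "multi-level" trajectory information, and fuses feature vectors with softmax weights. The toy skeleton `PARENTS`, the stride-based levels and `weighted_fusion` are hypothetical illustrations. In MMF-Net the bone length estimation, trajectory multi-level fusion and FABC steps are learned modules, not these fixed formulas.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Pose = Sequence[Point]  # one frame: a 3D position per joint

# Hypothetical 5-joint toy skeleton: PARENTS[j] is the parent of joint j,
# and -1 marks the root. Real datasets define their own joint hierarchies.
PARENTS = [-1, 0, 1, 0, 3]


def bone_vectors(pose: Pose, parents: Sequence[int]) -> List[Point]:
    """Each bone is a child joint minus its parent joint; the root has no bone."""
    return [
        tuple(c - p for c, p in zip(pose[j], pose[parent]))
        for j, parent in enumerate(parents)
        if parent >= 0
    ]


def bone_lengths(pose: Pose, parents: Sequence[int]) -> List[float]:
    """Euclidean length of every bone in one frame."""
    return [math.sqrt(sum(c * c for c in b)) for b in bone_vectors(pose, parents)]


def mean_bone_lengths(sequence: Sequence[Pose], parents: Sequence[int]) -> List[float]:
    """Average bone lengths over frames.

    A person's bone lengths do not change over time, so averaging turns
    noisy per-frame values into one body-size estimate.
    """
    per_frame = [bone_lengths(pose, parents) for pose in sequence]
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]


def root_trajectory(sequence: Sequence[Pose], root: int = 0) -> List[Point]:
    """Position of the root joint in every frame."""
    return [tuple(pose[root]) for pose in sequence]


def multi_scale_displacement(
    sequence: Sequence[Pose], strides: Sequence[int] = (1, 2, 4), root: int = 0
) -> Dict[int, List[Point]]:
    """Root displacement over several temporal strides (hypothetical 'levels')."""
    traj = root_trajectory(sequence, root)
    return {
        s: [tuple(a - b for a, b in zip(traj[t], traj[t - s])) for t in range(s, len(traj))]
        for s in strides
    }


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score before exp)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(features: Sequence[Sequence[float]], scores: Sequence[float]) -> List[float]:
    """Fuse equal-length feature vectors with softmax weights (generic, not FABC)."""
    weights = softmax(scores)
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(len(features[0]))]


# Usage: the toy skeleton translated along x by 0.5 per frame.
base = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frames = [[(x + 0.5 * t, y, z) for x, y, z in base] for t in range(3)]
print(mean_bone_lengths(frames, PARENTS))        # [1.0, 1.0, 1.0, 1.0]
print(multi_scale_displacement(frames, (1, 2)))  # {1: [(0.5, 0.0, 0.0), (0.5, 0.0, 0.0)], 2: [(1.0, 0.0, 0.0)]}
```

The usage shows why bone lengths and trajectories carry complementary information: translating the whole body leaves every bone length unchanged while the root displacements record the motion.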

External IDs: dblp:journals/iet-cvi/LiKLY25