Abstract: Predicting human motion requires modeling long-range dependencies and accumulating errors when forecasting future poses from observed sequences. The transformer's self-attention is well suited to this task, but its quadratic complexity poses computational challenges. We present DeformMLP, an efficient network that replaces self-attention with fully connected layers. DeformMLP comprises three layer types for spatial-temporal modeling and calibration: DeformFCs captures spatial semantics, DeformFCt learns temporal relationships by summarizing time tokens, and DeformFCst assigns significance weights to feature dimensions to reduce computation. Through this decomposition and weight allocation, our method balances efficiency and accuracy. Evaluation on the Human3.6M, 3DPW, and CMU-MoCap datasets shows state-of-the-art prediction performance across benchmarks. The code is publicly available at https://github.com/HHT-98/DeformMLP.
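To make the attention-free design concrete, the following is a minimal sketch of how fully connected layers can mix a pose sequence along the spatial and temporal axes separately. All names and shapes here are illustrative assumptions, not the authors' DeformMLP implementation:

```python
import numpy as np

# Illustrative sketch (not the authors' code): attention-free spatio-temporal
# mixing of a pose sequence using plain fully connected (matrix) layers.

rng = np.random.default_rng(0)
T, J, C = 10, 22, 3                    # time steps, joints, coordinates per joint
x = rng.standard_normal((T, J * C))    # each frame flattened into one feature vector

# "Spatial" FC: mixes joint/coordinate features within each frame.
W_s = rng.standard_normal((J * C, J * C)) * 0.01
h = x @ W_s                            # shape (T, J*C)

# "Temporal" FC: mixes information across time steps, per feature channel.
W_t = rng.standard_normal((T, T)) * 0.01
y = W_t @ h                            # shape (T, J*C)

print(y.shape)  # (10, 66)
```

Stacking such spatial and temporal FC blocks, as opposed to pairwise self-attention, keeps the cost linear in the feature size per mixing step, which is the efficiency trade-off the abstract refers to.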