Adaptive Skeleton Prompt Tuning for Cross-Dataset 3D Human Pose Estimation

Haolun Li, Fuchen Zheng, Ye Liu, Jian Xiong, Wenhua Zhang, Haidong Hu, Hao Gao

Published: 01 Jan 2025, Last Modified: 17 Oct 2025, ICASSP 2025, CC BY-SA 4.0

Abstract: Differences in the distributions of human actions and camera viewpoints between datasets can cause significant errors when pre-trained 3D pose estimators are evaluated on a different dataset. In practice, these estimators are usually adapted with standard full fine-tuning on the target dataset, which requires updating and storing a complete set of model parameters for each task; this wastes resources and distorts the pre-trained features. Inspired by prompt learning, which is widely used in NLP, we explore parameter-efficient fine-tuning of 3D pose estimators for the first time and propose the Adaptive Skeleton Prompt Tuning (ASP-Tuning) method. ASP-Tuning freezes the backbone of the pre-trained model and generates a set of generic pose prompts together with adaptive prompts specific to the input skeleton features, which learn the transformation between data distributions. Extensive experiments on multiple estimator backbones and datasets show that our method outperforms other fine-tuning methods and achieves state-of-the-art performance.
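The abstract does not specify how the prompts are injected into the estimator, so the following is only a minimal, hypothetical Python sketch of the general idea under stated assumptions: a toy linear 2D-to-3D lifter stands in for the frozen pre-trained backbone, the generic prompts are assumed to be learnable per-joint vectors added to the joint features, and the adaptive prompt is assumed to come from a small learnable map applied to mean-pooled skeleton features. All names (`PromptedLifter`, `sgd_step_generic`, `num_trainable`) are illustrative and are not the ASP-Tuning implementation.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class PromptedLifter:
    """Toy lifter (hypothetical): frozen backbone plus trainable generic and adaptive prompts."""

    def __init__(self, num_joints: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Frozen backbone (stand-in for a pre-trained estimator); never updated below.
        self.embed = rand_matrix(dim, 2, rng)  # 2D joint -> feature
        self.head = rand_matrix(3, dim, rng)   # prompted feature -> 3D joint
        # Trainable generic prompts: one vector per joint, shared by every input pose.
        # Zero init means the prompted model starts out identical to the frozen one.
        self.generic = [[0.0] * dim for _ in range(num_joints)]
        # Trainable map from a pooled skeleton feature to an input-specific prompt.
        self.adapt = [[0.0] * dim for _ in range(dim)]

    def num_trainable(self) -> int:
        return sum(len(g) for g in self.generic) + sum(len(r) for r in self.adapt)

    def prompted_features(self, joints_2d: List[Vector]) -> List[Vector]:
        feats = [matvec(self.embed, j) for j in joints_2d]
        pooled = [sum(col) / len(feats) for col in zip(*feats)]
        adaptive = matvec(self.adapt, pooled)  # depends on the input skeleton
        return [[f + g + a for f, g, a in zip(feat, gen, adaptive)]
                for feat, gen in zip(feats, self.generic)]

    def forward(self, joints_2d: List[Vector]) -> List[Vector]:
        return [matvec(self.head, h) for h in self.prompted_features(joints_2d)]


def mse(pred: List[Vector], target: List[Vector]) -> float:
    diffs = [p - t for pj, tj in zip(pred, target) for p, t in zip(pj, tj)]
    return sum(d * d for d in diffs) / len(diffs)


def sgd_step_generic(model: PromptedLifter, joints_2d: List[Vector],
                     target: List[Vector], lr: float) -> None:
    """One gradient step on the generic prompts only; backbone weights stay frozen."""
    pred = model.forward(joints_2d)
    n = len(pred) * len(pred[0])
    head_t = transpose(model.head)
    for j, (pj, tj) in enumerate(zip(pred, target)):
        # d(MSE)/d(pred_j) = 2 (pred_j - target_j) / n, then chain rule through the linear head.
        grad_out = [2.0 * (p - t) / n for p, t in zip(pj, tj)]
        grad_prompt = matvec(head_t, grad_out)
        model.generic[j] = [g - lr * dg for g, dg in zip(model.generic[j], grad_prompt)]


rng = random.Random(1)
model = PromptedLifter(num_joints=17, dim=16)
pose_2d = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(17)]
target_3d = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(17)]
frozen_head = [row[:] for row in model.head]
before = mse(model.forward(pose_2d), target_3d)
sgd_step_generic(model, pose_2d, target_3d, lr=10.0)
after = mse(model.forward(pose_2d), target_3d)
print(model.num_trainable(), before > after, model.head == frozen_head)  # → 528 True True
```

For brevity the sketch updates only the generic prompts; the adaptive-prompt map would be trained the same way. In this toy the prompt parameters are not fewer than the backbone parameters, but in a real pre-trained estimator the frozen backbone is far larger than the prompts, which is where the parameter savings of prompt tuning come from.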