Abstract: Human pose estimation in computer vision predominantly focuses on the visible modality, with limited research on the infrared modality. No existing methods demonstrate robust performance across both modalities, failing to exploit their complementary strengths. This gap arises from the lack of a multimodal benchmark and the difficulty of developing robust multimodal capabilities. To address this, we introduce MMPD, a novel visible-infrared multimodal pose benchmark with high-quality annotations for both modalities. Leveraging MMPD, we expose the limitations of state-of-the-art methods under modality variance. To overcome this challenge, we propose a novel method-agnostic scheme called AMMPE. Through its Modality Adversarial Enhancement Stage and Modality Interaction Stage, AMMPE incorporates multimodal information without requiring additional pose annotations and enables effective modality interaction. Extensive experiments demonstrate that AMMPE improves performance in both the visible and infrared modalities, achieving excellent modality robustness. The code and benchmark are available at: https://github.com/ICANDOALLTHINGSSS/Adversarial-Multi-Modality-Pose-Estimation
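The abstract does not specify how the Modality Adversarial Enhancement Stage is realized. Below is a minimal, hypothetical sketch assuming it resembles standard gradient-reversal adversarial training with a modality discriminator, which would be consistent with "incorporates multimodal information without additional pose annotations" (labeled visible images plus unlabeled infrared images). All names here (GradReverse, ModalityDiscriminator, adversarial_step, backbone, pose_head) are illustrative assumptions, not the paper's API.

```python
# Hypothetical sketch only: the paper's exact AMMPE formulation may differ.
# Assumes a shared backbone, a heatmap-based pose head trained on annotated
# visible images, and a modality discriminator trained adversarially so that
# backbone features become indistinguishable across visible and infrared.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates and scales gradients backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class ModalityDiscriminator(nn.Module):
    """Predicts whether a feature map came from the visible or infrared branch."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 2),  # class 0 = visible, class 1 = infrared
        )

    def forward(self, feats, lam=1.0):
        return self.net(GradReverse.apply(feats, lam))


def adversarial_step(backbone, pose_head, disc, rgb, ir, heatmaps_rgb, lam=0.1):
    """One training step: supervised pose loss on visible images plus a
    modality-adversarial loss on both modalities, so the infrared branch
    needs no pose annotations of its own."""
    f_rgb, f_ir = backbone(rgb), backbone(ir)
    pose_loss = F.mse_loss(pose_head(f_rgb), heatmaps_rgb)
    feats = torch.cat([f_rgb, f_ir], dim=0)
    labels = torch.cat([torch.zeros(rgb.size(0)),
                        torch.ones(ir.size(0))]).long().to(feats.device)
    adv_loss = F.cross_entropy(disc(feats, lam), labels)
    return pose_loss + adv_loss
```

In this sketch the gradient-reversal layer makes the backbone maximize the discriminator's error while the discriminator minimizes it, pushing features toward modality invariance; the subsequent Modality Interaction Stage (not sketched here) would then fuse or exchange information between the aligned branches.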