Innovate Spatial-Temporal Attention Network (STAN) for Accurate 3D Mice Pose Estimation with a Single Monocular RGB Camera

Liyun Gong, Miao Yu, Gautam Siddharth Kashyap, Sheldon McCall, Mamatha Thota, Saeid Pourroostaei Ardakani

Published: 2024, Last Modified: 19 May 2025, Venue: EUSIPCO 2024, License: CC BY-SA 4.0

Abstract: Precise 3D pose estimation of mice is important across many scientific domains. We introduce the Spatial-Temporal Attention Network (STAN), a model designed for accurate 3D pose estimation of mice from a single monocular camera. STAN takes a sequence of extracted 2D skeletons as input and predicts the 3D pose of the mouse. Its spatial and temporal attention modules capture the intricate spatial and temporal relationships among key points, giving a comprehensive representation of the dynamic movements in a mouse's behavior for precise 3D pose estimation. We assessed the proposed method in extensive experimental evaluations. The results show that STAN outperforms other state-of-the-art approaches to 3D mouse pose estimation.
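
The abstract describes the data flow only at a high level: a sequence of 2D skeletons goes in, spatial and temporal attention are applied, and a 3D pose comes out. The sketch below illustrates that general idea and is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`self_attention`, `lift_2d_to_3d`, `init_params`), the single-head attention, the residual connections, the embedding width, the order of the two attention steps, and the random untrained weights. The abstract does not specify the actual STAN architecture, number of key points, or sequence length.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x has shape (..., N, D); attention mixes the N tokens on the
    second-to-last axis, independently for each leading index.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def init_params(dim=16, seed=0):
    """Random, untrained weights (hypothetical sizes, illustration only)."""
    rng = np.random.default_rng(seed)

    def mat(rows, cols):
        return rng.normal(scale=0.1, size=(rows, cols))

    return {
        "embed": mat(2, dim),                                # 2D coords -> features
        "spatial": tuple(mat(dim, dim) for _ in range(3)),   # W_q, W_k, W_v
        "temporal": tuple(mat(dim, dim) for _ in range(3)),  # W_q, W_k, W_v
        "head": mat(dim, 3),                                 # features -> 3D coords
    }


def lift_2d_to_3d(skeletons_2d, params):
    """Map a (T, J, 2) sequence of 2D skeletons to (T, J, 3) poses.

    T is the number of frames and J the number of key points.
    """
    x = skeletons_2d @ params["embed"]                  # (T, J, D)

    # Spatial attention: the J key points of each frame attend to each other.
    x = x + self_attention(x, *params["spatial"])

    # Temporal attention: each key point attends to itself across the T frames.
    xt = np.swapaxes(x, 0, 1)                           # (J, T, D)
    xt = xt + self_attention(xt, *params["temporal"])
    x = np.swapaxes(xt, 0, 1)                           # (T, J, D)

    return x @ params["head"]                           # (T, J, 3)
```

A possible call, with an arbitrary 9 frames and 12 key points chosen only for the example: `lift_2d_to_3d(np.random.default_rng(1).normal(size=(9, 12, 2)), init_params())` returns an array of shape `(9, 12, 3)`. With untrained weights the values carry no meaning; the point is only to show how attention over key points and attention over frames can be combined on the same tensor by swapping axes.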