Hierarchical Representation Learning of Dog Behavior via Single-View 3D Pose Estimation

Published: 02 Oct 2025, Last Modified: 14 Nov 2025, Venue: NeurIPS 2025 Poster, License: CC BY 4.0
Keywords: dog behavior, hierarchical representation, single-view 3d pose
Abstract: Dogs exhibit diverse behaviors that function as important signals in human–dog communication. Automatic analysis of such behaviors is increasingly needed in both scientific and applied contexts. However, conventional methods for behavior analysis face two major challenges: (i) 3D pose estimation typically requires multi-camera setups or prior training with complex calibration, and (ii) behavior classification relies heavily on predefined labels, which limits the ability to detect previously unseen behaviors. To address these limitations, we combine D-Pose, a model that estimates 3D dog poses from a single camera by learning pose representations, with h/BehaveMAE, a self-supervised framework that learns hierarchical behavior representations from pose sequences without predefined labels. Using a dataset of annotated dog behaviors, we perform a preliminary evaluation by applying linear probing to the learned embeddings. Our results suggest that this approach provides a flexible and generalizable pipeline for behavior analysis that learns behavior representations directly from video. While this study focuses on dog behavior, the proposed framework may serve as a step toward uncovering the mechanisms of animal communication.
Submission Number: 27
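
The abstract evaluates the learned embeddings by linear probing: the encoder stays frozen and only a linear classifier is trained on its outputs. The sketch below illustrates that general technique and is not the authors' implementation. The embeddings are synthetic stand-ins for encoder outputs, and every name in it (`train_linear_probe`, `probe_accuracy`, the dimensions, the learning rate) is a hypothetical choice made for illustration.

```python
import numpy as np

def train_linear_probe(embeddings, labels, num_classes, lr=0.01, epochs=200):
    """Fit a softmax linear classifier on frozen embeddings.

    Only W and b are updated; the embeddings are treated as fixed inputs.
    """
    n, d = embeddings.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = embeddings @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * embeddings.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(W, b, embeddings, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    preds = (embeddings @ W + b).argmax(axis=1)
    return float((preds == labels).mean())

# Synthetic stand-in for frozen encoder outputs (assumption: not real data).
rng = np.random.default_rng(0)
num_classes, dim, per_class = 3, 16, 100
centers = rng.normal(scale=3.0, size=(num_classes, dim))
labels = np.repeat(np.arange(num_classes), per_class)
embeddings = centers[labels] + rng.normal(size=(labels.size, dim))

# Hold out part of the data so the probe is scored on unseen samples.
perm = rng.permutation(labels.size)
train_idx, test_idx = perm[:240], perm[240:]
W, b = train_linear_probe(embeddings[train_idx], labels[train_idx], num_classes)
test_acc = probe_accuracy(W, b, embeddings[test_idx], labels[test_idx])
print(f"linear-probe test accuracy: {test_acc:.3f}")
```

Because the probe has so little capacity, its held-out accuracy mostly reflects how linearly separable the behavior classes already are in the embedding space, which is why it is a common first check for self-supervised representations.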