Domes to Drones: Self-Supervised Active Triangulation for 3D Human Pose Reconstruction

Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu

06 Sept 2019 (modified: 05 May 2023), NeurIPS 2019, Readers: Everyone
Abstract: Existing state-of-the-art estimation systems can accurately detect the 2d poses of multiple people in images. In contrast, 3d pose estimation from a single image is ill-posed due to occlusion and depth ambiguities, and it remains partly unsolved, in part because appropriately labeled training data is difficult to acquire. However, given access to multiple cameras, or an active observer able to capture the scene from multiple viewpoints, reconstructing 3d pose from 2d measurements becomes well-posed within the framework of standard multi-view geometry. Less clear is which set of viewpoints yields accurate 3d reconstructions, particularly in complex scenes where people are occluded by others or by scene objects. To tackle this view selection problem, we introduce ACTOR, an active triangulation agent for 3d human pose reconstruction. ACTOR consists of a 2d pose estimation network (any such network can be used) and a deep reinforcement learning-based policy for camera location selection, and the system is fully trainable. The policy predicts camera locations, the number of which adapts to scene content, and the associated images are fed to the underlying 2d pose estimator. Crucially, training the policy requires no annotations: given a pre-trained 2d pose estimator, ACTOR can be trained in a self-supervised manner. In extensive evaluations on complex multi-person scenes filmed using the Panoptic camera framework, we compare our active triangulation agent to strong multi-view baselines and show that ACTOR produces significantly more accurate 3d pose reconstructions.
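As an illustration of why 3d reconstruction from 2d measurements becomes well-posed with multiple calibrated views, the following is a minimal sketch of textbook linear (DLT) triangulation of a single joint. It is not taken from the paper and is not claimed to be ACTOR's triangulation procedure. The function names `triangulate_point` and `project`, the intrinsics `K`, and the two camera matrices `P1` and `P2` are hypothetical values chosen for the example.

```python
import numpy as np


def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3d point from two or more views.

    projections: list of 3x4 camera projection matrices (assumed known).
    points_2d:   list of (x, y) image observations, one per camera.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view gives two linear constraints on the homogeneous point X.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3d point X into the image of a camera with matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical setup: shared intrinsics, second camera shifted along the x axis.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

joint = np.array([0.2, -0.1, 4.0])          # ground-truth 3d joint (made up)
observations = [project(P1, joint), project(P2, joint)]
estimate = triangulate_point([P1, P2], observations)
```

With noise-free observations the estimate recovers the joint up to numerical precision. With noisy or occluded 2d detections the accuracy depends on which views are used, which is the view selection question the abstract raises.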
CMT Num: 2164
Code Link: