Evaluation of deep pose detectors for automatic analysis of film style.
Abstract: Identifying human characters – how they are portrayed, the actions they carry out, and their interactions in the scene – is
central to understanding plot and assessing artistic value in films, and is inherently linked to how we perceive and interpret
visual media. Building computational models that are sensitive to story will thus require a formal representation of the
character. Yet this kind of data is complex and tedious to annotate at the frame level and at large scale. Human pose estimation
(HPE) may be the key to facilitating this task: it can identify features such as position, size, and movement that can be transformed
into input for machine learning models and that support higher-level artistic and storytelling interpretation. However, current HPE methods
operate mainly on non-professional image content, with no comprehensive evaluation of their performance on artistic film.
In this work, we first propose a formal representation of the character based on cinematography theory. We design tools to
annotate, with this representation, a sampled subset of 2500 images drawn from three datasets, one of which we introduce to the
community. We then conduct an in-depth analysis to measure the general performance of two recent HPE methods in terms
of precision and recall, and to examine the impact of cinematographic style. From these findings, we highlight the advantages
of HPE for automated film analysis, and propose future directions to improve the performance of HPE methods on artistic film content.