Abstract: It is well known in the video understanding community that human action recognition models suffer from background bias, i.e., they over-rely on scene cues when making their predictions. However, this effect is difficult to quantify with existing evaluation frameworks. We introduce the Human-centric Analysis Toolkit (HAT), which enables evaluation of learned background bias without requiring new manual video annotation. It does so by automatically generating synthetically manipulated videos, leveraging recent advances in image segmentation and video inpainting. Using HAT, we perform an extensive analysis of 74 action recognition models trained on the Kinetics dataset. We confirm that all of these models focus more on the scene background than on the human motion; further, we demonstrate that certain model design decisions (such as training with fewer frames per video, or using dense rather than uniform temporal sampling) appear to worsen the background bias. We open-source HAT to enable the community to design more robust and generalizable human action recognition models.
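To illustrate the kind of manipulation HAT automates, below is a minimal sketch (not HAT's actual API; the function name and arguments are hypothetical) of removing the person region from a frame given a binary segmentation mask. In the real toolkit, a video inpainting model would fill the removed region rather than a constant value.

```python
import numpy as np

def remove_person(frame, person_mask, fill_value=0):
    """Replace person pixels with a constant fill.

    frame: (H, W, 3) uint8 array
    person_mask: (H, W) boolean array, True where a person is detected
    A real pipeline would hand the masked frame to a video
    inpainting model instead of using a constant fill.
    """
    out = frame.copy()
    out[person_mask] = fill_value
    return out

# Toy example: a 4x4 "frame" with a 2x2 person region.
frame = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
bg_only = remove_person(frame, mask)
```

Applying such edits per frame (person removed, background kept, and vice versa) yields paired synthetic videos on which a model's reliance on scene versus human cues can be measured.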
Supplementary Material: zip
Dataset Url: https://github.com/princetonvisualai/HAT
License: As our code is derived from existing codebases, please refer to the individual licenses (Apache 2.0 and CC BY-NC 4.0) when using our toolkit. The synthetically generated datasets are derived from Kinetics-400 and can be re-generated using our code (we include the random seeds used in the paper). Our dataset admittedly suffers from the same licensing issues as Kinetics-400; some of the original YouTube videos included in Kinetics-400 do not specify any license. Our released dataset is a derived work of Kinetics-400, and its annotations follow the Creative Commons Attribution 4.0 International License. For the synthetic videos, please refer to the licenses of the individual source videos, which can be found at https://github.com/cvdfoundation/kinetics-dataset.
Author Statement: Yes
Contribution Process Agreement: Yes
In Person Attendance: Yes