Abstract: Deep learning models for action recognition remain vulnerable to scene bias: they prioritize easily learnable scene representations over the actor's actual motion patterns. To address this limitation, we introduce InteractionCutMix, a novel scene debiasing method designed to preserve and leverage actor-object interactions in video augmentation. Unlike existing strategies, which detach actors from their original scenes and thereby discard important contextual relationships with objects, InteractionCutMix captures these interactions and integrates them into the augmentation process to enhance action representations. By maintaining actor-object interactions, our method provides richer contextual understanding of actions and improves the model's ability to distinguish visually similar motions that belong to different action categories. Experiments on the UCF101, HMDB51, and Kinetics-100 datasets show that our approach outperforms existing methods. The results support our hypothesis that preserving actor-object interactions in video augmentation is fundamentally important for robust, context-aware action recognition. Code is publicly available at https://github.com/rendicahya/intercutmix
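The core idea in the abstract, keeping the actor together with the objects it interacts with while swapping the surrounding scene, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `boxes_to_mask` and `mix_with_interaction_mask`, the use of box unions as masks, and the hard (non-blended) compositing are all assumptions made for illustration. The abstract does not specify how the masks are obtained or how labels are assigned to the mixed clip.

```python
import numpy as np


def boxes_to_mask(boxes, height, width):
    """Union of (x1, y1, x2, y2) boxes for one frame as a boolean mask.

    The boxes are hypothetical detector outputs covering the actor and any
    object the actor interacts with.
    """
    mask = np.zeros((height, width), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = True
    return mask


def mix_with_interaction_mask(action_video, scene_video, mask):
    """Paste the masked region of action_video onto scene_video.

    action_video, scene_video: arrays of shape (T, H, W, C).
    mask: boolean array of shape (T, H, W); True marks actor or
          interacting-object pixels that must be preserved.
    """
    if action_video.shape != scene_video.shape:
        raise ValueError("videos must share the same shape")
    if mask.shape != action_video.shape[:3]:
        raise ValueError("mask must have shape (T, H, W)")
    # Broadcast the mask over the channel axis and select per pixel.
    return np.where(mask[..., None], action_video, scene_video)


# Toy example: a 2-frame, 4x4 RGB clip with an actor box and an object box.
action = np.ones((2, 4, 4, 3), dtype=np.uint8)
scene = np.zeros((2, 4, 4, 3), dtype=np.uint8)
frame_mask = boxes_to_mask([(0, 0, 2, 2), (1, 1, 3, 3)], height=4, width=4)
mask = np.stack([frame_mask, frame_mask])
mixed = mix_with_interaction_mask(action, scene, mask)
```

In this sketch, pixels inside either box come from the action clip and all remaining pixels come from the scene clip, so the actor-object region stays intact while the background changes.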
External IDs: dblp:journals/access/WihandikaMA25