VHOIP: Video-based Human-Object Interaction recognition with CLIP Prior knowledge

Doyeol Baek, Junsuk Choe

Published: 2025, Last Modified: 15 May 2025
Venue: Pattern Recognit. Lett. 2025
License: CC BY-SA 4.0

Abstract: Highlights
• We use CLIP to address fine-grained interaction ambiguity in V-HOI recognition.
• We enrich V-HOI’s intermediate features using CLIP’s prior knowledge.
• Our results improve over state-of-the-art techniques in three HOI video datasets.