Abstract: Highlights
• We use CLIP to address fine-grained interaction ambiguity in V-HOI recognition.
• We enrich V-HOI’s intermediate features using CLIP’s prior knowledge.
• Our results improve over state-of-the-art techniques in three HOI video datasets.