Abstract: The use of task-agnostic, pre-trained models for knowledge transfer has become more prevalent with the availability of large open-source vision-language models (VLMs) and increased computational power. However, despite their widespread application across various domains, their potential for online action detection has not been fully explored; current approaches instead rely on features pre-extracted with convolutional neural networks. In this paper, we explore the use of VLMs for online action detection, emphasizing their effectiveness in zero-shot and few-shot learning scenarios. Our empirical results highlight the robust performance of VLMs in this setting, positioning them as a powerful tool for further advancing the state of the art in online action detection.
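To make the zero-shot setting concrete, the sketch below shows one way a CLIP-style VLM could score streaming video frames against natural-language action prompts without any task-specific training. It is an illustrative sketch only, not the method evaluated in this paper; the Hugging Face `transformers` CLIP checkpoint, the prompt template, and the action label set are assumptions chosen for the example.

```python
# Illustrative sketch (not this paper's pipeline): zero-shot per-frame action
# scoring with a CLIP-style VLM, processing frames one at a time (online setting).
# The checkpoint name, prompt template, and action labels are placeholder assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

ACTIONS = ["pole vault", "long jump", "javelin throw", "background"]  # hypothetical label set

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Pre-compute and normalize the text embeddings once; reuse them for every frame.
text_inputs = processor(text=[f"a video frame of {a}" for a in ACTIONS],
                        return_tensors="pt", padding=True)
with torch.no_grad():
    text_emb = model.get_text_features(**text_inputs)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

def score_frame(frame: Image.Image) -> dict:
    """Return per-action probabilities for a single streaming frame."""
    image_inputs = processor(images=frame, return_tensors="pt")
    with torch.no_grad():
        img_emb = model.get_image_features(**image_inputs)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    # Cosine similarity between frame and prompt embeddings, turned into a distribution.
    probs = (100.0 * img_emb @ text_emb.T).softmax(dim=-1).squeeze(0)
    return dict(zip(ACTIONS, probs.tolist()))
```

In an online deployment, `score_frame` would be called on each new frame as it arrives, possibly with temporal smoothing over a short window; a few-shot variant could instead fit a lightweight classifier on a handful of labeled frame embeddings.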